Talk:Binary GCD algorithm
From Wikipedia, the free encyclopedia
Contents |
[edit] unsigned integers
If don't see, why the algorithm is restricted to unsigned integers. As far as I can see it works equally on negative numbers. Even more the algorithm would be simplified because no v>=u comparison is necessary. 134.102.210.237 11:54, 31 March 2006 (UTC)
[edit] ARM implementation
The ARM implementation lacks the initial u = 0 test. The code also does not appear to make good use of the conditional instruction set, so I have some optimizations to suggest:
[edit] remove_twos_loop optimization
gcd:
@ Arguments arrive in registers r0 and r1
mov r3, #0 @ Initialize r3, the power of 2 in the result
orr r12, r0, r1 @ r12 = (r0 | r1)
tst r12, #1 @ Test least significant bit of (r0 | r1)
bne gcd_loop @ If nonzero, either r0 or r1 are odd, jump into middle of next loop
remove_twos_loop:
mov r0, r0, lsr #1 @ Shift r0 right, dividing it by 2
mov r1, r1, lsr #1 @ Shift r1 right, dividing it by 2
add r3, r3, #1 @ Increment r3
tst r0, #1 @ Check least significant bit of r0
bne gcd_loop @ If nonzero, r0 is odd, jump into middle of next loop
tst r1, #1 @ Check least significant bit of r1
beq remove_twos_loop @ If zero, r0 and r1 still even, keep looping
[edit] optimization 1
gcd:
@ Arguments arrive in registers r0 and r1
mov r3, #0 @ Initialize r3, the power of 2 in the result
orr r12, r0, r1 @ r12 = (r0 | r1)
remove_twos_loop:
tst r12, #1 @ Test least significant bit of (r0 | r1)
moveq r0, r0, lsr #1 @ Shift r0 right, dividing it by 2
moveq r1, r1, lsr #1 @ Shift r1 right, dividing it by 2
moveq r12, r12, lsr #1 @ Shift r12 right, dividing it by 2
beq remove_twos_loop @ If zero, r0 and r1 still even, keep looping
[edit] optimization 2
gcd:
@ Arguments arrive in registers r0 and r1
mov r3, #0 @ Initialize r3, the power of 2 in the result
remove_twos_loop:
tst r0, #1 @ Test least significant bit of r0
tsteq r1, #1 @ Test least significant bit of r1
moveq r0, r0, lsr #1 @ Shift r0 right, dividing it by 2
moveq r1, r1, lsr #1 @ Shift r1 right, dividing it by 2
beq remove_twos_loop @ If zero, r0 and r1 still even, keep looping
[edit] gcd_loop optimization
gcd_loop: @ Loop finding gcd of r0, r1 not both even
tst r0, #1 @ Check least significant bit of r0
moveq r0, r0, lsr #1 @ If r0 is even, shift r0 right, dividing it by 2, and...
beq gcd_loop_continue @ ... continue to next iteration of loop.
tst r1, #1 @ Check least significant bit of r1
moveq r1, r1, lsr #1 @ If r1 is even, shift it right by 1 and...
beq gcd_loop_continue @ ... continue to next iteration of loop.
cmp r0, r1 @ Compare r0 to r1
subcs r2, r0, r1 @ If r0 >= r1, take r0 minus r1, and...
movcs r0, r2, lsr #1 @ ... shift it right and put it in r0.
subcc r2, r1, r0 @ If r0 < r1, take r1 minus r0, and...
movcc r1, r2, lsr #1 @ ... shift it right and put it in r1.
gcd_loop_continue:
cmp r0, #0 @ Has r0 dropped to zero?
bne gcd_loop @ If not, iterate
mov r0, r1, asl r3 @ Put r1 << r3 in the result register
bx lr @ Return to caller
[edit] optimization
gcd_loop: @ Loop finding gcd of r0, r1 not both even
tst r0, #1 @ Check least significant bit of r0
moveq r0, r0, lsr #1 @ If r0 is even, shift r0 right, dividing it by 2, and...
beq gcd_loop @ ... continue to next iteration of loop.
gcd_loop_2: @ Loop finding gcd of r0, r1
tst r1, #1 @ Check least significant bit of r1
moveq r1, r1, lsr #1 @ If r1 is even, shift it right by 1 and...
beq gcd_loop_2 @ ... continue to next iteration of loop.
cmp r0, r1 @ Compare r0 to r1
subcc r2, r1, r0 @ If r0 < r1, take r1 minus r0, and...
movcc r1, r2, lsr #1 @ ... shift it right and put it in r1.
bcc gcd_loop_2 @ ... continue to next iteration of loop (r0 is still odd).
sub r2, r0, r1 @ If r0 >= r1, take r0 minus r1, and...
movs r0, r2, lsr #1 @ ... shift it right and put it in r0.
bne gcd_loop @ Has r0 dropped to zero? If not, iterate
mov r0, r1, asl r3 @ Put r1 << r3 in the result register
bx lr @ Return to caller
- Sounds good to me. The point of this code sample is just to illustrate how efficient binary GCD is on a real machine. Feel free to plug in the most optimized version you can imagine. Deco 19:50, 31 October 2005 (UTC)
[edit] Implementations
I've removed the ML implementation, because it teaches more about ML than about the algorithm. I have my doubts about the value of the assembly version, but I can't really assess it as I can't read it. IMHO, the C implementation is important because it exemplifies the use of bitwise operations for efficiency. Qwertyus 22:31, 13 April 2006 (UTC)
- I think you did the right thing. I just restored the C and ARM implementations after User:Arvindn blanked them, for these reasons: IMHO, C isn't important for showing bitwise operations; it should be obvious that you'd use bitwise ops where appropriate. But the C implementation is easier to read than the English algorithm at the moment, so it needs to stay at least until there's an appropriate substitute (pseudocode? Knuth-style algorithm description?). IMHO, the ARM implementation is really enlightening, because it shows the real size and speed benefit of the algorithm in a way that no higher-level language's compiler even approaches. (In particular, it totally stomps the C implementation.) Without some hand-optimized assembly implementation, the advantage of binary GCD over Euclid is not obvious, IMHO. --Quuxplusone 21:18, 1 January 2007 (UTC)

