Talk:Binary GCD algorithm

From Wikipedia, the free encyclopedia

1 unsigned integers
2 ARM implementation
3 remove_twos_loop optimization
- 3.1 optimization 1
- 3.2 optimization 2
4 gcd_loop optimization
- 4.1 optimization
5 Implementations

[edit] unsigned integers

If don't see, why the algorithm is restricted to unsigned integers. As far as I can see it works equally on negative numbers. Even more the algorithm would be simplified because no v>=u comparison is necessary. 134.102.210.237 11:54, 31 March 2006 (UTC)

[edit] ARM implementation

The ARM implementation lacks the initial u = 0 test. The code also does not appear to make good use of the conditional instruction set, so I have some optimizations to suggest:

[edit] remove_twos_loop optimization

gcd:
    @ Arguments arrive in registers r0 and r1
    mov     r3, #0            @ Initialize r3, the power of 2 in the result
    orr     r12, r0, r1       @ r12 = (r0 | r1)
    tst     r12, #1           @ Test least significant bit of (r0 | r1)
    bne     gcd_loop          @ If nonzero, either r0 or r1 are odd, jump into middle of next loop
remove_twos_loop:
    mov     r0, r0, lsr #1    @ Shift r0 right, dividing it by 2
    mov     r1, r1, lsr #1    @ Shift r1 right, dividing it by 2
    add     r3, r3, #1        @ Increment r3
    tst     r0, #1            @ Check least significant bit of r0
    bne     gcd_loop          @ If nonzero, r0 is odd, jump into middle of next loop
    tst     r1, #1            @ Check least significant bit of r1
    beq     remove_twos_loop  @ If zero, r0 and r1 still even, keep looping

[edit] optimization 1

gcd:
    @ Arguments arrive in registers r0 and r1
    mov     r3, #0            @ Initialize r3, the power of 2 in the result
    orr     r12, r0, r1       @ r12 = (r0 | r1)
remove_twos_loop:
    tst     r12, #1           @ Test least significant bit of (r0 | r1)
    moveq   r0, r0, lsr #1    @ Shift r0 right, dividing it by 2
    moveq   r1, r1, lsr #1    @ Shift r1 right, dividing it by 2
    moveq   r12, r12, lsr #1  @ Shift r12 right, dividing it by 2
    beq     remove_twos_loop  @ If zero, r0 and r1 still even, keep looping

[edit] optimization 2

gcd:
    @ Arguments arrive in registers r0 and r1
    mov     r3, #0            @ Initialize r3, the power of 2 in the result
remove_twos_loop:
    tst     r0, #1            @ Test least significant bit of r0
    tsteq   r1, #1            @ Test least significant bit of r1
    moveq   r0, r0, lsr #1    @ Shift r0 right, dividing it by 2
    moveq   r1, r1, lsr #1    @ Shift r1 right, dividing it by 2
    beq     remove_twos_loop  @ If zero, r0 and r1 still even, keep looping

[edit] gcd_loop optimization

gcd_loop:                     @ Loop finding gcd of r0, r1 not both even
    tst     r0, #1            @ Check least significant bit of r0
    moveq   r0, r0, lsr #1    @ If r0 is even, shift r0 right, dividing it by 2, and...
    beq     gcd_loop_continue @ ... continue to next iteration of loop.
    tst     r1, #1            @ Check least significant bit of r1
    moveq   r1, r1, lsr #1    @ If r1 is even, shift it right by 1 and...
    beq     gcd_loop_continue @ ... continue to next iteration of loop.
    cmp     r0, r1            @ Compare r0 to r1
    subcs   r2, r0, r1        @ If r0 >= r1, take r0 minus r1, and...
    movcs   r0, r2, lsr #1    @ ... shift it right and put it in r0.
    subcc   r2, r1, r0        @ If r0 < r1, take r1 minus r0, and...
    movcc   r1, r2, lsr #1    @ ... shift it right and put it in r1.
gcd_loop_continue:
    cmp     r0, #0            @ Has r0 dropped to zero?
    bne     gcd_loop          @ If not, iterate
    mov     r0, r1, asl r3    @ Put r1 << r3 in the result register
    bx      lr                @ Return to caller

[edit] optimization

gcd_loop:                     @ Loop finding gcd of r0, r1 not both even
    tst     r0, #1            @ Check least significant bit of r0
    moveq   r0, r0, lsr #1    @ If r0 is even, shift r0 right, dividing it by 2, and...
    beq     gcd_loop          @ ... continue to next iteration of loop.
gcd_loop_2:                   @ Loop finding gcd of r0, r1
    tst     r1, #1            @ Check least significant bit of r1
    moveq   r1, r1, lsr #1    @ If r1 is even, shift it right by 1 and...
    beq     gcd_loop_2        @ ... continue to next iteration of loop.
    cmp     r0, r1            @ Compare r0 to r1
    subcc   r2, r1, r0        @ If r0 < r1, take r1 minus r0, and...
    movcc   r1, r2, lsr #1    @ ... shift it right and put it in r1.
    bcc     gcd_loop_2        @ ... continue to next iteration of loop (r0 is still odd).
    sub     r2, r0, r1        @ If r0 >= r1, take r0 minus r1, and...
    movs    r0, r2, lsr #1    @ ... shift it right and put it in r0.
    bne     gcd_loop          @ Has r0 dropped to zero? If not, iterate
    mov     r0, r1, asl r3    @ Put r1 << r3 in the result register
    bx      lr                @ Return to caller

Sounds good to me. The point of this code sample is just to illustrate how efficient binary GCD is on a real machine. Feel free to plug in the most optimized version you can imagine. Deco 19:50, 31 October 2005 (UTC)

[edit] Implementations

I've removed the ML implementation, because it teaches more about ML than about the algorithm. I have my doubts about the value of the assembly version, but I can't really assess it as I can't read it. IMHO, the C implementation is important because it exemplifies the use of bitwise operations for efficiency. Qwertyus 22:31, 13 April 2006 (UTC)

I think you did the right thing. I just restored the C and ARM implementations after User:Arvindn blanked them, for these reasons: IMHO, C isn't important for showing bitwise operations; it should be obvious that you'd use bitwise ops where appropriate. But the C implementation is easier to read than the English algorithm at the moment, so it needs to stay at least until there's an appropriate substitute (pseudocode? Knuth-style algorithm description?). IMHO, the ARM implementation is really enlightening, because it shows the real size and speed benefit of the algorithm in a way that no higher-level language's compiler even approaches. (In particular, it totally stomps the C implementation.) Without some hand-optimized assembly implementation, the advantage of binary GCD over Euclid is not obvious, IMHO. --Quuxplusone 21:18, 1 January 2007 (UTC)

Talk:Binary GCD algorithm

From Wikipedia, the free encyclopedia

Contents

[edit] unsigned integers

[edit] ARM implementation

[edit] remove_twos_loop optimization

[edit] optimization 1

[edit] optimization 2

[edit] gcd_loop optimization

[edit] optimization

[edit] Implementations

Views

Navigation

Interaction

Search