Montgomery reduction

From Wikipedia, the free encyclopedia

Montgomery reduction is an algorithm introduced in 1985 by Peter Montgomery that allows modular arithmetic to be performed efficiently when the modulus is large (typically several hundred bits).

A single application of the Montgomery algorithm (henceforth referred to as a "Montgomery step") is faster than a "naive" modular multiplication:

c = a \times b \pmod {n}

Because numbers have to be converted to and from a particular form suitable for performing the Montgomery step, a single modular multiplication performed using a Montgomery step is actually slightly less efficient than a "naive" one. However, modular exponentiation can be implemented as a sequence of Montgomery steps, with conversion only required once at the start and once at the end of the sequence. In this case the greater speed of the Montgomery steps far outweighs the need for the extra conversions.

Contents

[edit] The Montgomery step

Working with n-digit numbers to base d, a Montgomery step calculates a \times b \div d^n\pmod r. The base d is typically 2 for microelectronic applications or 232 or 264 for software applications. For the purpose of exposition, we shall illustrate with d=10 and n=4.

To calculate 5678 × a ÷ 10000:

  1. Zero the accumulator.
  2. Add 8a to the accumulator.
  3. Shift the accumulator one place to the right (thus dividing by 10).
  4. Add 7a to the accumulator.
  5. Shift the accumulator one place to the right.
  6. Add 6a to the accumulator.
  7. Shift the accumulator one place to the right.
  8. Add 5a to the accumulator.
  9. Shift the accumulator one place to the right.

It is easy to see that the result is 0.5678 × a, as required.

To turn this into a modular operation with a modulus r, add, immediately before each shift, whatever multiple of r is needed to make the value in the accumulator a multiple of 10.

The result will be that the final value in the accumulator will be an integer (since only multiples of 10 have every been divided by 10) and equivalent (modulo r) to 5678 × a ÷ 10000.

Finding the appropriate multiple of r is a simple operation of single-digit arithmetic. When working to base 2, it is trivial to calculate: if the value in the accumulator is even, the multiple is 0 (nothing needs to be added); if the value in the accumulator is odd, the multiple is 1 (r needs to be added).

The Montgomery step is faster than the methods of "naive" modular arithmetic because the decision as to what multiple of r to add is taken purely on the basis of the least significant digit of the accumulator. This allows the use of carry-save adders, which are much faster than the conventional kind but are not immediately able to give accurate values for the more significant digits of the result.

[edit] Modular multiplication

On its own a Montgomery step is a freak and no-one in his right mind would want to use it. However, consider the following pair of calculations:

24×73 = 1752
240000× 730000 ÷ 10000 = 17520000

It can be seen that if we choose to represent integers by 10000 times themselves (let us temporarily call this a "Montgomery representation") then the result of a Montgomery step on the Montgomery representation of a and the Montgomery representation of b is the Montgomery representation of a × b.

Thus we can use a Montgomery step to perform a modular multiplication by "Montgomeryizing" both operands before the Montgomery step and "de-Montgomeryizing" the result after it.

To "de-Montgomeryize" a number - in other words, to take it from its representation as "12340000" to a conventional representation as "1234" - it suffices to do a single Montgomery step with the number and 1: 12340000×1÷10000=1234.

To "Montgomeryize" a number - in other words, to take it from its conventional representation to a representation as "12340000" - it suffices to do a single Montgomery step with the number and 100000000: 1234×100000000÷10000=12340000.

The value of 100000000 modulo r can be precomputed, since the same modulus r is usually used many times over.

The total budget for a single modular multiplication is thus four Montgomery steps: two to "Montgomeryize" the operands, one to perform the actual multiplication, and one to "de-Montgomeryize" the result.

A Montgomery step is unlikely ever to be four times faster than a conventional modular multiplication (because a carry-save addition is unlikely ever to be four times faster than a conventional addition) and so Montgomery's algorithm is not efficient for single multiplications.

[edit] Modular exponentiation

Raising a number to a k-bit exponent involves between k and 2k multiplications. In most applications of modular exponentiation the exponent is at least several hundred bits long.

To fix our ideas, suppose that a particular modular exponentiation requires 800 multiplications. In that case 802 Montgomery steps will be needed: one to Montgomeryize the number being exponentiated, 800 to do the exponentiation, and one to de-Montgomeryize the result.

If a Montgomery step is even slightly faster than a conventional modular multiplication, the Montgomery algorithm will produce a faster result than conventional modular exponentiation.


[edit] References

Languages