www.delorie.com/gnu/docs/gmp/gmp_86.html   search  
 
Buy GNU books!


GNU MP 4.1.2

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

16.1.2 Karatsuba Multiplication

The Karatsuba multiplication algorithm is described in Knuth section 4.3.3 part A, and various other textbooks. A brief description is given here.

The inputs x and y are treated as each split into two parts of equal length (or the most significant part one limb shorter if N is odd).

 
 high              low
+----------+----------+
|    x1    |    x0    |
+----------+----------+

+----------+----------+
|    y1    |    y0    |
+----------+----------+

Let b be the power of 2 where the split occurs, ie. if x0 is k limbs (y0 the same) then \N\}, b=2^(k*mp_bits_per_limb)}. With that x=x1*b+x0 and y=y1*b+y0, and the following holds,

 
\N\{xy = (b^2+b)x_1y_1 - b(x_1-x_0)(y_1-y_0) + (b+1)x_0y_0,
  x*y = (b^2+b)*x1*y1 - b*(x1-x0)*(y1-y0) + (b+1)*x0*y0}

This formula means doing only three multiplies of (N/2)x(N/2) limbs, whereas a basecase multiply of NxN limbs is equivalent to four multiplies of (N/2)x(N/2). The factors (b^2+b) etc represent the positions where the three products must be added.

 
 high                              low
+--------+--------+ +--------+--------+
|      x1*y1      | |      x0*y0      |
+--------+--------+ +--------+--------+
          +--------+--------+
      add |      x1*y1      |
          +--------+--------+
          +--------+--------+
      add |      x0*y0      |
          +--------+--------+
          +--------+--------+
      sub | (x1-x0)*(y1-y0) |
          +--------+--------+

The term (x1-x0)*(y1-y0) is best calculated as an absolute value, and the sign used to choose to add or subtract. Notice the sum \N\(x_0y_0)+\mathop{\rm low}(x_1y_1), high(x0*y0)+low(x1*y1)} occurs twice, so it's possible to do 5*k limb additions, rather than 6*k, but in GMP extra function call overheads outweigh the saving.

Squaring is similar to multiplying, but with x=y the formula reduces to an equivalent with three squares,

 
\N\{x^2 = (b^2+b)x_1^2 - b(x_1-x_0)^2 + (b+1)x_0^2,
   x^2 = (b^2+b)*x1^2 - b*(x1-x0)^2 + (b+1)*x0^2}

The final result is accumulated from those three squares the same way as for the three multiplies above. The middle term (x1-x0)^2 is now always positive.

A similar formula for both multiplying and squaring can be constructed with a middle term (x1+x0)*(y1+y0). But those sums can exceed k limbs, leading to more carry handling and additions than the form above.

Karatsuba multiplication is asymptotically an O(N^1.585) algorithm, the exponent being log(3)/log(2), representing 3 multiplies each 1/2 the size of the inputs. This is a big improvement over the basecase multiply at O(N^2) and the advantage soon overcomes the extra additions Karatsuba performs.

MUL_KARATSUBA_THRESHOLD can be as little as 10 limbs. The SQR threshold is usually about twice the MUL. The basecase algorithm will take a time of the form M(N) = a*N^2 + b*N + c and the Karatsuba algorithm \N\{K(N) = 3M(N/2) + dN + e, K(N) = 3*M(N/2) + d*N + e}. Clearly per-crossproduct speedups in the basecase code reduce a and decrease the threshold, but linear style speedups reducing b will actually increase the threshold. The latter can be seen for instance when adding an optimized mpn_sqr_diagonal to mpn_sqr_basecase. Of course all speedups reduce total time, and in that sense the algorithm thresholds are merely of academic interest.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

  webmaster   donations   bookstore     delorie software   privacy  
  Copyright © 2003   by The Free Software Foundation     Updated Jun 2003  

Please take a moment to fill out this visitor survey
You can help support this site by visiting the advertisers that sponsor it! (only once each, though)