primes.doc 11 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215
  1. The choice of Diffie-Hellman parameters
  2. * Background
  3. Diffie-Hellman key exchange uses two parameters, a prime p and a
  4. generator g, which are used to derive the public parameters
  5. y1 = g^x1 (mod p) and y2 = g^x2 (mod p), and then the shared secret
  6. z = y1^x2 = (g^x1)^x2 = g^(x1*x2) = (g^x2)^x1 = y2^x1 (mod p).
  7. For the computation to be secure, several conditions must be true.
  8. The exponent must be big enough, for there is a square-root search
  9. algorithm to find the exponent. (E.g. a 16-bit exponent can be found
  10. in about 2^8 = 256 steps.) And then the modulus must be chosen so
  11. as to make the general discrete log problem difficult.
  12. The general discrete log problem can be solved for each prime-power
  13. factor of p-1 independently, so if all of the factors of p-1 are small,
  14. this is easy to do. Since p-1 is even, it must have a factor of 2, but
  15. the remaining portion q = (p-1)/2 can be chosen to be prime, making the
  16. problem as difficult as possible. Finding such numbers is computationally
  17. expensive, but as they are parameters which are only computed once, this
  18. is a reasonable up-front cost.
  19. * Number theory
  20. A second advantage of prime moduli of this form is that all generators
  21. g are good. This is because the generator must have a large order in
  22. the group Z*_p. But that group is of size p-1 = 2*q, and the order of
  23. any element of a group must divide the size of the group. The only
  24. divisors this has are 1, 2, q and 2*q. The only element of order 1 is
  25. 1, and the only element of order 2 is -1. All other elements, from 2
  26. through -2, have orders of either p-1 or (p-1)/2, which are both large.
  27. If the generator g has order p-1, it is a generator of the group Z*_p,
  28. and this is generally how one is advised to generate Diffie-Hellman
  29. parameters. This explains the similarity in names. However, if g is
  30. of order p-1, then it must be a quadratic non-residue modulo p. That
  31. is, it must not be a square of another number. If it were a square,
  32. then since the size of the group is even, no power of it would ever
  33. equal its square root, so it could not be a generator.
  34. If g is indeed of order p-1, then even powers of g are quadratic
  35. residues (squares, modulo p), and odd powers are quadratic non-residues
  36. (non-squares). Given a number y and a prime p, the Legendre symbol
  37. (y/p) is straightforward to compute, and this tells you if y is a
  38. quadratic residue. If it is, and y = g^x, then x must be even. If not,
  39. then x would have to be odd. In this way, for a generator which is a
  40. quadratic non-residue, the low-order bit of the exponent x is easily
  41. computed.
  42. If g is a quadratic residue, then the useful values of exponents x is
  43. more limited, since only the value of the exponent x modulo q = (p-1)/2
  44. has any effect on the output y = g^x (mod p), but generally exponents x
  45. are much less than p in any case, so this limitation on range is not an
  46. issue.
  47. Essentially, in either case, only the value of x modulo q is secret,
  48. but if g is a quadratic non-residue, the low-order bit of x is
  49. available to an attacker, while if it is a quadratic residue, the
  50. high-order bit is known to be 0.
  51. Thus, it does not really matter whether g is a quadratic residue or
  52. not, but if it is not, the exponent x should be chosen one bit larger.
  53. This adds a trivial amount of work to the computation of y, and for
  54. that reason it may be preferable to choose g to be a quadratic
  55. residue. This is not currently done, however.
  56. * Choice of generator g
  57. Because any g will do, and the choice of g does not affect the difficulty
  58. of performing the discrete log computation, choosing it for convenience
  59. of computation is best, and g = 2 is simplest to compute with.
  60. If in fact it is desirable to choose a generator which is a quadratic
  61. residue, then g = 2 can still be used if the prime p is suitably
  62. chosen. If p = +/-1 (mod 8), then 2 is a quadratic residue. If
  63. p = +/-3 (mod 8), then 2 is a non-residue.
  64. * Choice of prime-generation technique
  65. There may be additional primes of special form for which the discrete
  66. logarithm problem is particularly easy. The authors are not aware of
  67. any, but theoretical advances are possible and it would be nice to
  68. assure users of the system that the prime was not chosen to have any
  69. hidden special properties: only the published criteria were used.
  70. David Kravitz of the NSA has suggested a technique for generating
  71. "kosherized" primes for DSS which has been adapted to generate
  72. Diffie-Hellman primes.
  73. The technique uses a string of bytes as a seed to a cryptographically
  74. strong one-way hash function. This generator produces the initial
  75. value for a search for a suitable prime.
  76. David Kravitz' technique generates random numbers from successive seeds
  77. until one is found to be a suitable prime. This is unbearably slow for
  78. primes of the special form being sought, but it can be sped up, at a
  79. negligible cost in uniformity of the chosen primes by generating only a
  80. starting position for a linear search for a suitable prime. Such a
  81. search can be carried out particularly efficiently.
  82. * Details of the technique
  83. The generator is based on SHA.1, the FIPS 180.1 secure hash algorithm.
  84. This takes the given seed as input and produces a 160-bit output
  85. sequence in 20 bytes. These bytes are taken as a big-endian number to
  86. produce a number n0 from 0 to 2^160-1.
  87. (I.e. n0 = 2^152 * byte0 + 2^144 * byte1 + ... + 2^8 * byte19 + byte20.)
  88. Then, the seed is incremented, as a big-endian array of bytes, modulo its
  89. size (i.e. the last byte is incremented, propagating carry if necessary),
  90. and hashed again to produce n1, then n2, etc.
  91. A number of arbitrary size may be constructed by concatenating
  92. N = n0 + 2^160 * n1 + 2^320 * n2 + .... To get a number no larger
  93. than 2^k, take the low-order k bits of N, N mod 2^k. Obviously,
  94. if k is 1024, it is only necessary to compute n0 through n6.
  95. To generate a k-bit prime p (2^k > p >= 2^(k-1)), take t = N mod 2^(k-2),
  96. i.e. a number with at most k-2 significant bits. Then add 2^(k-1),
  97. to force the number into the desired range, and 2^(k-2), to force it
  98. into the high half of the range. This extra refinement makes an attack
  99. more expensive, without affecting the time required to do computations
  100. mod p. Additional high-order 1 bits could be forced, but the incremental
  101. benefit rapidly diminishes.
  102. The resultant number t is used as the starting point in a search for a
  103. suitable prime p. p is chosen to be the first number >= t such that p
  104. is prime and (p-1)/2 is prime.
  105. * Choice of seed
  106. Because SHA.1 is a cryptographic hash, it is computationally infeasible
  107. to find an input which has a given output. Indeed, there is no known
  108. technique better than brute-force search to find an input which
  109. produces an output with any special properties. Assuming that there is
  110. an unknown class of primes which are easy to solve the discrete
  111. logarithm problem for, this ensures that the chance of choosing a prime
  112. p which is a member of that class is no better than random chance,
  113. regardless of malice on the part of the designer.
  114. The seed chosen is arbitrary, so was chosen for aesthetic reasons.
  115. It is the 79 bytes of the ASCII representation of a quote by Mahatma
  116. Gandhi:
  117. Whatever you do will be insignificant, but it is very important that you do it.
  118. * Implementation details
  119. Obviously, a program was written to find a prime according to these
  120. rules. To aid anyone who wishes to repeat the search to confirm that
  121. the published primes were indeed generated in this way, here is a
  122. description of how it was done. The primes if the desired form have a
  123. density of about (ln p)^-2. E.g. for 1024-bit p, about one out of
  124. every 503791 numbers meets these criteria, so a considerable amount of
  125. searching is required. The following techniques can make the
  126. computation tolerable.
  127. First, note that q must be odd and not congruent to 0, modulo 3. Thus,
  128. q must be congruent to +/-1, modulo 6. Thus p = 2*q+1 must be
  129. congruent to 2*1+1 = 3 or 2*-1+1 = -1 modulo 12. But p congruent to 3
  130. mod 12 would be divisible by 3, and not prime, so p must be congruent
  131. to 11 mod 12.
  132. Thus, the initial search point t can first be increased until it is
  133. congruent to 11 modulo 12. Searching from this point forward, only
  134. every 12th number, t+12*i, needs to be considered.
  135. If it is desired to choose p so that 2 is a quadratic residue (meaning
  136. that p is congruent to +/-1 modulo 8), then this additional constraint
  137. can be met with no additional difficulty by beginning at the next
  138. number which is congruent to 23 mod 24 and searching in steps of 24.
  139. But in the following discussion, a step size of 12 is assumed.
  140. Then, a sieve is built for trial division by a number of small primes
  141. for a range of following i values. For large primes, a large search
  142. space is required, so a large sieve is desirable. The value used was
  143. 65536 bits (8K bytes). It may be necessary to rebuild the sieve
  144. beginning at t+12*65536 if no suitable prime is found before then, but
  145. this sieve is large enough that the refilling is infrequent and the
  146. overhead is negligible.
  147. Initially, every position in the sieve is marked as a potential prime.
  148. Then, for the small primes s from 5 through 65521, position i in the
  149. sieve is marked as unsuitable if t+12*i is divisble by s, i.e.
  150. definitely not prime. To do this cheaply, consider that t+12*i = 0
  151. (mod s) if i = -12^-1 * t (mod s). So finding t mod s, then 12^-1 (mod
  152. s) and multiplying (mod s) will produce the first i value which is
  153. known to be divisible by s, and then every s positions thereafter in
  154. the sieve will be divisible. This does the equivalent of a great deal
  155. of trial division with minimal effort.
  156. Positions in the sieve are also marked as as unsuitable if (t-1)/2+6*i
  157. = 0 (mod s), because these positions will have (p-1)/2 divisible by s
  158. and thus non-prime. This works similarly, and (t-1)/2 mod s can be
  159. derived from t mod s without actually doing another full division.
  160. This sieve filters out all but 1/591 of the possible values of i as
  161. obviously composite, leaving an expected 852 numbers to be checked by
  162. stronger means before a suitable prime p is found.
  163. After these two sieving operations have removed all numbers from
  164. consideration where p or q = (p-1)/2 have small divisors, the remaining
  165. candidates are subjected to a fast optimized Fermat test, to the base 2,
  166. once for p and once for q. This eliminates, for practical purposes,
  167. all composite numbers.
  168. Special composite numbers can be chosen which pass this test and yet
  169. are not primes - they are called pseudoprimes - but they are so rare in
  170. the ranges considered that the chances of finding one without
  171. deliberate search are utterly negligible. And the stating value for
  172. the search was carefully chosen to have no hidden special properties.
  173. If p and q are found to be prime by this test, some extra confirmation
  174. pseudoprimality tests are performed just to make sure of the conclusion
  175. and p is returned as the result.