bnintern.doc 15 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304
  1. * The organization of the BigNum Library
  2. As mentioned in bn.doc, the library should compile on anything with an
  3. ANSI C compiler and 16 and 32-bit data types. (Non-power-of-2 word
  4. lengths probably wouldn't be *too* hard, but the matter is likely to
  5. remain academic.) However, assembly subroutines can be added in a
  6. great variety of ways to speed up computations.
  7. It's even possible to vary the word length dynamically at run time.
  8. Currently, 80x86 and 680x0 assembly primitives have been written in 16
  9. and 32-bit forms, as not all members of these families support 32x32->64
  10. bit multiply. In future, 32/64 bit routines may be nice for the MIPS
  11. and PowerPC processors. (The SPARC has a 64-bit extension, but it still
  12. only produces a maximum 64-bit multiply result. The MIPS, PowerPC and
  13. Alpha give access to 128 bits of product.)
  14. The way that this works is that the file bn.c declares a big pile of
  15. function pointers, and the first bnInit() call figures out which set
  16. of functions to point these to. The functions are named so that
  17. it is possible to link several sets into the same executable without
  18. collisions.
  19. The library can store numbers in big-endian or little-endian word order,
  20. although the order of bytes within a word is always the platform native
  21. order. As long as you're using the pure C version, you can compile
  22. independent of the native byte ordering, but the flexibility is available
  23. in case assembly primitives are easier to write one way or the other.
  24. (In the absence of other considerations, little-endian is somewhat more
  25. efficient, and is the default. This is controlled by BN_XXX_ENDIAN.)
  26. In fact, it would be possible to change the word order at run time,
  27. except that there is no naming convention to support linking in
  28. functions that differ only in endianness. (Which is because the
  29. point of doing so is unclear.)
  30. The core of the library is in the files lbn??.c and bn??.c, where "??"
  31. is 16, 32, or 64. The 32 and 64-bit files are generated from the 16-bit
  32. version by a simple textual substitution. The 16-bit files are generally
  33. considered the master source, and the others generated from it with sed.
  34. Usually, only one set of these files is used on any given platform,
  35. but if you want multiple word sizes, you include one for each supported
  36. word size. The files bninit??.c define a bnInit function for a given
  37. word size, which calls bnInit_??() internally. Only one of these may
  38. be included at a time, and multiple word sizes are handled by a more
  39. complex bnInit function such as the ones in bn8086.c and bn68000.c,
  40. which determine the word size of the processor they're running on and
  41. call the appropriate bnInit_??() function.
  42. The file lbn.h uses <limits.h> to find the platform's available data
  43. types. The types are defined both as macros (BNWORD32) and as typedefs
  44. (bnword32) which aren't used anywhere but can come in very handy when
  45. using a debugger (which doesn't know about macros). Any of these may
  46. be overridden either on the compiler command line (cc -DBN_BIG_ENDIAN
  47. -DBNWORD32="unsigned long"), or from an extra include file BNINCLUDE
  48. defined on the command line. (cc -DBNINCLUDE=lbnmagic.h) This is the
  49. preferred way to specify assembly primitives.
  50. So, for example, to build a 68020 version of the library, compile the
  51. 32-bit library with -DBNINCLUDE=lbn68020.h, and compile and link in
  52. lbn68020.c (which is actually an assembly source file, if you look).
  53. Both 16- and 32-bit 80x86 code is included in lbn8086.h and .asm. That
  54. code uses 16-bit large-model addressing. lbn80386.h and .asm use 32-bit
  55. flat-model addressing.
  56. Three particularly heavily used macros defined by lbn.h are BIG(x),
  57. LITTLE(y) and BIGLITTLE(x,y). These expand to x (or nothing) on
  58. a big-endian system, and y (or nothing) on a little-endian system.
  59. These are used to conditionalize the rest of the code without taking
  60. up entire lines to say "#ifdef BN_BIG_ENDIAN", "#else" and "#endif".
  61. * The lbn??.c files
  62. The lbn?? file contains the low-level bignum functions. These universally
  63. expect their numbers to be passed to them in (buffer, length) form and
  64. do not attempt to extend the buffers. (In some cases, they do allocate
  65. temporary buffers.) The buffer pointer points to the least-significant
  66. end of the buffer. If the machine uses big-endian word ordering, that
  67. is a pointer to the end of the buffer. This is motivated by considering
  68. pointers to point to the boundaries between words (or bytes). If you
  69. consider a pointer to point to a word rather than between words, the
  70. pointer in the big-endian case points to the first word past the end of the
  71. buffer.
  72. All of the primitives have names of the form lbnAddN_16, where the
  73. _16 is the word size. All are surrounded by "#ifndef lbnAddN_16".
  74. If you #define lbnAddN_16 previously (either on the command like or
  75. in the BNINCLUDE file), the C code will neither define *nor declare* the
  76. corresponding function. The declaration must be suppressed in case you
  77. declare it in a magic way with special calling attributes or define it as
  78. a macro.
  79. If you wish to write an assembly primitive, lbnMulAdd1_??, which
  80. multiplies N words by 1 word and adds the result to N words, returning
  81. the carry word, is by FAR the most important function - almost all of
  82. the time spent performing a modular exponentiation is spent in this
  83. function. lbnMulSub1_??, which does the same but subtracts the product
  84. and returns a word of borrow, is used heavily in the division routine
  85. and thus by GCD and modular inverse computation.
  86. These two functions are the only functions which *require* some sort
  87. of double-word data type, so if you define them in assembly language,
  88. the ?? may be the widest word your C compiler supports; otherwise, you
  89. must limit your implementation to half of the maximum word size. Other
  90. functions will, however, use a double-word data type if available.
  91. Actually, there are some even simpler primitives which you can provide
  92. to allow double-width multiplication: mul??_ppmm, mul??_ppmma and
  93. mul??_ppmmaa These are expected to be defined as macros (all arguments
  94. are always side-effect-free lvalues), and must return two words of result
  95. of the computation m1*m2 + a1 + a2. It is best to define all three,
  96. although any that are not defined will be generated from the others in
  97. the obvious way. GCC's inline assembler can be used to define these.
  98. (The names are borrowed from the GNU MP package.)
  99. There is also lbnMulN1_??, which stores the result rather than adding or
  100. subtracting it, but it is less critical. If it is not provided, but
  101. lbnMulAdd1_?? is, it will be implemented in terms of lbnMulAdd1_?? in the
  102. obvious way.
  103. lbnDiv21_??, which divides two words by one word and returns a quotient
  104. and remainder, is greatly sped up by a double-word data type, macro
  105. definition, or assembly implementation, but has a version which will run
  106. without one. If your platform has a double/single divide with remainder,
  107. it would help to define this, and it's quite simple.
  108. lbnModQ_?? (return a multi-precision number reduced modulo a "quick"
  109. (< 65536) modulus is used heavily by prime generation for trial division,
  110. but is otherwise little used.
  111. Other primitives may be implemented depending on the expected usage mix.
  112. It is generally not worth implementing lbnAddN_?? and lbnSubN_?? unless
  113. you want to start learning to write assembly primitives on something
  114. simple; they just aren't used very much. (Of course, if you do, you'll
  115. probably get some improvements, in both speed and object code size, so
  116. it's worth keeping them in, once written.)
  117. * The bn??.c files
  118. While the lbn??.c files deal in words, the bn??.c files provide the
  119. public interface to the library and deal in bignum structures. These
  120. contain a buffer pointer, an allocated length, and a used length.
  121. The lengths are specified in words, but as long as the user doesn't go
  122. prying into such innards, all of the different word-size libraries
  123. provide the same interface; they may be exchanged at link time, or even
  124. at run time.
  125. The bn.c file defines a large collection of function pointers and one
  126. function, bnInit. bnInit is responsible for setting the function pointers
  127. to point to the appropriate bn??.c functions. Each bn??.c file
  128. provides a bnInit_?? function which sets itself up; it is the job
  129. of bnInit to figure out which word size to use and call the appropriate
  130. bnInit_?? function.
  131. If only one word size is in use, you may link in the file bninit??.c,
  132. which provides a trivial bnInit function. If multiple word sizes are
  133. in use, you must provide the appropriate bnInit function. See
  134. bn8086.c as an example.
  135. For maximum portability, you may just compile and link in the files
  136. lbn00.c, bn00.c and bninit00.c, which determine, using the preprocessor
  137. at compile time, the best word size to use. (The logic is actually
  138. located in the file bnsize00.h, so that the three .c files cannot get out
  139. of sync.)
  140. The bignum buffers are allocated using the memory management routines in
  141. lbnmem.c. These are word-size independent; they expect byte counts and
  142. expect the system malloc() to return suitably aligned buffers. The
  143. main reason for this wrapper layer is to support any customized allocators
  144. that the user might want to provide.
  145. * Other bn*.c files
  146. bnprint.c is a simple routine for printing a bignum in hex. It is
  147. provided in a separate file so that its calls to stdio can be eliminated
  148. from the link process if the capability is not needed.
  149. bntest??.c is a very useful regression test if you're implementing
  150. assembly primitives. If it doesn't complain, you've probably
  151. got it right. It also does timing tests so you can see the effects
  152. of any changes.
  153. * Other files
  154. sieve.c contains some primitives which use the bignum library to perform
  155. sieving (trial division) of ranges of numbers looking for candidate primes.
  156. This involves two steps: using a sieve of Eratosthenes to generate the
  157. primes up to 65536, and using that to do trial division on a range of
  158. numbers following a larger input number. Note that this is designed
  159. for large numbers, greater than 65536, since there is no check to see
  160. if the input is one of the small primes; if it is divisible, it is assumed
  161. composite.
  162. prime.c uses sieve.c to generate primes. It uses sieve.c to eliminate
  163. numbers with trivial divisors, then does strong pseudoprimality tests
  164. with some small bases. (Actually, the first test, to the base 2, is
  165. optimized a bit to be faster when it fails, which is the common case,
  166. but 1/8 of the time it's not a strong pseudoprimality test, so an extra,
  167. strong, test is done in that case.)
  168. It prints progress indicators as it searches. The algorithm
  169. searches a range of numbers starting at a given prime, but it does
  170. so in a "shuffled" order, inspired by algorithm M from Knuth. (The
  171. random number generator to use for this is passed in; if no function
  172. is given, the numbers are searched in sequential order and the
  173. returns value will be the next prime >= the input value.)
  174. germain.c operates similarly, but generates Sophie Germain primes;
  175. that is, primes p such that (p-1)/2 is also prime. It lacks the
  176. shuffling feature - searching is always sequential.
  177. jacobi.c computes the Jacobi symbol between a small integer and a BigNum.
  178. It's currently only ever used in germain.c.
  179. * Sources
  180. Obviously, a key source of information was Knuth, Volume 2,
  181. particularly on division algorithms.
  182. The greatest inspiration, however, was Arjen Lenstra's LIP
  183. (Large Integer Package), distributed with the RSA-129 effort.
  184. While very difficult to read (there is no internal documentation on
  185. sometimes very subtle algorithms), it showed me many useful tricks,
  186. notably the windowed exponentiation algorithm that saves so many
  187. multiplies. If you need a more general-purpose large-integer package,
  188. with only a minor speed penalty, the LIP package is almost certainly
  189. the best available. It implements a great range of efficient
  190. algorithms.
  191. The second most important source was Torbjorn Granlund's gmp
  192. (GNU multi-precision) library. A number of C coding tricks were
  193. adapted from there. I'd like to thank Torbjorn for some useful
  194. discussions and letting me see his development work on GMP 2.0.
  195. Antoon Bosselaers, Rene' Govaerts and Joos Vandewalle, in their CRYPTO
  196. '93 paper, "Comparison of three modular reduction functions", brought
  197. Montgomery reduction to my attention, for which I am grateful.
  198. Burt Kaliski's article in the September 1993 Dr. Dobb's Journal,
  199. "The Z80180 and Big-number Arithmetic" pointed out the advantages (and
  200. terminology) of product scanning to me, although the limited
  201. experiments I've done have shown no improvement from trying it in C.
  202. Hans Reisel's book, "Prime Numbers and Computer Methods for Factorization"
  203. was of great help in designing the prime testing, although some of
  204. the code in the book, notably the Jacobi function in Appendix 3,
  205. is an impressive example of why GOTO should be considered harmful.
  206. Papers by R. G. E. Pinch and others in Mathematics of Computation were
  207. also very useful.
  208. Keith Geddes, Stephen Czapor and George Labahn's book "Algorithms
  209. for Computer Algebra", although it's mostly about polynomials,
  210. has some useful multi-precision math examples.
  211. Philip Zimmermann's mpi (multi-precision integer) library suggested
  212. storing the numbers in native byte order to facilitate assembly
  213. subroutines, although the core modular multiplication algorithms are
  214. so confusing that I still don't understand them. His boasting about
  215. the speed of his library (albeit in 1986, before any of the above were
  216. available for study) also inspired me to particular effort to soundly
  217. beat it. It also provoked a strong reaction from me against fixed
  218. buffer sizes, and complaints about its implementation from Paul Leyland
  219. (interface) and Robert Silverman (prime searching) contributed usefully
  220. to the design of this current library.
  221. I'd like to credit all of the above, plus the Berkeley MP package, with
  222. giving me difficulty finding a short, unique distinguishing prefix for
  223. my library's functions. (I have just, sigh, discovered that Eric Young
  224. is using the same prefix for *his* library, although with the
  225. bn_function_name convention as opposed to the bnFunctionName one.)
  226. I'd like to thank the original implementor of Unix "dc" and "factor"
  227. for providing useful tools for verifying the correct operation of
  228. my library.
  229. * Future
  230. - Obviously, assembly-language subroutines for more platforms would
  231. always be nice.
  232. - There's a special case in the division for a two-word denominator
  233. which should be completed.
  234. - When the quotient of a division is big enough, compute an inverse of
  235. the high word of the denominator and use multiplication by that
  236. to do the divide.
  237. - A more efficient GCD algorithm would be nice to have.
  238. - More efficient modular inversion is possible. Do it.
  239. - Extend modular inversion to deal with non-relatively-prime
  240. inputs. Produce y = inv(x,m) with y * x == gcd(x,m) mod m.
  241. - Try some product scanning in assembly.
  242. - Karatsuba's multiplication and squaring speedups would be nice.
  243. - I *don't* think that FFT-based algorithms are worth implementing yet,
  244. but it's worth a little bit of study to make sure.
  245. - More general support for numbers in Montgomery form, so they can
  246. be used by more than the bowels of lbnExpMod.
  247. - Provide an lbnExpMod optimized for small arguments > 2, using
  248. conventional (or even Barrett) reduction of the multiplies, and
  249. Montgomery reduction of the squarings.
  250. - Adding a Lucas-based prime test would be a real coup, although it's
  251. hard to give rational reasons why it's necessary. I have a number of
  252. ideas on this already. Find out if norm-1 (which is faster to
  253. compute) suffices.
  254. - Split up the source code more to support linking with smaller subsets
  255. of the library.