* The BigNum multi-precision integer math library

This is a multi-precision math library designed to be very portable,
reasonably clean and easy to use, have very liberal bounds on the sizes
of numbers that can be represented, but above all to perform extremely
fast modular exponentiation.  It has some limitations, such as
representing positive numbers only, and supporting only odd moduli,
which simplify it without impairing this ability.

A second speed goal which has had considerable effort applied to it is
prime number generation.

Finally, while there is probably a long way to go in this direction,
some effort has gone into commenting the code a lot more than seems to
be fashionable among mathematicians.

It is written in C, and should compile on any platform with an ANSI C
compiler and 16 and 32-bit unsigned data types, but various primitives
can be replaced with assembly versions in a great variety of ways for
greater speedup.  See "bnintern.doc" for a description.

In case you're wondering, yes C++ would produce a much nicer syntax
for working with these numbers, but there are a lot of compilers out
there that actually implement ANSI C, and get it almost right.  I have
a few kludges to deal with some that get little things wrong, but
overall it's not too difficult to write code that I can be sure
will work on lots of machines.  And porting it to a K&R C compiler,
if it ever becomes necessary, shouldn't be all *that* difficult.

The C++ compiler world is a less friendly place.  First of all, C++
compilers are still not as common as C compilers, so that hurts
portability right there, and I don't need the extra power to write my
code.  C++ compilers all seem to have important bugs, and different
bugs for each compiler.  First I have to learn all the foibles of a
whole lot of C++ compilers, and then I have write code that uses only
the features that work in all of them.  This is a language not a whole
heck of a lot bigger than C.

(The fact that it drives me *batty* the way that C++ drags *everything*
into the same name space is also a contributing factor.  I *like*
writing "struct" (or "class") before structure names.  I *like* putting
"this->" in front of member references.  It makes it clear to me, when
reading a single line of code, roughly what is being affected by it and
where I can find the relevant source code to find out more.  I've seen
people develop complicated naming conventions to make all this clear,
but the conventions are still very much in flux.)

Anyway...

The main public interface is contained in the file bn.h.  This is
mostly a bunch of pointers to functions which start out uninitialized,
but are set by bnInit() (which is called by bnBegin()).

All of the public routines have names of the bnFunction variety.
Some internal routines are lbnFunction, but you should never have to
worry about those unless you're hacking with the code.

The code uses the assert() macro a lot internally.  If you do something
you're not supposed to, you'll generally notice because an assert()
will fail.  The library does not have special error codes for division
by zero or the like - it assert fails instead.  Just don't do that.

A BigNum is represented by a struct BigNum, which really doesn't
need to be understood, but it often makes me feel better to understand
what's going on, so here it is:

#> struct BigNum {
#>	void *ptr;
#>	unsigned size;	/* Note: in (variable-sized) words */
#>	unsigned allocated;
#> };

The pointer points to the least-significant end of an array of words which
hold the number.  The array contains "allocated" words, but only "size"
of them are actually meaningful.  The others may have any value.
This is all of limited use because the size of a word is not specified.
In fact, it can change at run time - if you run on an 8086 one day and an
80386 the next, you may find the word size different.

* Initialization

The user of the library is responsible for allocating and freeing each
struct BigNum.  Usually they're just local variables.  All the library
functions take pointers to them.  The first thing you need to do is
initialize all the fields to empty, a zero-valued BigNum.  This is done
with the function bnBegin:
#> void bnBegin(struct BigNum *bn);

When you're done with a BigNum, call bnEnd to deallocate the data storage
in preparation for deallocating the structure:
#> void bnEnd(struct BigNum *bn);

This resets the number to the 0 state.  You can actually start using the
number right away again, or call bnEnd again, so if you're really
memory-conscious you might want to use this to free a large
number you're done with this way before going on to use the buffer
for smaller things.

A simple assignment can be done with bnCopy.  
#> int bnCopy(struct BigNum *dest, struct BigNum const *src);

This sets dest = src, and returns an error code.  Most functions in the
library do this, and return 0 on success and -1 if they were unable to
allocate needed memory.  If you're lazy and sure you'll never run out
of memory, you can avoid checking this, but it's better to be
paranoid.  If a function returns -1, the what has happened to the
destination values is undefined.  They're usually unmodified, and
they're always still valid BigNum numbers, but their values might be
strange.

In general, anywhere that follows, unless otherwise documented, assume
that an "int" return value is 0 for success or -1 for error.

A trivial little function which is sometimes handy, and quite cheap to
execute (it just swaps the pointers) is:
#> void bnSwap(struct BigNum *a, struct BigNum *b);

* Input and output

For now, the library only works with numbers in binary form - there's
no way to get decimal numbers into or out of it.  But it's pretty
flexible on how it does that.

The first function just sets a BigNum to have a small value.  There are
several such "quick" forms which work with "small" second operads.
"Small" is defined as less than 65536, the minimum 16-bit word size
supported by the library.  The limit applies even if unsigned is larger
or the library is compiled for a larger word size.
#> int bnSetQ(struct BigNum *dest, unsigned src);

This returns the usual -1 error if it couldn't allocate memory.

There's also a function to determine the size of a BigNum, in bits.
The size is the number of bits required to represent the number,
0 if the number is 0, and floor(log2(src)) + 1 otherwise.  E.g. 1 is
the only 1-bit number, 2 and 3 are 2-bit numbers, etc.
#> unsigned bnBits(struct BigNum const *src);

If bnBits(src) <= 16, you can get the whole number with this function.
If it's larger, you get the low k bits, where k is at least 16.
(This doesn't bother masking if it's easy to return more, but you
shouldn't rely on it.)  Even that is useful for many things, like
deciding if a number is even or odd.
#> unsigned bnLSWord(struct BigNum const *src);

For larger numbers, the format used by the library is an array of
unsigned 8-bit bytes.  These bytes may be in big-endian or little-endian
order, and it's possible to examine or change just part of a number.
The functions are:
#> void bnExtractBigBytes(struct BigNum const *bn, unsigned char *dest,
#> 	unsigned lsbyte, unsigned len);
#> int bnInsertBigBytes(struct BigNum *bn, unsigned char const *src,
#> 	unsigned lsbyte, unsigned len);
#> void bnExtractLittleBytes(struct BigNum const *bn, unsigned char *dest,
#> 	unsigned lsbyte, unsigned len);
#> int bnInsertLittleBytes(struct BigNum *bn, unsigned char const *src,
#> 	unsigned lsbyte, unsigned len);

These move bytes between the BigNum and the buffer of 8-bit bytes.  The
Insert functions can allocate memory, so return an error code.  The
Extract functions always succeed.

The buffer is encoded in base 256, with either the most significant
byte (the Big functions) or the least significant byte (the Little
functions) coming first.  "len" is the length of the buffer, so the
buffer always encodes a value between 0 and 256^len.  (That's
"to the power of", not "xor".)

"lsbyte" gives the offset into the BigNum which is being worked with.
This is usually zero, but you can, for example, read out a large
BigNum in 32-byte chunks, using a len of 32 and an lsbyte of 0, 32,
64, 96, etc.

After these complete, the number encoded in the buffer will be
equal to (bn / 256^lsbyte) % 256^len.  The only difference between
Insert and Extract is which is changed to match the other.

* Simple math

#> int bnAdd(struct BigNum *dest, struct BigNum const *src);
#> int bnAddQ(struct BigNum *dest, unsigned src);

These add dest += src.  In the Q form, as mentioned above with bnSetQ,
src must be < 65536.  In either case, the functions can fail and return
-1, as usual.

#> int bnSub(struct BigNum *dest, struct BigNum const *src);
#> int bnSubQ(struct BigNum *dest, unsigned src);

These subtract dest -= src.  If this would make the result negative,
dest is set to (src-dest) and a value of 1 is returned, so you can
keep track of a separate sign if you need to.  Otherwise, they return
0 on success and -1 if they were unable to allocate needed memory.

To make your life simpler if you are error checking, these four functions
are guaranteed not to allocate memory unnecessarily.  So if you know
that the addition or subtraction you're doing won't produce a result
larger than the input, and won't underflow either (like subtracting 1
from an odd number or adding 1 to an even number), you can skip checking
the error code.

#> extern int (*bnCmp)(struct BigNum const *a, struct BigNum const *b);
#> extern int (*bnCmpQ)(struct BigNum const *a, unsigned b);

This returns the sign (-1, 0 or +1) of a-b.  Another way of saying
this is that a <=> b is the same as bnCmp(a, b) <=> 0, where "<=>"
stands for one of <, <=, =, !=, >= or >.  The bnCmpQ form is the same,
but (as in all the Q functions) the second argument is a number < 65536.

#> int bnSquare(struct BigNum *dest, struct BigNum const *src);

This computes dest = src^2, returning an error if it ran out of memory.
If you care about performance tuning, this slows down when dest and
src are the same BigNum, since it needs to allocate a temporary buffer
to do the work in.  It does work, however.

#> int bnMul(struct BigNum *dest, struct BigNum const *a,
#>	struct BigNum const *b);
#> int bnMulQ(struct BigNum *dest, struct BigNum const *a, unsigned b);

These compute dest = a * b, and work in the same way as bnSquare.
(Including the fact that it's faster if dest is not the same as any of
the inputs.)  bnSquare is faster if a and b are the same.  The second
input operand to bnMulQ must be < 65536, like all the "Q" functions.

#> int bnDivMod(struct BigNum *q, struct BigNum *r,
#>	struct BigNum const *n, struct BigNum const *d);

This computes division with remainder, q = n/d and r = n%d.  Don't
pass in a zero d; it will blow up.  In general, all of the values
must be different (it will blow up if you try), but r and n may be the
same.

RE-ENTRANCY NOTE: This temporarily modifies the BigNum "d" internally,
although it restores it before returning.  If you're doing something
multi-threaded, you can't share the d value between threads, even though
it says "const".  That's a safe assumption elsewhere, but this is an
exception.

That note also means that it's not safe to let n be the same as d,
although that's such a stupid way to set q to 1 and r to 0 that
I don't think it's worth worrying about.  (I hope you understand that
this doesn't mean that n and d can't have the same numerical value,
just that they can't both point to the same struct BigNum.)

#> int bnMod(struct BigNum *dest, struct BigNum const *src,
#> 	struct BigNum const *d);

This works just the same as the above, but doesn't bother you with the
quotient.  (No, there's no function that doesn't bother you with the
remainder.)  Again, dest and src may be the same (it's actually
more efficient if they are), but d may not be the same as either.

#> unsigned int bnModQ(struct BigNum const *src, unsigned d);

This also computes src % d, but does so for small (up to 65535,
the usual limit on "Q" functions) values of d.  It returns the
remainder.  (No error is possible.)

* Advanced math

#> int bnLShift(struct BigNum *dest, unsigned amt);
#> void bnRShift(struct BigNum *dest, unsigned amt);

These shift the given bignum left or right by "amt" bit positions.
Left shifts multiply by 2^amt, and may have to allocate memory
(and thus fail).  Right shifts divide by 2^amt, throwing away the
remainder, and can never fail.

#> unsigned bnMakeOdd(struct BigNum *n);

This right shifts the input number as many places as possible without
throwing anything away, and returns the number of bits shifted.
If you see "let n = s * 2^t, where s is odd" in an algorithm,
this is the function to call.  It modifies n in place to produce s
and returns t.

This returns 0 if you pass it 0.

#> int bnExpMod(struct BigNum *result, struct BigNum const *n,
#> 	struct BigNum const *exp, struct BigNum const *mod);

Ah, now we get to the heart of the library - probably the most heavily
optimized function in it.  This computes result = n^exp, modulo "mod".
result may be the same as n, but not the same as exp or mod.  For large
exponents and moduli, it can try to allocate quite a bit of working
storage, although it will manage to finish its work (just slower)
if some of those allocations fail.  (Not all, though - the first few
are essential.)

"mod" must be odd.  It will blow up if not.  Also, n must be less than
mod.  If you're not sure if it is, use bnMod first.  The return value
is always between 0 and mod-1.

#> int bnTwoExpMod(struct BigNum *result, struct BigNum const *exp,
#>	struct BigNum const *mod);

This computes result = 2^exp, modulo "mod".  It's faster than the general
bnExpMod function, although that function checks to see if n = 2 and calls
this one internally, so you don't need to check yourself if you're not
sure.  The main reason to mention this is that if you're doing something
like a pseudoprimality test, using a base of 2 first can save some time.

#> int bnDoubleExpMod(struct BigNum *result,
#> 	struct BigNum const *n1, struct BigNum const *e1,
#> 	struct BigNum const *n2, struct BigNum const *e2,
#> 	struct BigNum const *mod);

This computes dest = n1^e1 * n2^e2, modulo "mod".  It does it quite
a bit faster than doing two separate bnExpMod operations; in fact,
it's not that much more expensive than one.  "result" may be the
same BigNum as n1 or n2, but it may not be the same as the exponents
or the modulus.  All of the other caveats about bnExpMod apply.

#> int bnGcd(struct BigNum *dest, struct BigNum const *a,
#>	struct BigNum const *b);

This returns dest = gcd(a,b).  dest may be the same as either input.

/* dest = src^-1, modulo "mod".  dest may be the same as src. */
#> int bnInv(struct BigNum *dest, struct BigNum const *src,
#>	struct BigNum const *mod);

This requires that gcd(src, mod) = 1, and returns dest = src^-1, modulo
"mod".  That is, 0 < dest < mod and dest*src = 1, modulo "mod".
dest and src may be the same, but mod must be different.

This will probably get extended at some point to find dest such that
dest * src = gcd(src, mod), modulo "mod", but that isn't implemented
yet.

* Auxiliary functions

These mostly-internal functions aren't very useful to call directly,
and might even get removed, but for now they're there in the unusual
case where you might want them.

#> void bnInit(void);

This does global library initialization.  It is called by the first
call to bnBegin(), so you shouldn't need to call it explicitly.  It is
idempotent, so you can call it multiple times if you like.  The only
thing it does right now is set up the function pointers to the rest of
the library.  If a program crashes and the debugger tells you that
it's trying to execute at address 0, bnInit never got called.

#> int bnPrealloc(struct BigNum *bn, unsigned bits);

This preallocates space in bn to make sure that it can hold "bits" bits.
If the overflow characteristics of various algorithms get documented
better, this might allow even more error-checking to be avoided, but
for now it's only to reduce memory fragmentation.

#> void bnNorm(struct BigNum *bn);

This decreases the "size" field of the given bignum until it has no leading
zero words in its internal representation.  Given that almost everything
in the library does the equivalent of this on input and output, the utility
of this function is a bit dubious.  It's kind of a legacy.

* Extra libraries

There are a number of utilities built on top of the basic library.
They are built on top of the interfaces just described, and can be used
if you like.

* jacobi.h

#> int bnJacobiQ(unsigned p, struct BigNum const *bn);

This returns the Jacobi symbol J(p,bn), where p is a small number.
The Jacobi symbol is always -1, 0, or +1.  You'll note that p may
only be positive, even though the Jacobi symbol is defined for
negative p.  If you want to worry about negative p, do it yourself.
J(-p,bn) = (bnLSWord(bn) & 2 ? -1 : +1) * bnJacobiQ(p, bn).

A function to compute the Jacobi symbol for large p would be nice.

* prime.h

#> int primeGen(struct BigNum *bn, unsigned (*rand)(unsigned),
#> 	int (*f)(void *arg, int c), void *arg, unsigned exponent, ...);

This finds the next prime p >= bn, and sets bn to equal it.
Well, sort of.

It always leaves bn at least as large as when it started (unless it
runs out of memory and returns -1), and if you pass a 0 for the rand
function, it will be the next prime >= bn.

Except:
- It doesn't bother coping with small primes.  If it's divisible by any
prime up to 65521, it's considered non-prime.  Even if the quotient is 0.
If you pass in "1", expecting to get "2" back, you'll get 65537.  Maybe
it would be nice to fix that.
- It actually only does a few strong pseudoprimality tests to fixed
bases to determine if the candidate number is prime.  For random input,
this is fine; the chance of error is so infinitesimal that it is
absolutely not worth worrying about.  But if you give it numbers carefully
chosen to be strong pseudoprimes, it will think they're primes and not
complain.  For example, 341550071728321 = 10670053 * 32010157 will
pass the primality test quite handily.  So will
68528663395046912244223605902738356719751082784386681071.
- If you supply a rand() function, which returns 0 <= rand(n) < n
(n never gets very large - currently, at most 256), this shuffles the
candidates before testing and accepting one.  If you want a "random"
prime, this produces a more uniformly distributed prime, while
retaining all of the speed advantages of a sequential search from a
random starting point, which would otherwise produce a bias towards
primes which were not closely preceded by other primes.  So, for
example, the second of a pair of twin primes would be very unlikely to
be chosen.  rand() doesn't totally flatten the distribution, but it
comes very close.

The "f" function is called periodically during the progress of the
search (which can take a while) with the supplied argument (for private
context) and a character c, which sort of tells you what it's doing.
c is either '.' or '*' (if it's found something and is confirming that
it's really prime) or '/' (if it's having a really hard time finding
something).  Also, if f returns < 0, primeGen immediately returns that
value.  This can form the basis for a user interface which can show some
life occasionally and abort the computation if desired.

If you just print these characters to the screen, don't forget to
fflush() after printing them.

Finally, "exponent, ..." is a zero-terminated list of small numbers
which must not divide p-1 when the function returns.  If the numbers
are chosen to be the prime factors of n, then gcd(n, p-1) will be
1, so the map f(x) -> x^n is invertible modulo p.

#> int primeGenStrong(struct BigNum *bn, struct BigNum const *step,
#>	int (*f)(void *arg, int c), void *arg);

This is similar, but searches in steps of "step", rather than 1, from the
given starting value.  The starting value must be odd and the step
size must be even!  If you start with bn == 1 (mod step), and step
is 2*q, where q is a large prime, then this generates "strong" primes,
p-1 having a large prime factor q.  There are other uses, too.

#ifdef __cplusplus
}
#endif

* germain.h

#> int germainPrimeGen(struct BigNum *bn, int (*f)(void *arg, int c),
#>	void *arg);

This increases bn until it is a Sophie Germain prime, that is, a number p
such that p and (p-1)/2 are both prime.  These numbers are rarer than
ordinary primes and the search takes correspondingly longer.

It omits the randomization portion of primeGen, and the exponent list,
since the factors of bn-1 are known already.  The f function for
progress is the same, but it is also sometimes passed a '+' or '-'
character when it's found a (p-1)/2 that's prime.  This is just to lend
some interest to an otherwise very boring row of dots.  Finding large
primes with this function, even though it's pretty optimized, takes a
*while*, and otherwise once the screen filled with dots (one every few
seconds) it would be hard to keep track of the scroll.

It varies a lot, depending on luck of the starting value and the speed
of your machine, but if your starting number is over 1024 bits, plan on
over an hour of run time, and if it's over 2048 bits, plan on a day.
At 4096 bits, start thinking about a week.

Past that, supporting checkpoint/restart is a good idea.  Every time
the progress function gets a '/' is probably a good interval, and when
it happens have f return a distinct error value like -2.  When
germainPrimeGen returns with that value, save the value in bn to a file
somewhere and call it again with the same bn to continue searching.

* sieve.h

This is the sieving code that the other prime-finding functions call
to do trial division.  You might use it if you are doing some magic
prime-finding of your own.  A sieve is an array of bits, stored
little-endian in an array of bytes (i.e. the lsb of byte 0 is bit 0).
Sieves are indexed with the "unsigned" data type, so should not, for
portability, be larger than 65536/8 = 8192 bytes long.

A 1 bit is considered "in" the sieve, it has passed all the sieving.
A 0 bit has been removed by some step.

The functions are:

#> void sieveSingle(unsigned char *array, unsigned size, unsigned start,
#>	unsigned step);

This (efficiently) clears the bits at positions start, start+step,
start+2*step, etc. in the sieve given by array and size.  This is the
elementary sieve-building step.  Start with a sieve of all 1s, and
apply this as required.

#> unsigned sieveSearch(unsigned char const *array, unsigned size,
#>	unsigned start);

This returns the next bit position *greater than* start which is set
in the indicated sieve, or 0 on failure.  NOTE that this means that
you have to look at the bit at position 0 (array[0] & 1) by yourself
if you want to pay attention to it, because there's no way to tell
sieveSearch to start searching at 0 - it starts at start+1.

#> int sieveBuild(unsigned char *array, unsigned size, struct BigNum const *bn,
#>	unsigned step, unsigned dbl);

This initializes a sieve where, if bit i is set, then bn+step*i is not
divisible by any small primes.  (Small is from 2 through 65521, the
largest prime less that 65536.)  If "dbl" is > 0, then bits are also
cleared if 2*(bn+step*i)+1 is divisible.  If dbl > 1, then
4*(bn+step*i)+3 is also checked, and so on.  This feature is used when
generating Sohpie Germain primes.

Usually, you use a step of 2.

#> int sieveBuildBig(unsigned char *array, unsigned size,
#>	struct BigNum const *bn, struct BigNum const *step, unsigned dbl);

This is just the same, but accepts a BigNum step size, and is correspondingly
slower.

* bnprint.h

#> int bnPrint(FILE *f, char const *prefix, struct BigNum const *bn,
#>	char const *suffix);

This prints a nicely-formatted BigNum in hexadecimal form to the given
FILE *.  The "prefix" is printed before it, as a prompt, and the
"suffix" is printed afterwards.  The BigNum itself is printed in
64-character lines, broken with a trailing backslash if necessary.
Continuation lines are indented by the length of the prefix.

E.g. a 2^512-1, printed with the call bnPrint(stdout, "a = (", bn, ")\n")
would result in:

a = (FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF\
     FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF)

Hex digits are printed in upper case to facilitate cutting and pasting into
the Unix "dc" utility.