Bruce Schneier's PRNG is named Yarrow. In honor of the principle of
naming cryptographic PRNG's after plants, and in honor of my favorite
cactus/succulent of the Sonoran desert that Phoenix Arizona (USA)
rests in, I (Eric Lee Green, eric@estinc.com ) hereby name this PRNG
"Ocotillo".

See: http://desertusa.com/nov96/du_ocotillo.html

Mandatory copyright notices: 
 This software is Copyright 1999 Enhanced Software Techologies Inc.
   It is released for public use under a BSD-style Open Source
license. See the file "LICENSE.ocotillo" for the complete license. 

##########################################################################
Mandatory attribution clauses:
   This software includes MD5 routines copyrighted by RSA Data Security,
       Inc. and licensed for public use. 
  This software includes Twofish routines copyrighted by Dr. Brian Gladman
     and licensed for public use. 
##########################################################################

*WARNING* If you are on Linux or FreeBSD, this should be merely an
instructional tool, the /dev/random and /dev/urandom available on
those operating systems is far more secure because, by having access
to OS innards, it has much larger sources of true randomness
("entropy").

If you are on Windows, you should use Bruce Schneier's "Yarrow"
(http://www.counterpane.com) for similar reasons.

This software is intended only for use on Unix platforms such as
Solaris, SCO, or HP/UX that do not have a cryptographically strong
PRNG available (normal pseudo-random number generators produce a
random distribution, but they produce a small number of PREDICTABLE
random distributions, which is not what we need).

##########################################################################
Usage:

int oc_start(void) -- initialize the entropy pools. See ocrandom.c for
   the details of what this involves. Note that you'll need to remove
   '-DBSD_PS' and '-DNO_OUTPUT_POOL' from the makefile for normal use,
   they are there for testing purposes on our internal desktop platforms 
   (we use Linux for our desktop workstations). You will probably want
   to find some more sources of entropy too. 

  Return value: 0 if okay, 1 if error. 

int oc_stir(char *entropy, int numbytes) -- stir some values into the
  input entropy pool. You should do this whenever you can. Some ways to 
  gain stuff to shove into the entropy pool: a) the amount of time between
  network packets coming in to your program. b) the amount of time between
  input keystrokes. c) if your platform has a "/proc/interrupts" file,
  you could use the number of interrupts since your last command. Use your
  imagination! 

  example:
    int time;
    time=lasttime-currtime;
    oc_stir( (char *)&time,sizeof(int));

unsigned char *oc_random(int numbytes): Get a random number. Tell it how
   many bytes you want.

  example:

   int *a;
   a=(int *)oc_random(sizeof(int));
   
   printf("The random number is %d\n",*a);

############################################################################

Entropy pool routines: See ocrandom.h for the pool routines used by
the PRNG. These are currently exported to the global name space but
probably shouldn't be. See ocrandom.c for examples of how to use these
routines. Unless you are playing around with the design of the PRNG,
you probably should not use these routines.

###########################################################################

Building: These routines are intended to be incorporated into your own
program, using your own makefile. Makefile.ocotillo will generate a stand-alone
program that generates 5,000,000 pseudo-random 128-bit numbers in hex in
the file /usr/tmp/output. You can use this to do statistical studies of the 
PRNG's output.

For normal use on 32-bit machines you will not need to modify any
values.  On 64-bit machines you may need to alter md5.h, config.h, or
twofish.h, which have values in them that describe how numbers are laid out.

##########################################################################

Ocotillo can be described as follows:

There are two input entropy pools: the key pool, and the encrypted
value pool. Each pool is 128 bits and is "stirred" via a MD5 one-way
hash.

The key pool is initialized at PRNG startup, grabbing bits of entropy
wherever they can be found. Some places to look: system date/time, 
process execution time (which will vary slightly depending on system load),
the contents of various system log files such as /var/log/messages (on Unix
systems), the output of 'ps', the value of "netstat -i" (which will have
the number of packets that have come in since the 

The encrypted value pool is stirred by the system clock at the time that
network packets come in, by the CPU use at the time that network packets
come in, by the MD5 digest of the current network packet, and any other
sources of entropy that can be gathered (suggestions are in order, but
note that this has to run on systems like Solaris or HP/UX that have
very limited sources of entropy). 

From time to time, when sufficient entropy is judged to have been
gathered, the encrypted value pool is stirred into the key pool. At the
moment I have not implemented that portion, because I have insufficient
sources of entropy to test it.

The key pool and encrypted value pool feed into a TwoFish encryption engine.
The TwoFish engine operates in 128-bit encryption mode. TwoFish was chosen
because the preferred algorithm, RC6(tm), may be covered by U.S. patents. 


Each time more bytes are needed, the current contents of the value
pool are encrypted with the current key to produce a statistically
unpredictable value. In addition, a counter is kept and this counter is 
"stirred" into the value pool every time the core TwoFish routine is called. 

Assumptions:

 1) Sufficient sources of entropy exist to get 128 bits of key data.
 2) Sufficient sources of entropy exist to get 128 bits of
     value data.

This would tend to state that we have 2^256 possible starting states, and
thus 2^256 output sequences. Except that if more entropy arrives between 
generations we have mixed things up even more. 

To add unpredicability, we can also MD5 the output of TwoFish. So thus we get:

                      ______
Input Pool (MD5)----->|    |
                      |TF  |---> Ouput Pool (MD5)
                      |    |
Key Pool (MD5)  ----->|____| 



Attacks and counters:
  
  1) Guessing input states based on output: This is difficult. Even if
TwoFish is cracked, our output has been "stirred" by the one-way hash,
meaning you'll also need to know all previous values that have ever
been sent to the MD5 hash.

  2) Guessing output state based on current inputs to the TwoFish cipher:
 This will work only if you have a record of all previous values fed into
 the MD5 hash on the output, or if you also have a copy of the output pool
 state. 

 3) Predictable initial key value: Always a possibility. I may have
underestimated the entropy available from the various sources that I
hashed into the input pool. However, by constantly changing the input
block value as future entropy arrives, and using an MD5 hash on the output,
this should remove most predictability for the output. 

 4) Predictable stirring of the input pool: if the PRNG is attacked
directly via requesting millions of numbers, it may be possible to
detect statistically significant patterns in the output. This is the
most worrisome possibility. I have done some rudimentary statistical
analysis of the output (thus the 5,000,000 128-bit numbers generated
by the test program), but I am by no means satisfied that there are no
statistically significant patterns in the output under those conditions. 

Questions, comments: Is the TwoFish cipher even necessary as part of
this? Could we not just hash the two input pools directly?

Answer: The original design did not have the MD5 on the
output. Statistical analysis of the outputs shows that the MD5 is not
necessary for maintaining a random distribution of values. It was
added later to deal with the possibility of "cracking" the TwoFish
key.
  The TwoFish is still useful as the source of statistical
randomness. MD5 is a one-way digest hash, and is supposed to be unique
for each possible input sequence, but not necessarily statistically
random.

Please feel free to send any comments and other possible attacks to:

    Eric Lee Green, eric@estinc.com
