mkStdGen and split conspire to make some programs predictable
The following program
random.hs is intended to produce a line containing 30 random 0s or 1s. It is not an example of the best way to use System.Random, but it looks innocuous enough.
import Control.Monad import System.Random print0or1 = do g <- newStdGen putStr . show . fst $ randomR (0, 1 :: Int) g main = replicateM_ 30 print0or1 >> putStrLn ""
Let's try running it a thousand times:
rwbarton@functor:/tmp$ ghc-6.10.1 -O2 --make random [1 of 1] Compiling Main ( random.hs, random.o ) Linking random ... rwbarton@functor:/tmp$ for i in `seq 1 1000` ; do ./random >> random.out ; done rwbarton@functor:/tmp$ sort random.out | uniq | wc -l 60
That's odd... there are 2^30^ possible output lines, but when I tried to generate 1000 random ones, I only got 60 distinct outputs. Why did that happen?
One might think this is due to poor initial seeding of the random number generator (due to the time not changing very much during the test), but this is not the case. Attached is a fancier version of the program which reads an initial seed from
/dev/urandom; it exhibits the same behavior.
This phenomenon is not too hard to explain. It is ultimately due to a poor interaction between
split. First, we need to know a bit about the design of System.Random (some statements simplified slightly for this discussion).
- The state of the RNG consists of two
- The initial state produced by mkStdGen almost always has
s2equal to 1. (Extremely rarely, it might have
s2equal to 2. We'll ignore this as it doesn't affect the argument.)
- To generate a random 0 or 1, we first generate a new state using some simple functions
s1' = next1(s1),
s2' = next2(s2). (Note that
s2"evolve" independently.) The random value returned is the lowest bit of
- Splitting the generator
(s1, s2)yields the two generators
Our program functions as follows.
- Initialize the generator stored in
s1is some varying value
- Repeatedly split the generator, replacing it with the first output, and use the second output to generate a 0 or 1.
If we watch
theStdGen while our program runs, we will see that
s1 is incremented by 1 at each step, while
s2 follows the fixed sequence
next2(next2(1)), etc. The 0 or 1 we output at the
kth step is thus the lowest bit of
b,,k,, is some fixed sequence. And as
next1(next1(a+k-1)) turns out to be just an arithmetic sequence with fixed difference modulo a fixed prime so its lowest bits are extremely predictable even without knowing
This issue can be fixed to some extent, without breaking backwards compatibility, by adding another method (besides
mkStdGen) to create a generator, which does not have predictable
s2, and using it to initialize the system RNG.