• andy's avatar
    [project @ 2000-01-23 09:55:17 by andy] · ee152a67
    andy authored
    GHC now uses the "Hugs" split function, which is believed to have
    better behaviour: it is not unsafe, is deterministic, and works
    better with QuickCheck, a major client.
    
    Rational for the Record, quoted from Mark Jones mail on the Hugs list:
    
    [Mark Jones]
    A couple of months ago, John Hughes sent me mail about a problem that
    he had uncovered with the implementation of the Random library in Hugs.
    He had been using the "split" function in an attempt to generate a
    stream of random number generators, each of which he hoped would be
    different from the others.  But instead he found that he actually
    ended with many different copies of the *same* random number generator.
    A disappointing, and frustratingly non-random result.
    
    If you don't happen to recall, split is a member of the RandomGen class,
    with type RandomGen g => g -> (g,g); it takes a single random number
    generator as its argument, and returns a pair of two new generators as
    its result.  The only thing that the specification requires is that the
    two generators returned are (a) distinct and (b) `independently robust'
    from a statistical point of view.  To the best of my knowledge, the
    implementation in Hugs meets this modest specification.  Sadly, assuming
    only this specification, you cannot write the function that John was
    looking for and be sure that it will generate more than two different
    generators.
    
    For example, the specification allows even the following trivial
    implementation for split:  split _ = (g1, g2), where g1 and g2 are some
    arbitrary but constant pair of distinct, and independently robust
    generators.  With this implementation, you can split as often as you
    want and you'll never get more that two generators.
    
    Hugs and GHC (as far as I can tell) both use definitions of the form:
    
       split g = (g, f g)
    
    for some function f.  (My understanding of the code in GHC is that it
    uses an unsafe function for f, breaking referential transparency; I hope
    the optimizer knows about this.)  Note that this definition returns the
    argument as a result; the specification doesn't prohibit that; all it
    requires is that the two results returned be distinct.  But if you try
    to generate a list of n different generators using:
    
       take n (iterate (fst . split) g)
    
    then you will be sorely disappointed; you might as well have written
    replicate n g.  (On the other hand, if you were lucky enough to have
    used (snd . split), instead of (fst . split), then you wouldn't have
    noticed the problem ...)
    
    I know very little about the mathematics or pragmatics of random
    number generators, so I'm not sure that I know how to fix this
    problem.  However, starting from this position of ignorance, I have
    hacked up a new version of "split" for the standard "StdGen" that
    will appear in the next release of Hugs (real soon now!).  Judging
    from the tests that I've tried so far, it seems to work much
    better than the old version.  That said:
    
     - Take care if you use Random.split in your programs, because it
       may not do what you expect.
    
     - There should probably be an errata about this for the Haskell 98
       library report ... if somebody can figure out what it should say.
    
     - If you use Hugs, be aware that the implementation of Random.split
       was hacked up by someone who has no way of justifying that
       implementation, beyond some simple experiments.
    
     - If you know something about the mechanics of random number
       generators, here's an area where Haskell specifications and
       implementations could benefit from your knowledge!
    
    All the best,
    Mark
    [end quote]
    ee152a67