### [project @ 2000-01-23 09:55:17 by andy]

GHC now uses the "Hugs" split function, which is believed to have better behaviour: it is not unsafe, is deterministic, and works better with QuickCheck, a major client. Rational for the Record, quoted from Mark Jones mail on the Hugs list: [Mark Jones] A couple of months ago, John Hughes sent me mail about a problem that he had uncovered with the implementation of the Random library in Hugs. He had been using the "split" function in an attempt to generate a stream of random number generators, each of which he hoped would be different from the others. But instead he found that he actually ended with many different copies of the *same* random number generator. A disappointing, and frustratingly non-random result. If you don't happen to recall, split is a member of the RandomGen class, with type RandomGen g => g -> (g,g); it takes a single random number generator as its argument, and returns a pair of two new generators as its result. The only thing that the specification requires is that the two generators returned are (a) distinct and (b) `independently robust' from a statistical point of view. To the best of my knowledge, the implementation in Hugs meets this modest specification. Sadly, assuming only this specification, you cannot write the function that John was looking for and be sure that it will generate more than two different generators. For example, the specification allows even the following trivial implementation for split: split _ = (g1, g2), where g1 and g2 are some arbitrary but constant pair of distinct, and independently robust generators. With this implementation, you can split as often as you want and you'll never get more that two generators. Hugs and GHC (as far as I can tell) both use definitions of the form: split g = (g, f g) for some function f. (My understanding of the code in GHC is that it uses an unsafe function for f, breaking referential transparency; I hope the optimizer knows about this.) Note that this definition returns the argument as a result; the specification doesn't prohibit that; all it requires is that the two results returned be distinct. But if you try to generate a list of n different generators using: take n (iterate (fst . split) g) then you will be sorely disappointed; you might as well have written replicate n g. (On the other hand, if you were lucky enough to have used (snd . split), instead of (fst . split), then you wouldn't have noticed the problem ...) I know very little about the mathematics or pragmatics of random number generators, so I'm not sure that I know how to fix this problem. However, starting from this position of ignorance, I have hacked up a new version of "split" for the standard "StdGen" that will appear in the next release of Hugs (real soon now!). Judging from the tests that I've tried so far, it seems to work much better than the old version. That said: - Take care if you use Random.split in your programs, because it may not do what you expect. - There should probably be an errata about this for the Haskell 98 library report ... if somebody can figure out what it should say. - If you use Hugs, be aware that the implementation of Random.split was hacked up by someone who has no way of justifying that implementation, beyond some simple experiments. - If you know something about the mechanics of random number generators, here's an area where Haskell specifications and implementations could benefit from your knowledge! All the best, Mark [end quote]

Please register or sign in to comment