Performance curiosities in genSym for Uniques
EDIT: See also #20619 (comment 389396), where I float other possible problems in this space.
FURTHER EDIT: The main point in this post seems addressed -- see #20619 (comment 389914). But the "other possible problems" link immediately above is still active.
@ryantrinkle pointed out to me yesterday that our approach toward UniqueSupply
s relies on a single counter, shared across threads. Could contention here be slowing us down?
Here are reasons why it might not be:
- An atomic update is fairly fast, so the counter is owned only for short stretches of time.
- We don't actually allocate many many uniques in short succession.
- GHC is not, in practice, run with many threads often.
It does seem possible to do better here: we could reserve n bits of uniques to correspond to a thread identifier (perhaps n is chosen to be the log of the number of cores on the processor) and then have 2^n separate counters. There would no longer be any shared state among threads. If the thread bits are on the more significant end, then we could still just increment the counter by 1 each time. The only new step would be using the current thread id to look up which counter to use. (I don't know how expensive that would be.)
Feel free to say this is ridiculous, but it came up in a conversation around why Haskell doesn't just crush other languages in parallelism, and we started wondering why our own flagship tools don't do a better job of parallelism.