IntMap's behind UniqFM become unbalanced
This has been mentioned in several issues:
Edit: Much later: We could never prove that this effect is measurable. Meanwhile we found a different hypothesis, see #24137.
Rather than continue with these comments, this issue serves to track the unbalanced intmap work.
The problem
IntMap
's are big-endian patricia tries. The idea is that these trees have good performance for operations with sequential Int
keys because as the tree grows the branching bit does not flip on every insert. In the worst case these trees become unbalanced when keys force a branch on each insert
. See #19820 (comment 351497) for an example. Once the tree is unbalanced any insertions that share the bit prefix of the last insertion becomes costly because the spine of the tree must be rebuilt and is close to max depth (64 elements on a 64-bit machine).
Evidence that the problem exists
Direct evidence of the tree degeneration is rather hard to get. However we have some indicators that this is happening. First, our scheme for generating Unique
's is likely susceptible to this behavior since we take a 64-bit Int, remove the top 8 bits and replace them with a character mask in GHC.Types.Unique
this means that if we get into an unbalanced case and insert a large segment of Unique
's of the same class (where the character mask and thus the bit prefix is identical) then we'll get into the worst case. See #19820 (comment 368588).
Other indications occur in #19414 (comment 364337) where it was observed that the -dunique-increment
flag changes the allocations of insert
in ticky profiles of T12545
. Which is behavior we would expect if the IntMap
's were degenerate and that changed with the flag (although I'm unsure exactly what this flag does).
Merge Requests
-
@sgraf812 began work to hash
UniqFM
keys in !5068 (closed) however this branch had a bug which was fixed in !6340 (closed)