Skip to content

IntMap's behind UniqFM become unbalanced

This has been mentioned in several issues:

Edit: Much later: We could never prove that this effect is measurable. Meanwhile we found a different hypothesis, see #24137.

Rather than continue with these comments, this issue serves to track the unbalanced intmap work.

The problem

IntMap's are big-endian patricia tries. The idea is that these trees have good performance for operations with sequential Int keys because as the tree grows the branching bit does not flip on every insert. In the worst case these trees become unbalanced when keys force a branch on each insert. See #19820 (comment 351497) for an example. Once the tree is unbalanced any insertions that share the bit prefix of the last insertion becomes costly because the spine of the tree must be rebuilt and is close to max depth (64 elements on a 64-bit machine).

Evidence that the problem exists

Direct evidence of the tree degeneration is rather hard to get. However we have some indicators that this is happening. First, our scheme for generating Unique's is likely susceptible to this behavior since we take a 64-bit Int, remove the top 8 bits and replace them with a character mask in GHC.Types.Unique this means that if we get into an unbalanced case and insert a large segment of Unique's of the same class (where the character mask and thus the bit prefix is identical) then we'll get into the worst case. See #19820 (comment 368588).

Other indications occur in #19414 (comment 364337) where it was observed that the -dunique-increment flag changes the allocations of insert in ticky profiles of T12545. Which is behavior we would expect if the IntMap's were degenerate and that changed with the flag (although I'm unsure exactly what this flag does).

Merge Requests

Edited by Sebastian Graf
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information