Chunkify large records similar to large tuples
Motivation
During a conversation at ZuriHac, the problem of inefficiency when using large records came up (for a good write up of this problem, see this blog post from Well Typed). Towards the end of the blog post some concerns / ideas are raised:
- The solution in the blog post involves using a library, it is not a solution baked into GHC
- It may be possible to alleviate the problem with more sharing (see #20264)
- Record pattern matches are still linear in the size of the record
Proposal
Thinking out loud I asked Simon whether we could do something similar to the treatment of large tuples, and have GHC keep a representation internally that was more nested.1 Simon enthusiastically raised some points, which I will attempt to faithfully relay here:
- We have machinery in the compiler for specifying a more efficient internal representation for data,
mkDataConRep
etc. - We already go the other way in GHC and expand records (e.g. using
{-# UNPACK #-}
on a strict tuple field) - Using a chunked representation means selector functions become a case tree
- Record updates can share chunks of a record which are not changed
To give some examples (in deliberately invalid Haskell): a large record
{ x0 :: A0, ..., xn :: An }
would be internally represented as something similar to
{ { x0 :: A0, ..., xK-1 :: AK-1 }, { xK :: AK, ... }, { ..., xn :: An } }
where each chunk has K number of fields. The selector functions change accordingly
sel_xn_old r =
case r of
C0 _ ... xn -> xn -- Pattern linear to number of fields
sel_xn_new r =
case r of
C1 _ c1 -> -- Each item here is just a chunk
case c1 of
C2 _ ... xn -> xn -- Match on chunks until we find the element we want
Thinking about this the past week or so I have some more questions about this:
- How would
{-# UNPACK #-}
play nicely with a chunked record representation? We do not have to think about this for tuples because it is not valid to write something like(Int, {-# UNPACK #-} !Int)
in Haskell. Intuitively, I would guess that such strict unpacked fields would be floated to the top of the record alongside the chunks. - Would we generate new types for the chunks, or use tuples internally to represent the chunks. I feel like tuples would be simpler, but perhaps there is some reason to not favour this?
@simonpj As promised, I have written this up as an issue. Hopefully this is clear enough starting point for people to discuss.
-
Due to a limit set by the format of info tables, GHC only allows tuples of up to 64 elements internally. However, when writing code in the surface syntax of Haskell, we can use tuples larger than 64. Internally, GHC splits the tuple into a tuple of smaller tuples which can contain all the elements. This is transparent to the user writing the code.
↩