Intrusive unpacked sums
It looks like #15358 (closed) will soon be implemented (yay!), but there is something more we can do, which after @danbornside I'll call "intrusive unpacked sums". The idea is that if there was an unboxed sum that is the old field of a variant, to "join" it's tag with the outer data types's tag.
Here's an example. Suppose we explicitly choose the tags of Either like so:
data Either a b
= {-# TAG 0 #-} Left a
| {-# TAG 1 #-} Right b
Then we write our own data type like this:
data MyEither3 a b c
= {-# TAG 2 #-} ME_2 a
| ME_E {-# UNPACK_ALL_THE_WAY #-} (Either b c) -- tag: nope
GHC sees that the user-assigned tags are disjoint, and so picks a representation like this:
data MyEither3_Rep a b c
= {-# TAG 2 #-} ME_0 a -- tag: 2
| {-# TAG 0 #-} ME_1 b -- tag: 0
| {-# TAG 1 #-} ME_2 c -- tag: 1
Importantly, taking a reference too the inner Either shouldn't need to reallocate the field, because the MyEither3 info table is "good enough"
Now, this is a rather hamfisted optimization, with all these human-assigned tags, and resulting ecosystem friction of trying to asynchronously assign tags while not breaking any UNPACK_ALL_THE_WAY annotations. But I think it's at least useful to think about as a thought experiment:
-
There is absolutely no run time penalty to this, unless I've missed something. it's an run-time free lunch with the cost imposed statically (managing tags).
-
I think it's one of the few ways to get us
NonEmptyIntMapandNonEmptyfor free. The problem is these fancier data structures have multiple variants in the non empty case, an additional one in the empty case. Embedding the non-empty case in the empty-case such that the injection is for free is the only way to preserve today's cost model. See https://mail.haskell.org/pipermail/libraries/2019-April/029564.html, for an alternative solution with GADTs, but I suspect dealing with the existential variable in the empty case that I suspect will cause problems until #1965 is fixed, and is also quite a fancy solution. CC @treeowl -
Fancier TTG-esque stuff with ASTs also can incur runtime penalities, which discourages modularity in compiler. As someone who is very much a fan of "open recursive"-style highly polymorphic ASTs, I really want those to be able to compete with fully-baked non-compositional stuff without penalty.
I don't love this mechanism for the reasons given above, but I think it and it's benefits are worth discussion, if only to better elucidate the full gamut of the problem and solution space.