Optimize record updates which represent identity.
Semi-regularly I come across core with this kind of pattern.
GHC.Types.Var.Id ds_s9pv [Occ=Once1]
bx_s9pw [Occ=Once1]
ds1_s9px [Occ=Once1]
ds2_s9py [Occ=Once1]
_ [Occ=Dead]
ds4_s9pA [Occ=Once1]
ds5_s9pB [Occ=Once1] ->
GHC.Types.Var.Id [ds_s9pv
bx_s9pw
ds1_s9px
ds2_s9py
GHC.Types.Var.GlobalId
ds4_s9pA
ds5_s9pB];
Which is the result of a pattern like this:
Variant 1:
data Record = MkRecord { f1 :: T1 , f2 :: T2 , .. fn :: Tn }
data T1 = ... | T1_NullaryCon | ...
set_f1_to_x arg = arg { f1 = T1_NullaryCon }
But this is bad! It reallocates the whole constructor even if the field already contains the right constructor. If this is reasonably likely a user might have written instead:
Variant 2:
set_f1_to_x arg
| f1 arg == T1_NullaryCon = arg
| otherwise = arg { f1 = T1_NullaryCon }
But that is not optimal either. Because it (potentially) forces the old value in the field. Even if that is already in whnf the code for entering f1 arg
still gets emitted.
But what if instead we had the compiler rewrite the original pattern into this pseudo code:
Variant 3:
set_f1_to_x arg{ f1 = f1 } =
| get_tag_without_eval f1 == dataToTag T1_NullaryCon = arg
| otherwise = arg { f1 = T1_NullaryCon }
We can't currently write get_tag_without_eval
in source level haskell (maybe it should exist?). But the idea is that we look at the pointer inside of the field, compare it's tag to the nullary constructor we replace it with, and if it matches we return the original argument instead of allocating a new one.
The overhead is just a comparison on the tag and a jump. But we potentially save dozens of memory accesses and a bunch of allocation. So even if it sometimes pays off it would be worthwhile.
Some points to consider:
- There is no big hit because we look at the contents of the field. We need to load the content of the field in order to allocate the new constructor anyway.
- Looking at a field "in the middle" of the constructor first is slightly worse for the prefetcher as it's a less predictable pattern.
- For constructors who's tag is stored in the info table this might not be a win, because in that case we would add an additional memory load.
- It might be hard to see for the compiler if the user already did this check himself. In which case we would only add pointless overhead.
- In some cases it will simply never be the case that the field already has this value so there it would also be redundant overhead.
- Users can't currently write the optimal version in Variant 3 by hand because
get_tag_without_eval
doesn't exist. We could add it but I imagine it would be a bit brittle. Becauseget_tag_without_eval (f1 arg)
would just return the tag of the application thunk. And it's not unimaginable that the compiler might transform code into a form where we apply a application like that instead of the bound field directly.