Optimize record updates which represent identity.

Semi-regularly I come across core with this kind of pattern.

          GHC.Types.Var.Id ds_s9pv [Occ=Once1]
                           bx_s9pw [Occ=Once1]
                           ds1_s9px [Occ=Once1]
                           ds2_s9py [Occ=Once1]
                           _ [Occ=Dead]
                           ds4_s9pA [Occ=Once1]
                           ds5_s9pB [Occ=Once1] ->
              GHC.Types.Var.Id [ds_s9pv
                                bx_s9pw
                                ds1_s9px
                                ds2_s9py
                                GHC.Types.Var.GlobalId
                                ds4_s9pA
                                ds5_s9pB];

Which is the result of a pattern like this:

Variant 1:

data Record = MkRecord { f1 :: T1 , f2 :: T2 , .. fn :: Tn }

data T1 = ... | T1_NullaryCon | ...

set_f1_to_x arg = arg { f1 = T1_NullaryCon }

But this is bad! It reallocates the whole constructor even if the field already contains the right constructor. If this is reasonably likely a user might have written instead:

Variant 2:

set_f1_to_x arg
  | f1 arg == T1_NullaryCon = arg
  | otherwise = arg { f1 = T1_NullaryCon }

But that is not optimal either. Because it (potentially) forces the old value in the field. Even if that is already in whnf the code for entering f1 arg still gets emitted.

But what if instead we had the compiler rewrite the original pattern into this pseudo code:

Variant 3:

set_f1_to_x arg{ f1 = f1 } =
  | get_tag_without_eval f1 == dataToTag T1_NullaryCon = arg
  | otherwise = arg { f1 = T1_NullaryCon }

We can't currently write get_tag_without_eval in source level haskell (maybe it should exist?). But the idea is that we look at the pointer inside of the field, compare it's tag to the nullary constructor we replace it with, and if it matches we return the original argument instead of allocating a new one.

The overhead is just a comparison on the tag and a jump. But we potentially save dozens of memory accesses and a bunch of allocation. So even if it sometimes pays off it would be worthwhile.

Some points to consider:

There is no big hit because we look at the contents of the field. We need to load the content of the field in order to allocate the new constructor anyway.
Looking at a field "in the middle" of the constructor first is slightly worse for the prefetcher as it's a less predictable pattern.
For constructors who's tag is stored in the info table this might not be a win, because in that case we would add an additional memory load.
It might be hard to see for the compiler if the user already did this check himself. In which case we would only add pointless overhead.
In some cases it will simply never be the case that the field already has this value so there it would also be redundant overhead.
Users can't currently write the optimal version in Variant 3 by hand because get_tag_without_eval doesn't exist. We could add it but I imagine it would be a bit brittle. Because get_tag_without_eval (f1 arg) would just return the tag of the application thunk. And it's not unimaginable that the compiler might transform code into a form where we apply a application like that instead of the bound field directly.

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information