Rethink dataToTag# in light of tag inference.
Simon in #17240 (comment 414000) suggested some changes, good in principle, to how we deal with `dataToTag#`, which I reproduce in full below:
Background
- `dataToTag#` (a primop) currently evaluates its argument via some magic in the code generator; see `Note [dataToTag# magic]` in `ConstantFold.hs`.
- As you point out, this makes the evaluated-ness of the argument invisible to the subsequent code; an explicit `case` expression would be better.
- We moved from an explicit `case` expression to making the eval part of `dataToTag#` during the long saga of #15696 (closed).
- That saga has lots of resonances with the "every strict data constructor field has a properly tagged pointer" debate. The canonical reference is https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/rts/haskell-execution/pointer-tagging, which (alas) has not been updated since your tag-inference pass landed. Could you fix that, especially the "strict fields containing..." section? #15696 (closed) even has the same containers issue; see here.
Opportunity
- Mark `dataToTag#` as having a `CbvMark` on its argument.
- Add a `case` expression in `GHC.Base.getTag` to do the evaluation.
- Now the evals will be visible in the equality code, but the machinery you have recently added will ensure that every `dataToTag#` call really does get a properly tagged argument to look at.
- Remove the eval magic for `dataToTag#` in the code generator.
Result: simpler compiler, better code for comparisons. By the latter I mean that we can return to this idea without it being defeated by your observation that "the eval dataToTag# does is kinda hidden in it's implementation".
Does that make sense? Another payoff for the work you did on `CbvMark`s.
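A minimal source-level sketch of the first two steps, under stated assumptions: `Colour`, `tagOf`, `sameConstructor` and `compareByTag` are hypothetical stand-ins (not `GHC.Base.getTag` itself), `dataToTag#` is imported from `GHC.Exts`, and a bang pattern in a `case` plays the role of the explicit eval. The `CbvMark` and the removal of the code-generator magic are compiler-internal and cannot be expressed in source. Because the type of `dataToTag#` has changed across GHC releases, the sketch sticks to a concrete data type.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Int (I#), dataToTag#)

-- Example type (hypothetical); constructor tags are 0, 1, 2 in declaration order.
data Colour = Red | Green | Blue
  deriving Show

-- getTag-like function (hypothetical name): the evaluation happens in an
-- explicit case, so dataToTag# is only ever applied to an evaluated value.
tagOf :: Colour -> Int
tagOf c = case c of
  !evaluated -> I# (dataToTag# evaluated)

-- Comparing constructors by tag, the kind of code derived Eq/Ord
-- instances could use for enumeration-like types.
sameConstructor :: Colour -> Colour -> Bool
sameConstructor a b = tagOf a == tagOf b

compareByTag :: Colour -> Colour -> Ordering
compareByTag a b = compare (tagOf a) (tagOf b)
```

Loaded into GHCi, `map tagOf [Red, Green, Blue]` gives `[0,1,2]` and `compareByTag Red Blue` gives `LT`.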
We should do this! However, in order to get there we need to iron out a few wrinkles:
- Currently we don't use `getTag` in the deriving code (despite what its docs say), but that's easy to change.
- In `case a of a' { _ -> dataToTag# a' }` the simplifier would remove the "redundant" case on `a`.
- If we want to mark `dataToTag#` as call-by-value and not have it evaluate its argument itself, we have to either:
  - run tag inference before bytecode generation: #21083 (closed), or
  - split `dataToTag#` into a worker (with the call-by-value argument) and a wrapper. Unoptimized code would call the wrapper, while optimized code would call the worker (and would run tag inference). The W/W approach seems very reasonable and would result in only minimal compile-time overhead for code currently using `dataToTag#`, since the wrapper would be inlined.
- When tag inference ends up evaluating a variable to ensure it's tagged, it doesn't try to replace further occurrences of the evaluated variable with the new (tagged) binder. It's simple to add this logic, and IIRC there is already a TODO about it. I didn't include it initially because tag inference always runs when compiling (even at -O0) and I wanted to avoid a potential compile-time hit. But it shouldn't be hard to gate this behind a flag or optimization level to avoid the cost at -O0.
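The missing rewrite in the last point can be illustrated on a toy expression language. This is only a sketch under assumptions: `Expr`, `Eval`, `rename`, `useEvaluatedBinders` and `tagInferenceRewrite` are hypothetical names, GHC's actual tag inference works on STG and differs in detail, and the optimisation-level gate is one guess at how the extra cost might be avoided at -O0.

```haskell
-- Toy expression language (hypothetical; not GHC's STG or Core types).
data Expr
  = Var String               -- a variable occurrence
  | Call String [String]     -- call a global function on variable arguments
  | Eval String String Expr  -- Eval x x' body: evaluate x, bind the tagged result to x'
  deriving (Eq, Show)

-- Replace free occurrences of `old` by `new`. If an inner binder is `old`
-- (shadowing) or `new` (capture risk), stop descending there. This is
-- conservative: some occurrences may keep the old name, but none are captured.
rename :: String -> String -> Expr -> Expr
rename old new = go
  where
    sub v = if v == old then new else v
    go (Var v) = Var (sub v)
    go (Call f args) = Call f (map sub args)
    go (Eval x x' body)
      | x' == old || x' == new = Eval (sub x) x' body
      | otherwise = Eval (sub x) x' (go body)

-- After each Eval, point later uses of the scrutinee at the evaluated binder.
useEvaluatedBinders :: Expr -> Expr
useEvaluatedBinders (Eval x x' body) =
  Eval x x' (useEvaluatedBinders (rename x x' body))
useEvaluatedBinders e = e

-- Hypothetical gate: only run the extra rewrite when optimising (-O1 and up).
tagInferenceRewrite :: Int -> Expr -> Expr
tagInferenceRewrite optLevel e
  | optLevel >= 1 = useEvaluatedBinders e
  | otherwise = e
```

For example, `useEvaluatedBinders (Eval "a" "a'" (Call "f" ["a", "b", "a"]))` rewrites both uses of `a` inside the body to `a'`, while `tagInferenceRewrite 0` leaves the expression untouched.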
Overall, I think this would be a good change. Maybe something to look at for 9.6.