Clarify and document the CBV-convention

I'd like to clarify the role of

CbvMarks
OtherCon [] unfoldings
Strictness signatures
Strict data constructors

for a strict function. Relevant tickets

#21472 (closed), !8148 (closed)
#14626 (closed)
#14677 (closed)
#15155 (closed)
#21496 (closed)
#14626 (closed): this relates to wether a top-level constructor is exported properly-tagged.
#15696 (closed): how untagged pointers can end up in strict constructor fields
#16559 (closed)
#14677 (closed)
#20749
#22475 (closed)

The olden days

Until a few months ago, even if a function was strict, it could not guarantee that the caller would pass it an evaluated argument, or that (even if it did) that value would be properly tagged.

For example, reverse is strict in its argument, but we might have a call

f A xs = []
f B xs = reverse xs
f C xs = reverse (take 2 xs)

Now f is not strict in xs (see the A case), and our long-standing convention is that f can just pass xs on to reverse in the B case. But in the C case we will use call-by-value to avoid building a thunk, thus

f C xs = case take 2 xs of xs' -> reverse xs'

Bottom line: reverse cannot guarantee to have an evaluated argument. It might be passed a thunk.

Passing properly tagged pointers

See

Note [Tag Inference] in GHC.Stg.InferTags
Note [How untagged pointers can end up in strict fields] in GHC.Stg.InferTags, and #15696 (closed)
Note [Strict Worker Ids] in GHC.Types.Id.Info
Note [Attaching CBV Marks to ids] in GHC.Core.Tidy

In the work on tag inference (pointers above), we allow a function to establish a new calling convention: that it must be given a properly tagged pointer (to a value) as its argument. Let's call this the CBV-convention for a particular argument. Notice that

A function is marked strict in an argument if the function itself is sure to evaluate the argument: strictness says something about the function's semantics.
A function that uses a CBV-convention for an argument requires that argument to be passed as a properly-tagged value: it is a requirement on the caller.

Strict data constructors (or more precisely the worker for a strict data constructor) have, and must have, a CBV convention for their strict arguments.

It is possible for a strict data constructor not to have a properly-tagged argument; see Note [How untagged pointers can end up in strict fields] in GHC.Stg.InferTags, and #15696 (closed). To establish the invariant that struid data constructors always have properly-tagged arguments, the code generator adds extra evals around a strict data constructor, when necessary. This is done in GHC.Stg.InferTags.Rewrite.rewriteTopBinds.

`OtherCon []` unfoldings

What does a OtherCon[] unfolding mean? It means at every use site of the binder, the binder will represent a value in WHNF.

This means it's very tempting to give workers a OtherCon unfolding for lambda binders which represent strict fields. Since it seems reasonable to assume the field has been evaluated before constructor allocation. This would allow a seq on that argument to be dropped. This is an attractive idea -- after all, the caller should be guaranteeing that it is called by value! But it proves to be unsustainable, as we will see.

However that seq might be the only thing that makes that function strict. e.g.

f x = x `seq` Just x

If we decide to give f a CBV-convention, we might transform to

f x{-Unf=OtherCon []-} = x `seq` Just x

and then drop the seq, giving

f x{-Unf=OtherCon []-} = Just x

That is fine provide all callers actually use call-by-value. But #21496 (closed) highlights that this is a very problematic assumption and #21460 established that this indeed can go wrong in practice!

So we decided to remove OtherCon from all lambda binders.

This means the OtherCon invariant than rather becomes:

For a binder with an OtherCon unfolding at every use site the binder will represent a value in WHNF. We may not give lambda binders a OtherCon unfolding because it's impractical to uphold this invariant across function calls. See the proposals below for details.

Why we need CbvMarks at all?

Why do we need CbvMarks on any functions at all? Here's an example:

data T = MkT ![Int] Bool

f (MkT xs b) = MkT xs <big-expression>

Just before demand analysis we have this:

f t = case t of MkT xs b -> $WMkT xs <big>

(Remember, we have not yet inlined data constructor wrappers.) Then demand analysis does w/w:

$wf xs b = $WMkT xs <big>

f t = case t of MkT xs b -> $wf xs b

Now we inline that $WMkT:

$wf xs b = case xs of xs' -> MkT xs <big>

Now, alas, we have no way to eliminate that case xs, even though that argument of $wf came straight out of the incoming MkT. Sad. And this really happens in code that takes apart and re-assembles data structures with strict fields.

So we want to make $wf into a CBV-convention function (like the worker MkT itself), so that

Inside $wf we know that we'll be passed a non-bottom, properly tagged value, so we can make the case xs a no-op at runtime.
The callers of $wf are responsible for guaranteeing that claim.

Something similar happens in SpecConstr. Suppose we have

foo :: T -> blah
foo t = rhs

baz = ...foo (MkT p q)...

Now SpecConstr specialise foo, thus

$sfoo xs b = let t = MkT xs b in rhs

RULE forall xs b. foo (MkT xs b) = $sfoo xs b

Here again, the xs argument to $sfoo comes straight from a MkT, and we don't want to do extra evals on it in the body of $sfoo. (I can't quite cook up a specific example here, but I'd like to see one. Note [SpecConstr and evaluated unfoldings] doesn't give an actual example.)

CbvMarks and StrictWorkerIds

Currently the notion of a CBV-parameter is very squishy. Note [Attaching CBV Marks to ids] in GHC.Core.Tidy says:

Before tidy, function arguments which have a call-by-value semantics are identified by having an OtherCon[] unfolding.
During tidy, we transform this information into CBV (call-by-value) marks. The marks themselves then are put onto the function id itself.

So we have two quite different ways to signal CBV-parameters.

When the simplifier inlines a call to a function with a CBV-parameter (either imported, in which case it'll be signalled in the CbvMarks, or local, in which case it'll be signalled in the unfolding), it must be careful not to lose the "eval". I'm far from certain that we are careful.

In some way this can lead to a crash: see #21472 (closed). Sebastian gives a reproducer here.

Putting OtherCon unfoldings on a lambda binder is extremely flaky, even if the call site is passing something that comes directly from a strict constructor field. Suppose

data T = MkT ![Int] Bool

f :: T -> T
f (MkT x b) = MkT x True

So (assuming we do a w/w split) we will end up with

$wf x* = $WMkT x True
f t = case t of MkT x _ -> $wf x

where x* means "has an OtherCon [] unfolding, and $WMkT is the wrapper for MkT that does the eval. (I'm assuming here that strict constructors have wrappers, as today.)

Now we inline $WMkT and optimise, dropping the evals (because of that flaky OtherCon []:

$wf x* = MkT x True

Now consider this call to f:

   f ($WMkT ex eb)
   {inline $WMkT}   -->  case ex of x ->  f (MkT x eb)
   {inline f}       -->  case ex of x ->  case MkT x eb of MkT x _ -> $wf x
   {case of known}  -->  case ex of x ->  $wf x
   {eval-of-strict} -->  $wf ex
   {inline $wf}     -->  MkT ex True

We have lost the eval. It will get put back the the code generator, but all our faffing with OtherCon [] bought us nothing.

Proposals

So much for the backgound. Here is our proposed new design. NB: parts of it are the same as the current design, but we re-state it here (signalling that it's the status quo) for completeness.

Strict data constructors: no change

We propose to make no change in the way that strict data constructors are handled. Consider this example:

  data T = MkT ![Int] Bool

Currently we do this:

Make a wrapper
```
$WMkT x y = case x of x' -> MkT x' y
```
MkT's worker (the data constructor itself) is lazy, not strict, in all its arguments. The evals are done by the wrapper; the worker is passive. This is important; if we don't do this, the simplifier will drop the case x in $WMkT because x' is used strictly by MkT.
GHC.CoreToStg.Prep can use eta-expansion to ensure that all strict data constructor applications are either saturated, or applied to no arguments at all (maybeSaturate). (Side note: Note [CorePrep Overview] point (1) claims that Prep saturates all data constructor and primop applications, but that isn't true.)
GHC.CoreToStg.Prep also adds a binding
```
MkT = \x y. MkT x y
```
so that the "no arguments at all" case has something to call (see mkDataConWorkers).
In the code generator (or, rather, late STG), for every call to the worker MkT, we wrap an eval for the first argument if we can't statically see that the first argument is evaluated and properly tagged. This is done by GHC.Stg.InferTags.Rewrite.rewriteTopBinds.

While there is no change here, it does mean that we have a somewhat-subtle Core invariant that we should document (#20749):

In every Core program, it should be the case that replacing every constructor application (MkT xs b) with case xs of xs' -> MkT xs' b does not change the meaning of the program; that is, xs is non-bottom. (In general, add a case for each strict field of the constructor.)

We establish the invariant by only introducing MkT via a call to its wrapper $WMkT, which evaluates the argument. That ensures that the strict field is non-bottom; and all further optimisation should maintain that semantics.

Although the invariant is easy to establish, it is (sadly) not possible for Lint to check it. In principle, a buggy Core-to-Core pass could break the invariant, by introducing a call to the worker MkT that does not satisfy the invariant. That is a shortcoming of the plan described here.

An attractive-seeming alternative would be to make the constructor worker strict in its strict argument(s). It's attractive because it might seem that the evals in the wrapper are mutually redundant with the evals added by the code generator. But this approach had serious problems; see "Failed idea: no wrappers for strict data constructors" below.

Proposal 0: `OtherCon []` unfoldings

Proposal: Never put an OtherCon [] unfolding on a binder. Indeed only let-binders have an Unfolding; no other binders have an unfolding at all. Rather OtherCon [] is added by the simplifier to occurrences, when in a scope where that occurrence is konwn to be evaluated. Example:

data TD = A | B | C
case x of y
  { A -> ...(case y of { A -> e1; _ -> e2 })...
  ; _ -> ...(y `seq` blah)...

In the A branch, y's occurrences have an unfolding of A; that allows us to simplify the inner case y.
In the _ branch, y's occurrences have an unfoldin of OtherCon [A] saying that y is not A; that allows us to drop the seq.

We use this for strict data constructors too:

data T = MkT ![Int] | A | B | C
foo x = case x of y
          MkT p -> rhs1
          _     -> rhs2

Here,

In the _ branch, y gets an OtherCon [MkT] unfolding.
In the MkT branch, y gets an unfolding of MkT p; and p gets an OtherCon [] unfolding.

But no binder has an OtherCon [] unfolding; they are purely for occurrences.

Proposal 1: CBV arguments

Completely get rid of StrictWorkerId and CbvMarks in the simplifier.
In the code generator, establish the following calling convention: for "CBV-Ids", every strict, lifted argument is passed as an evaluated, properly-tagged pointer. See below for what a "CBV-Id" is.
Whatever we do, we must ensure that the decisions we make about top-level functions are exposed in the interface file, regardless of optimisation level. If a function uses CBV, the caller must honour that calling convention. It's not optional! See #21676
GHC.Stg.InferTags.Rewrite.rewriteTopBinds ensures that all calls meet this calling convention. Suppose f is a CBV-Id, with arity 2 and CbvMarks [MarkedCbv, NotMarkedCbv]. Then the rewriter
- Adds an eval to a saturated call (f x y), thus (case x of x2 -> f x2 y) unless the rewriter can prove that x is properly tagged.
- Eta-expand unsaturated calls to f; thus f becomes (\xy. case x of x2 -> f x2 y).
- Calls of non-CBV-Ids are unaffected, as are CBV-Ids that have no strict boxed arguments.
When generating code for
- case x of { p1 -> e1 ... }: no need to eval if x is known to be evaluated and properly tagged
- case x of y -> rhs: do nothing at all if x is known to be evaluated and properly tagged
I am no longer sure of the role of the tag inference pass. (To discuss.)
What Ids should be "CBV-Ids"?. NB: this is a free choice. Everything will work correctly if we say "all" or "none" or anything in between.

One possiblility is "all Ids are CBV-Ids". That is simple. But Andreas (in the dim and distant past) tried (something like) this and found that he got a lot of code bloat from eta-expansion of CBV-Ids.

So instead we propose these Ids as CBV-Ids:
- Constructor workers: this one is actually vital to maintain the invariant that the argument of a strict constructor worker is properly tagged.
- Join points; these are always saturated, so no eta-expansion worries
- Worker-like Ids: workers from worker/wrappr, specialised functions from SpecConstr. Again always saturated --- at least when they are born.

And that's about it. It's simple, catches all the cases that we are currently worried about.

The second bullet of "when generating code" effectively drops redundant cases altogether; but we only do it at this late stage, not driven by flaky OtherCon [] unfoldings.

Proposal 2: absent fillers for strict fields

Now that we have removed OtherCon from lambda binders (see "Proposal: CBV arguments" above), we risk producing code which fails with absentError panics at times. This is because W/W currently uses OtherCon as a way to determine that a binder represents a strict field. See eee498bf and Note [Absent fillers] 2):

Obviously this approach is no longer possible after OtherCon is no longer present on these binders! Here is the proposal: In worker/wrapper, use non-bottom absent fillers for absent, banged fields.

Despite the current note in W/W saying this is really nasty it turned out to be not hard at all. Taking the code from Note [Absent fillers] this means for this code:

  data T = MkT [Int] [Int] ![Int]  -- NB: last field is strict
  f :: T -> Int# -> blah
  f ps w = case ps of MkT xs ys zs -> <body mentioning xs>

Then f gets a strictness sig of <S(L,A,A)>. But we thread through the bangs of the fields so we make a worker $wf

-- RHS size: {terms: 25, types: 21, coercions: 0, joins: 0/2}
$wf_s14m [InlPrag=NOINLINE] :: [Int] -> Int#
[LclId[StrictWorker([])], Arity=1, Str=<SL>]
$wf_s14m
  = \ (ww_s14f [Dmd=SL] :: [Int]) ->
      case let {
             ps_awV [Dmd=S!P(SL,A,A)] :: T
             [LclId]
             ps_awV
               = M.MkT ww_s14f
                       (GHC.Prim.Panic.absentError @[Int] ...)
                       (RUBBISH(LiftedRep) @[Int])
               } in
           let {
             w_awW [Occ=Dead, Dmd=A] :: Int#
             [LclId]
             w_awW = RUBBISH('IntRep) @Int# } in
           case ps_awV of
           { MkT xs_awX [Dmd=SL] ys_awY [Dmd=A] zs_awZ [Dmd=A] ->
           case GHC.List.$wlenAcc @Int xs_awX 0# of ww1_a14b { __DEFAULT ->
           GHC.Types.I# ww1_a14b
           }
           }
      of ww_s14k
      { __DEFAULT ->
      case ww_s14k of wild_00 { I# ww_s14n -> ww_s14n }
      }

The trick here is that when we make the filler for zs we check if there was a bang on this field and if so we create RUBBISH(LiftedRep) @[Int] instead of an absentError like for the lazy field ys

This means we can still drop the original zs early potentially freeing memory retained by it and we don't have to pass it as an argument either. The downside of this approach is if absence analysis is actually wrong then zs won't fall over with a panic and instead we get segfaults/incorrect results.

I will implement this as part of !8148 (closed)

Proposal 3a: add `seq`s in the worker

Consider

f :: T -> Maybe T
f t = case t of MkT xs b -> Just (MkT xs (g b))

Strictness analysis will worker/wrapper to

$wf xs b = let t = MkT xs b in case t of MkT xs b -> Just (MkT xs (g b))
f t = case t of MkT xs b -> $wf xs b

We optimise $wf to

$wf xs b = Just (MkT xs <blah>)

but alas now

$wf is no longer strict in xs (remember MkT is not strict; and in any case the Just makes it lazy).
The code generator will add an eval for xs around the call to MkT.

Proposal: during w/w, add an eval, in the worker, for used arguments of strict data constructors:

$wf xs b = case xs of xs' ->
           let t = MkT xs' b in
           case t of MkT xs b -> Just (MkT xs <blah>)

Now after optimising we get

$wf xs b = case xs of xs' -> Just (MkT xs' <blah>)

Now $wf is visibly strict in xs, so:

the code generator will not add an eval around MkT, since there is an enclosing eval
the code generator will make $wf into a CBV-Id; and hence will drop the case xs in $wf because the caller will guarantee it is evaluated. See "Proposal: CBV arguments".

Proposal 3b: SpecConstr: `seq`s in the specialised function

We should add extra evals for strict arguments in SpecConstr, just as in worker/wrapper.

f t = ....(case t of MkT xs b -> blah)....

Suppose SpecConstr decides to make the following RULE

RULE forall xs b. f (MkT xs b) = $sf xs b

Then the specialised function should have the extra evals:

$sf xs b = case xs of xs' ->
           let t = MkT xs b in
           ...(case t of MkT xs b -> blah)....

(This is sound becuase in the call f (MkT xs b) we know that xs is non-bottom.) This $sf optimises to

$sf xs b = case xs of xs ->  ...(blah)....

and again any uses of xs in blah can see that xs is evaluated. Again, $sf is strict so (see "Proposal: CBV arguments") we'll use a CBV convention, and end up dropping the case.

Proposal 3c: `seq`s in join points arising from case alternatives

Eaxctly the same deal for join points: add extra evals for strict arguments. Suppose we have

  case e of MkT xs y -> rhs

and we want to make a join point:

  join $j xs y = case xs of xs -> rhs in
  case e of MkT xs y -> $j xs y

Failed ideas

Failed idea: no wrappers for strict data constructors

data T = MkT ![Int]

At one point I proposed, for

Not have a wrapper $WMkT at all
Mark the worker MkT as strict in its first argument. More precisely, in demand analysis change dmdTransformDataConSig to add a 1A demand on strict fields.

Having a wrapper to add evals seems redundant, becuause the code generator will add them anyway. But it isn't! Suppose we do the case-of-known-constructor transformation:

    case MkT x of MkT p -> True  ==>   True

If MkT is strict in x, then the LHS is strict in x; but the RHS is not. We'd have to change the transformation to add evals. Similarly case-to-let

   case MkT x of r -> Just r  ==>  let r = MkT x in Just r

Again, the LHS is strict in x but the RHS is not.

There is similar knock-on effect on exprIsHNF, exprOkForSpeculation etc.

In the end, the no-wrapper simplicity seems well outweighed by the extre complexity in multiple other transformations. The fact that constructors are lazy is rather deeply built into GHC!

Alternative `absentError` Proposal: Give constructor workers always at least a used demand on their strict fields

Taking the code from Note [Absent fillers] this means for this code:

  data T = MkT [Int] [Int] ![Int]  -- NB: last field is strict
  f :: T -> Int# -> blah
  f ps w = case ps of MkT xs ys zs -> <body mentioning xs>

Then f gets a strictness sig of <S(L,A,L)>.

Since the strict field will no longer be absent we don't generate a filler at all and instead just use the actual field content. This means we will produce a worker like this:

-- RHS size: {terms: 25, types: 21, coercions: 0, joins: 0/2}
$wf_s14m [InlPrag=NOINLINE] :: [Int] -> [Int] -> Int#
[LclId[StrictWorker([])], Arity=2, Str=<SL><L>]
$wf_s14m
  = \ (ww_s14f [Dmd=SL] :: [Int])
      (ww_zs   [Dmd=L]  :: [Int]) ->
      case let {
             ps_awV [Dmd=S!P(SL,A,A)] :: T
             [LclId]
             ps_awV
               = M.MkT ww_s14f
                       (GHC.Prim.Panic.absentError ...)
                       ww_zs
               } in
           let {
             w_awW [Occ=Dead, Dmd=A] :: Int#
             [LclId]
             w_awW = RUBBISH('IntRep) @Int# } in
           case ps_awV of
           { MkT xs_awX [Dmd=SL] ys_awY [Dmd=A] zs_awZ [Dmd=A] ->
           case GHC.List.$wlenAcc @Int xs_awX 0# of ww1_a14b { __DEFAULT ->
           GHC.Types.I# ww1_a14b
           }
           }
      of ww_s14k
      { __DEFAULT ->
      case ww_s14k of wild_00 { I# ww_s14n -> ww_s14n }
      }

After some reflection I think this isn't a great idea because:

We will retain zs longer.
We pass more arguments to the worker for no good reason.
It makes the strictness signature less precise.

The upside of this option is that if absence analysis is wrong we are more likely to actually get an absentError instead of a segfault.

If we really want to ensure we get an absentError in case absence error on a strict field goes wrong then we can implement the alternative absentError proposal in the future.

Edited Jan 06, 2023 by Sebastian Graf

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information