Loosen conditions for top-level unlifted bindings
In !10841 I'm introducing the capability to float unlifted bindings to the top level. However, I'm only allowing it under quite restrictive conditions which I've explained in this note (pay particular attention to the third bullet point in the list at the bottom):
Note [Core top-level unlifted data-con values]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As another exception to the usual rule that top-level binders must be lifted,
we allow binding unlifted data constructor values at the top level. This allows
us to store these values directly as data in the binary that we produce, instead
of allocating them potentially many times if they're inside a tight loop.
However, we have to be very careful that we only allow data constructors that
are really values.
* We only consider data constructor workers and not wrappers, because wrappers
  are generally not fully evaluated. See Note [The need for a wrapper].
* Even data constructor workers might still be expanded by the STG rewriter to
  perform some work, if they have arguments that are marked strict.
  See Note [Data-con worker strictness].
* If the data constructor has unlifted arguments, then those could cause further
  evaluation to be necessary, unless they are fully evaluated data constructor
  values themselves.
Furthermore, there is another complication. The data constructor worker may be
applied to a variable which is defined in another module, or worse, in an
hs-boot file. So, we cannot always get all the information we need and even for
variables defined in the same module it might still be hard or computationally
expensive to collect the necessary information.
So, for the first incarnation of this feature we choose very restrictive
conditions, which are still useful in practice. We allow top-level unlifted
data constructor workers if they are applied to arguments that are one of:
* A literal. Literals are guaranteed to be fully evaluated.
* A coercion. These are always fully evaluated and even removed when compiling
  to STG.
* Any expressions of lifted type, but only if that argument is not
  marked strict.
* An unlifted variable. These are top-level themselves and thus fully evaluated.
In the future, we hope to relax this condition (#23811).
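To make the four conditions concrete, here is a minimal sketch of them as a predicate over a toy model. All types and names below (Levity, FieldStrictness, Arg, ConApp, argAllowed, conAppAllowedAtTopLevel) are hypothetical stand-ins for illustration and are not GHC's real Core representation or API.

```haskell
-- Toy model of the conditions listed in the Note. These types are
-- hypothetical stand-ins, not GHC's real Core representation.

data Levity = Lifted | Unlifted
  deriving (Eq, Show)

-- Whether the constructor field receiving the argument is marked strict.
data FieldStrictness = MarkedStrict | NotMarkedStrict
  deriving (Eq, Show)

data Arg
  = ArgLit Integer        -- a literal
  | ArgCoercion           -- a coercion
  | ArgVar String Levity  -- a variable, with the levity of its type
  | ArgOther Levity       -- any other expression, with the levity of its type
  deriving (Eq, Show)

-- A saturated data constructor worker application.
data ConApp = ConApp String [(FieldStrictness, Arg)]
  deriving (Eq, Show)

-- One clause per bullet point of the Note.
argAllowed :: FieldStrictness -> Arg -> Bool
argAllowed _ (ArgLit _)          = True                  -- literal
argAllowed _ ArgCoercion         = True                  -- coercion
argAllowed _ (ArgVar _ Unlifted) = True                  -- unlifted variable
argAllowed s (ArgVar _ Lifted)   = s == NotMarkedStrict  -- lifted, field not strict
argAllowed s (ArgOther Lifted)   = s == NotMarkedStrict  -- lifted, field not strict
argAllowed _ (ArgOther Unlifted) = False                 -- may need evaluation

conAppAllowedAtTopLevel :: ConApp -> Bool
conAppAllowedAtTopLevel (ConApp _ args) = all (uncurry argAllowed) args
```

In this model, ConApp "USucc" [(MarkedStrict, ArgVar "xa" Lifted)] is rejected, because the argument is lifted and the field is marked strict.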
The third bullet point in the list at the bottom is more restrictive than necessary. If a lifted argument of a data constructor is marked strict, that can still be fine when the argument is itself fully evaluated, i.e. it is itself a data constructor worker applied to literals, coercions, etc.
An example of such a case is in the TopLevelMixBangs test case:
{-# OPTIONS_GHC -ddump-simpl -dsuppress-all -dno-typeable-binds -dsuppress-uniques #-}
{-# LANGUAGE UnliftedDatatypes #-}
module TopLevelMixBangs where
import GHC.Exts (UnliftedType)
import Data.Kind (Type)
type UNat :: UnliftedType
data UNat = UZero | USucc !LNat
data LNat = LZero | LSucc UNat
type Box :: UnliftedType -> Type
data Box a = Box a
x = Box (USucc xa)
xa = LSucc (USucc xb)
xb = LSucc (USucc xc)
xc = LSucc (USucc xd)
xd = LZero
Here we could float USucc xa to the top level: although xa is a lifted argument which is marked strict, xa is itself a fully evaluated application of the LSucc data constructor.
The primary reason I opted for the restrictive condition is the difficulty of the implementation. As the TopLevelMixBangs example shows, it might be necessary to check multiple top-level definitions transitively to be able to conclude if a data constructor application should be allowed to float to the top level. This causes two main problems in the implementation:
- We need access to properties of other bindings. We don't currently have that kind of access in exprIsTopLevelBindable, where the core of the logic is.
- This is potentially inefficient if we don't cache the results for every top-level binding.
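The transitive check can be sketched on a toy model of the TopLevelMixBangs bindings. Everything here (Field, Arg, Rhs, TopEnv, rhsIsValue, varIsValue, mixBangs, usucc) is hypothetical and only illustrates the shape of the problem; it is not how GHC represents bindings. The sketch assumes that a field "needs a value" when it is marked strict or has unlifted type, and it conservatively answers False for cycles and for variables it knows nothing about (for example, imported ones).

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- A field needs a fully evaluated argument if it is marked strict or has
-- unlifted type (an assumption of this sketch); otherwise it is lazy and lifted.
data Field = NeedsValue | LazyLifted

data Arg = Lit Integer | Var String | Nested Rhs

-- A right-hand side: a saturated data constructor worker application,
-- or something that is definitely not a value.
data Rhs = Con String [(Field, Arg)] | NotAValue

-- Hypothetical environment of the top-level bindings of one module.
type TopEnv = M.Map String Rhs

rhsIsValue :: TopEnv -> S.Set String -> Rhs -> Bool
rhsIsValue _   _    NotAValue    = False
rhsIsValue env seen (Con _ args) = all argOk args
  where
    argOk (LazyLifted, _)        = True  -- stored as is, possibly as a thunk
    argOk (NeedsValue, Lit _)    = True
    argOk (NeedsValue, Nested r) = rhsIsValue env seen r
    argOk (NeedsValue, Var v)    = varIsValue env seen v

-- Follows a variable to its binding. Nothing is cached, so every query
-- walks the whole chain again.
varIsValue :: TopEnv -> S.Set String -> String -> Bool
varIsValue env seen v
  | v `S.member` seen = False  -- conservative answer on cycles
  | otherwise = maybe False (rhsIsValue env (S.insert v seen)) (M.lookup v env)

usucc :: String -> Rhs
usucc v = Con "USucc" [(NeedsValue, Var v)]

-- The bindings xa .. xd from the TopLevelMixBangs example.
mixBangs :: TopEnv
mixBangs = M.fromList
  [ ("xa", lsucc "xb"), ("xb", lsucc "xc"), ("xc", lsucc "xd")
  , ("xd", Con "LZero" []) ]
  where
    lsucc v = Con "LSucc" [(NeedsValue, Nested (usucc v))]
```

Evaluating rhsIsValue mixBangs S.empty (usucc "xa") has to visit xa, xb, xc and xd before it can answer, which is exactly the chain of lookups that exprIsTopLevelBindable cannot currently perform.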
But there is a promising solution to these problems. Each variable carries with it a reference to its unfolding. We could inspect that unfolding to check whether a variable is fully evaluated, so we do have access to this information in exprIsTopLevelBindable. And there is an unfolding cache which already has a field called uf_is_value, which seems ideal for our purposes.
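As a rough illustration of why a cached flag on the unfolding helps, the following sketch uses toy stand-ins. ToyUnfolding, ToyVar, toyIsValue and varIsKnownValue are hypothetical names, written only in the spirit of uf_is_value; they are not GHC's real Id or Unfolding types.

```haskell
-- Toy stand-ins; NOT GHC's real Id or Unfolding types.
data ToyUnfolding
  = NoToyUnfolding
  | ToyUnfolding { toyIsValue :: Bool }  -- cached flag, in the spirit of uf_is_value

data ToyVar = ToyVar
  { toyName      :: String
  , toyUnfolding :: ToyUnfolding
  }

-- A single field read on the variable itself: no environment lookup and no
-- transitive walk. A variable without an unfolding is conservatively
-- treated as not known to be a value.
varIsKnownValue :: ToyVar -> Bool
varIsKnownValue v = case toyUnfolding v of
  ToyUnfolding { toyIsValue = b } -> b
  NoToyUnfolding                  -> False
```

The design point is that the answer is computed once, when the unfolding is built, and every later query is a constant-time read.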
However, that cached uf_is_value uses the result of exprIsHNF, which unfortunately currently ignores the problem that the STG rewriter may still re-introduce some evaluation (see #20749 (closed) and Note [Data-con worker strictness]). Once !9874 (closed) lands to fix that, we should strongly consider using exprIsHNF, as it would allow more unlifted bindings to be floated. In particular we could replace the whole condition for floating with:
We allow top-level unlifted data constructor workers if they are applied to
fully evaluated arguments as defined by `exprIsHNF`.