Unlifted data types
This page describes the unlifted data types, i.e. algebraic data types which live in kind #
rather than kind *. In fact, this is a collection of standalone proposals:
 Allow data types to be declared unlifted. (Should be easy; but has a harder subproposal to separate a new kind
Unlifted
from#
)  Allow newtypes over unlifted types (giving them kind
#
). (Should be easy.)  Provide a builtin
Force a
which is an unlifted version ofa
, with no indirection cost. (Harder.)
See also UnpackedSumTypes.
Motivation
Bob Harper has written:
Haskell suffers from a paucity of types. It is not possible in Haskell to define the type of natural numbers, nor the type of lists of natural numbers (or lists of anything else), nor any other inductive type!
The reason, of course, is that whenever you write data Nat = Z  S !Nat
, you define a type of strict natural numbers, AS WELL AS bottom. Ensuring that an x :: Nat
is never bottom requires all usesites of this type to do strict pattern matching / force the value appropriately. It would be nice if there was some typedirected mechanism which specified that a value x
was always evaluated. This would give benefits, e.g. for code generation, where we can assume that a pointer never points to a thunk or indirection.
It would be hard to justify drastically changing Haskell to allow defining a type which doesn't include bottom, but as it turns out, GHC already supports such nonbottom types, in the form of unlifted types of kind #
. In particular, we already have special strict evaluation rules for unlifted types like Int#
and Array#
.
The fact that a pointer never points to a thunk is especially helpful in the implementation of mutable variables: without this guarantee, code which writes to this mutable variable has to first check if the thunk is fully evaluated before actually performing the memory write. For this reason, the MutVar#
primitive (which is a GC'd object) lives in kind #
.
Proposal 1: Allow data types to be declared as unlifted
A data type can be declared as unlifted by writing data unlifted
, e.g.:
data unlifted UBool=UTrueUFalse
Such data types are always boxed, but the type does not include bottom and is operationally represented as a pointer to the value. Intuitively, if you have x :: UBool
in scope, you are guaranteed to have UTrue
or UFalse
, and not bottom.
Iceland_jack: Yakshaving, to avoid new syntax we can use
GADTSyntax
and writedataUBool::TYPEUnliftedRepwhereUFalse::UBoolUTrue::UBool
The evaluation rules for unlifted data types are identical to the existing rules we have for types kinded #
: lets are strict, cannot be recursive, and function arguments are evaluated before calls. For example:
unot::UBool>UBoolunotUTrue=UFalseunotUFalse=UTruemain::IO()main=let y = unot (error"foo")in return ()
In this example, we get the error "foo", rather than returning ()
, because the binding of y
must be evaluated strictly.
Just like other unlifted types, you cannot bind values of an unlifted data type at top level, or in a recursive group. So this is illegal
moduleMwheredata unlifted UBool=UTrueUFalse
utrue ::UBool
uTrue =UTrue Illegal!
Nonpolymorphic unlifted types can directly be unpacked. The following declaration is valid:
data unlifted StrictInt=StrictIntInt#dataMyInt=MyInt{# UNPACK #}StrictInt
and is representationally equivalent to MyInt'
here:
dataInt=IntInt#dataMyInt'=MyInt'{# UNPACK #}!Int
Of course, the constructors for MyInt
and MyInt'
have different types.
Proposal 2: Polymorphism over a new Unlifted kind
Currently, we have two different kinds (ahem) of unlifted types which live in #
: unboxed, unlifted types such as Int#
, and boxed, unlifted types such as Array#
and the unlifted data types we can now define.
This subproposal is to distinguish between these two kinds, calling boxed, unlifted kinds Unlifted
. If users start defining and using unlifted types in normal code, it is likely that they will request polymorphism over unlifted types. There are two particular cases of polymorphism we might like to support:
Polymorphism over unlifted types in types and functions. In data types and functions, we may want to be polymorphic over a type variable in kind Unlifted
:
data unlifted UList(a ::Unlifted)=UNilUCons a UListumap:: forall (a ::Unlifted)(b ::Unlifted).UList a >(a > b)>UList b
We cannot be polymorphic in #
in general, because this includes unboxed types like Int#
which don't have uniform representation. However, we can be polymorphic over unlifted types, which do have uniform representation.
Boxed levity polymorphism in types (and functions with extra code generation). In data types, we may want to have a type parameter which is polymorphic over all boxed types:
dataBList(a ::Boxed)=BNilBCons a BList
BList
is representationally the same whether or not it is instantiated with a boxed lifted type, or a boxed unlifted type.
However, for levity polymorphism over functions we must generate code twice. Consider::
map:: forall a (b ::Boxed).(a > b)>BList a >BList b
map f (BCons x xs)=BCons(f x)(map f xs)
We do not know if f x
should be evaluated strictly or lazily; it depends on whether or not b
is unlifted or lifted. This case can be handled by specializing map
for the lifted and unlifted cases. The fact that the semantics of the function change depending on the type is a bit questionable (though not entirely unprecedented, c.f. type classes); additionally, there isn't any reason why we couldn't also generate copies of the code for all unboxed types, dealing with Int#
, Float#
and Double#
: it's unclear when to stop generating copies. (For reference, .NET does this on the fly.)
Proposal 3: Allow newtypes over unlifted types
To allow costfree abstraction over unlifted types, we should allow newtypes to be written over types of kind #
, with the resulting newtype being in kind #
. For example:
newtypeMyInt#=MkInt#Int#
with MyInt# :: #
. GHC already supports coercions over kind #
, so this should be very simple to implement.
Iceland_jack: Use this to wrap
Int#
when used in boolean context inGHC.Exts
,GHC.Prim
newtypeBool#=Bool#Int#patternFalse#::Bool#patternFalse#=Bool0#patternTrue#::Bool#patternTrue#=Bool1#(==#)::Int#>Int#>Bool#(==##)::Double#>Double#>Bool#
and other similar cases.
Proposal 4: Allow unlifting existing data types with no overhead
Proposal 1 requires a user to define a new data type for every unlifted type they want to define. However, for every lifted data type a user can define, there is an obvious unlifted type one might be interested in: the one without bottom. Fortunately, we can define a data type to unlift an arbitrary lifted type:
dataForce::*>UnliftedwhereForce::!a >Force a
suspend::Force a > a
suspend a = a
Force a
is the "head strict" version of a
: if you have x :: Force Int
in scope, it is guaranteed to have already been evaluated to an Int
. We can also suspend these strict computations: suspend (error "foo" :: Int#)
does not error until forced. Like Box
, unlifted computations may not be lifted out of suspend
without changing the semantics.
Force
and suspend
can be written purely as library code, however there is a cost of an indirection of Force
. We might notice, however, that the value of type Force a
only admits the value Force a
: undefined
is excluded by the Unlifted
kind, and Force undefined
is excluded by the strict field. Thus, it would be great if we could represent Force
on the heap simply as an unlifted pointer to a
, which is never undefined.
Note from David Feuer: the indirection problem could be resolved via NewtypeOptimizationForGADTS, which would be a good thing for other reasons too.
Ideally, we would like to define the coercion Coercible (Force a) a
, to witness the fact that Force a
is representationally the same as a
. However, there are two problems:

This coercion is illkinded (
Force a
has kindUnlifted
buta
has kind*
), so we would need John Major equality style coercions. 
The coercion is only valid in one direction: I can coerce from
Force a
toa
, but not viceversa: in the other direction, evaluation may be necessary.
My suggested implementation strategy is to bake in Force
as a special data type, which is represented explicitly in Core, but then optimized away in STG.
Optional extensions
(OPTIONAL) Give some syntax for Force
. Instead of writing f :: Force Int > Force Int
, we might like to write f :: Int! > Int!
. We define postfix application of bang to be a wrapping of Force
.
(OPTIONAL) Introduce a pattern synonym Thunk
.suspend
can be generalized into the bidirectional pattern synonym Thunk
:
patternThunk a < x let a =Force x
whereThunk(Force a)= a
Iceland_jack: This syntax is invalid, is this what was intended?
pattern::Force a > a patternThunk a <(Force> a)whereThunk(Force a)= a
For example:
let x =Thunk(error"foo"::ForceInt)::IntinTrue
does not error. Pattern matching over Thunk
forces the argument (similar to bang patterns) and returns the unlifted value (unlike bang patterns):
letThunk x =3+5::Intin x ::ForceInt
Dynamic semantics of unlifted types
In this section, we review the dynamic semantics of unlifted types. These are not being added by our proposal (since they are already implemented by #
), but since they are fairly unfamiliar to most Haskell users, I think this section will be useful.
Case binding. Given case e of x1 > e1
, where e
is Unlifted
, e
is evaluated to whnf, and then the result is casematched upon. (i.e. it is always as if it is a strict pattern match.)
Let bindings. Given let x = e in e'
, where x
is Unlifted)
, this desugars to let !x = e in e'
which desugars to case e of !x > e'
. Mutually recursive let bindings of unlifted variables are not allowed. Let bindings are evaluated bottom up (but see #10824).
Conditionals. Given if e then e1 else e2
where e1
and e2
are Unlifted
, this desugars into the obvious case. (NB: this means e1
and e2
are not eagerly evaluated, as they would be for an ifthenelse
function.)
Constructors and function application. Given K e
where e
is Unlifted
, this desugars into case e of !x > K x
. NB: if K
is in kind star, then this expression admits bottom!
Intuitively, if you have an unlifted type, anywhere you let bind it or pass it to a function, you evaluate it. Strict let bindings cannot be arbitrarily floated; you must preserve the ordering of bindings and they cannot be floated beyond an expression kinded *
.
FAQ
Why do you get to use error in the unlifted examples, isn't error a bottom?error
is specially treated to be both lifted and unlifted. It's interpretation in an unlifted setting is that it immediately causes an exception when it is evaluated (it never goes into the heap).
Why not !Int
rather than Int!
as the syntax proposal? This syntax conflicts with strict data fields. data S a = S !a
has a constructor of type S :: Int > S
, taking a lifted type and evaluating it before placing it in the constructor; data S a = S a!
has a constructor of type S :: Force Int > S
, which requires the user to have forced the integer. Representationally, the data types are the same, but the outward behavior for clients differs dramatically.
Is Force (Maybe (Force Int))
allowed? No, because Force Int
has kind Unlifted
but Maybe
has kind * > Unlifted
. A data type declaration must be explicitly written to accept an unlifted type (data StrictMaybe (a :: Unlifted) = SMaybe a
), or simply be strict in its field (data StrictMaybe2 a = SMaybe2 !a
).
How does this affect inlining? In any CBV language, inlining doesn't always preserve semantics; unlifted types are no different. For example, this errors:
let x =error"foo"::ForceInt
y = suspend x ::IntinTrue
but this example does not:
let y = suspend (error"foo"::ForceInt)::IntinTrue
What's the difference between Force Int
and Int#
?Force Int
is an unlifted, boxed integer; Int#
is an unlifted, unboxed integer.
Why aren't strict patterns enough? A user can forget to write a strict pattern at a usesite. Putting a type in kind unlifted forces all usesites to act as if they had strict patterns.
Is Force [a]
the type of strict lists? No. It is the type of a lazy list whose head is always evaluated and nonbottom.
Does foo :: Force [a] > Force [a]
force just the first constructor or the whole spine? You can't tell; the head not withstanding, [a]
is still a lazy list, so you would need to look at the function body to see if any extra forcing goes on. Force
does not induce seq
ing: it is an obligation for the callsite.
Fully alternate proposal
This section describes a totally alternate proposal to the one(s) above. The motivation is the same, but the proposal otherwise starts from scratch.
Proposal B1. Allow newtypes over unlifted types
Identical to Proposal 3 above.
Proposal B2. More levity polymorphism
NoSubKinds proposes the following arrangement:
dataLevity=LiftedUnliftedTYPE::Levity>*type*=TYPE'Liftedtype#=TYPE'Unlifted
Instead, Proposal B2 suggests the following generalization:
dataBoxity=BoxedUnboxeddataLevity=LiftedUnliftedTYPE::Boxity>Levity>*type*=TYPE'Boxed'Lifted TYPE 'Boxed 'Unlifted is the kind of boxed, unlifted types TYPE 'Unboxed 'Unlifted is the kind of unboxed, unlifted types TYPE 'Unboxed 'Lifted is utterly wrong, and prohibited from ever appearing in a desugared programtypeB_L=TYPE'Boxed'Lifted just for this wiki pagetypeB_UL=TYPE'Boxed'Unlifted just for this wiki pagetypeUB_UL=TYPE'Unboxed'Unlifted just for this wiki page
Type variables would be allowed to have kinds B_L
or B_UL
but never UB_UL
. This would allow polymorphism over boxedbutunlifted things, by separating out the kinds. Functions could still not be properly levitypolymorphic  GHC would default a Boxity
metavariable to 'Boxed
and a levity metavariable to 'Lifted
during zonking. (Unless the Levity
metavariable is paired with a boxity metavariable that is already known to be 'Unboxed
.)
This treatment is a simple extension of levity polymorphism, and it serves the same goals: to have a few cases where we really do want levitypolymorphic code (error
, undefined
) while providing a nicely uniform structure for kindsoftypeswithvalues. I (Richard) am a little bothered by the existence of an unused spot in the structure (TYPE 'Unboxed 'Lifted
) but not enough to back down.
Definition: A valued kind is a kind of types that have values. In released GHC, the valued kinds are *
, #
, and Constraint
. Proposal B2 suggests adding TYPE 'Boxed 'Unlifted
and TYPE 'Unboxed 'Unlifted
to this set.
(Extension: Edward Kmett would recommend
dataConstraintiness=ConstraintNotConstraintTYPE::Boxity>Levity>Constraintiness>*
to allow better inference around tuple types. While RAE thinks this would actually work just fine, it feels like one bridge too far.)
Proposal B3. Levity polymorphic data types
Proposal 1, above, allows users to make unlifted data types, separately from lifted ones. RAE is worried that this approach would lead to tons of duplication of data types (we might need four variants of Maybe
, for example) and wants a more uniform way forward. Hence this Proposal B3.
The proposal will explain by way of example. (The use of GADT syntax here is incidental, but it makes the text in this proposal work out better. The normal traditionalsyntax declaration would have identical behavior.)
dataMaybe a whereJust:: a >Maybe a
Nothing::Maybe a
All we know about the type variable a
is that it is used as an argument to >
. Functions can naturally have any valued kind as an argument. But it would be silly to allow a :: TYPE 'Unboxed 'Unlifted
, as we'll have trouble generating code for that. Furthermore, we wish to treat all datatypes as levity polymorphic in their return kind (the heart of Proposal B3). So, we infer this kind for Maybe
:
Maybe:: forall (v1 ::Levity)(v2 ::Levity).TYPE'Boxed v1 >TYPE'Boxed v2
Of course, this is all a bit of a hack. There really must be 4 datatypes under the hood. But each of the 4 make sense to have, and would, no doubt, be wanted ere long by programmers. When zonking termlevel program text, we would need to squeeze out any remaining levity polymorphism so that the backend could figure out which of the four Maybe
types we really want here. (The four types really make a data family
, don't they? Using data families in the implementation might simplify things a bit.)
To reiterate: the levity polymorphism here is just to simplify the life of the programmer by using one name  Maybe
 for all four sensible variants of an option type. Just like using levity polymorphism for undefined
make the life easier for the programmer.
A recursive example
Let's look at a recursive datatype to see how this works out:
dataList a whereNil::List a
Cons:: a >List a >List a
After inference and with everything explicit, we would get
dataList:: forall (v1 ::Levity)(v2 ::Levity).TYPE'Boxed v1 >TYPE'Boxed v2 whereNil:: forall (v1 ::Levity)(v2 ::Levity)(a ::TYPE'Boxed v1).List v1 v2 a
Cons:: forall (v1 ::Levity)(v2 ::Levity)(a ::TYPE'Boxed v1).
a >List v1 v2 a >List v1 v2 a
According to this definition, List a :: B_UL
is really a proper strict list. That is, we avoid some of the problems in the original proposal (not the "B" proposals) by avoiding issues around affecting only the head of lists. This definition affects the whole list.
Of course, users are free to use levity polymorphic recursion to achieve other combinations of strictness, but the default is to have either all lazy or all strict.
Note: This Proposal B3 does overlap some concerns: it combines the idea of unlifted datatypes with a means for using one name for multiple entities. But I think this makes sense given the convenience of this approach.
Design questions

Should levity polymorphism apply to all datatypes? If we think "yes", do we turn it off when a user explicitly writes
> *
in a kind signature for a datatype? 
It's tempting to say that the
!
prefix operator is now just an identity operation with kindB_UL > B_UL
. That is, it just forces the existing kind inference machinery to infer an unlifted kind. But this would seem to break existing code that expects!
not to affect kinds. 
Could the
Force
andsuspend
ideas from the nonB proposals help here? Perhaps. RAE doesn't fully understandForce
.
Proposal B4: Levity polymorphic functions
Carrying forward from Proposal B3, we could extend levity polymorphism to work on functions. For example, map
could be inferred to have this type
map:: forall (v1 ::Levity)(v2 ::Levity)(v3 ::Levity)(v4 ::Levity)(a ::TYPE'Boxed v1)(b ::TYPE'Boxed v2).(a > b)>[]@v1 @v3 a >[]@v2 @v4 b
Here, v1
and v2
are the levities of a
and b
, respectively, and v3
and v4
are the levities of the argument and result lists, respectively. As previously articulated, we can't have proper levity polymorphic functions, instead requiring specialization. This would mean sixteenmap
s. That's quite a high number. But I (Richard) think all of the variants make sense. I'd love for there to be some optimization to prevent the need to export 16 definitions here, but I don't see one. In any case, I do think this would work swimmingly for users. Is it worth the cost? That's up for debate.
Conjecture: If we take this idea to its logical extreme  of making all functions levity polymorphic in all of the datatypes mentioned  then RAE believes that using !
to control kinds in datatype definitions would not break any code.
Design questions
 Is there a way to reduce duplication in object code? If we can't find a "yes" answer, this problem may kill the idea.
Parametricity
Some of the examples given in the alternate proposals (B3 and B4) are not parametric polymorphism, but rather adhoc polymorphism of levities. For instance, the map
example does not have one uniform implementation, but many based on the instantiations, and each case has quite different behavior.
However, there are many examples of functions that can be properly levity polymorphic, because they are just (at a low level) moving pointers around, and the levity only tracks the overall calling convention for those pointers, rather than the operation of the body of the function. The most immediate example is constructors of data types. If we have:
dataT:: forall (l ::Levity).(a ::TYPE'Boxed l)>TYPE'Boxed'LiftedwhereC:: forall (l ::Levity)(a ::Type'Boxed l). a >T a
Then C
is genuinely parametric in l
. It accepts a pointer to a value that is either lifted or not, stores it, and matching on it later yields a pointer to a value with the same status. It is unclear to the present author whether C
would also be genuinely parametric if T
were parameterized on its levity status as well.
There are examples beyond constructors, though. All mutable cells for boxed types have the same behavior of simply storing a pointer, so put and get operations should be genuinely parametric. They simply preserve the calling conventions of things they are fed and return. Further, this area is quite rich, and enabling these parametric uses fixes problems with duplication, type safety and efficiency in primops. Right now one must choose between:
ArrayArray# not type safeArray#(Array a) has extra indirection
whereas with levity polymorphic Array#
, one could have Array# (Array# a)
which is the best of both worlds (eliminating the need for duplication).
As a final thought, let's consider the map
example:
map:: forall (v1 v2 v3 v4 ::Levity)(a ::TYPE'Boxed v1)(b ::TYPE'Boxed v2)(a > b)>[]@v1 @v3 a >[]@v2 @v4 b
map f []=[]map f (x:xs)=let y = f x
ys = map f xs
in y:ys
The author believes that map
is parametric in v1
, as the list stored a
values with that calling convention, and f
expects them with the same. However, v2
determines whether a closure needs to be built for y
, and unless this is somehow attributed to f
, different code must be generated for map
. The same seems to go for v4
, except the fact that it's a recursive call suggests that it doesn't matter who we attribute the closurebuilding responsibility to, and map
is not parametric in it. The remaining question is v3
, but it would seem that map
is not parametric in it, either, since it determines whether case analysis is required to ensure evaluation. So map
is only parametric polymorphic in one of the four variables given.
It is this author's opinion that it would be unfortunate to give adhoc levity polymorphism the same syntax as parametric polymorphism (and also that making everything adhoc overloaded in levity would probably be confusing), given that there are substantial examples of the latter.
Minimal Semantics
With regard to the quote in the Motivation section, it should be noted that it
is mostly untrue. The data type data Nat = Z  S !Nat
is a flat domain of
natural numbers; but this is also true in strict languages like ML. The only
difference between it and data unlifted UNat = UZ  US UNat
is the (default)
'calling convention' for operations on the type. For Nat
the calling
convention is nonstrict by default (although one can force strictness), and for
UNat
(and the type in ML) it is (necessarily) strict. The strict calling
convention allows one to ignore the domain aspect for arguments to a function
(as opposed to expressions of a type, which coudl denote bottom).
Thus, it seems as though the semantic gap could be filled if !Nat
and the like
were simply proper types. We could have !Nat :: Unlifted
. It could be
represented identically to Nat
, but bindings of that type would need to be
evaluated eagerly (and probably wouldn't be possible at the top level), and
'values' of the type could be relied upon to never be bottom. So for instance, a
function of type !Nat > ...
would be able to rely upon its first argument
being a nonbottom, evaluated natural number. Semantically, the values of !Nat
are exactly the set of natural numbers, and expressions denote the flat domain,
but the eager callling conventions ensure that we don't need to think about
bottom as a value.
In fact, this minimal addition bears a strong resemblance to the original
version of this proposal. However, it was noted that there is a problem with
the current meanting of !a
in constructors. Given the declaration:
dataT=C!Nat!Nat
The constructor C
has type Nat > Nat > T
which forces its arguments,
rather than !Nat > !Nat > T
. And changing it to the latter would be a
significant, backwardincompatible change.
However, there may be a solution to this problem in the upcoming work on
impredicativity. That work proposes to add a new constraint t <~ u
, meaning
that t
canbeinstantiated/isasubtypeof u
. The corelevel witness of
t <~ u
is simply a function t > u
.
It is conceivable that with this work, we could add the axiom !a <~ a
. The
witness of this is simply an identity function that has an effect of forgetting
that its argument must already be evaluated, because the values of type !a
are
also valid values of type a
. Then, if the match:
case t ofC x y >...
resulted in x, y :: !Nat
, the judgments x :: Nat
and y :: Nat
(as the
Haskell report would specify) are valid. And also, if we have x, y :: !Nat
,
the application C x y
is welltyped with C :: Nat > Nat > T
as a
constructor that evaluates its arguments. The only remaining piece of the puzzle
would be to have an optimization pass that eliminates the redundant evaluation
that C
would do if called with arguments of type !Nat
(via weakening to
Nat
).
Assuming this all works as outlined above, it appears to introduce no redundancy to the language, contrary to having an entirely separate data declaration form.