This chapter was thoroughly changed Feb 2003.
data T a = MkT !(a,a) !(T a) | Nil
f x = case x of
MkT p q -> MkT p (q+1)
Nil -> Nil
The user's source program mentions only the constructors MkT
and Nil. However, these constructors actually do something
in addition to building a data value. For a start, MkT evaluates
its arguments. Secondly, with the flag -funbox-strict-fields GHC
will flatten (or unbox) the strict fields. So we may imagine that there's the
source constructor MkT and the representation constructor
MkT, and things start to get pretty confusing.
GHC now generates three unique Names for each data constructor:
---- OccName ------
String Name space Used for
---------------------------------------------------------------------------
The "source data con" MkT DataName The DataCon itself
The "worker data con" MkT VarName Its worker Id
aka "representation data con"
The "wrapper data con" $WMkT VarName Its wrapper Id (optional)
Recall that each occurrence name (OccName) is a pair of a string and a
name space (see The truth about names), and
two OccNames are considered the same only if both components match.
That is what distinguishes the name of the name of the DataCon from
the name of its worker Id. To keep things unambiguous, in what
follows we'll write "MkT{d}" for the source data con, and "MkT{v}" for
the worker Id. (Indeed, when you dump stuff with "-ddumpXXX", if you
also add "-dppr-debug" you'll get stuff like "Foo {- d rMv -}". The
"d" part is the name space; the "rMv" is the unique key.)
Each of these three names gets a distinct unique key in GHC's name cache.
data T a = MkT !(a,a) !Int | Nil
f x = case x of
Nil -> Nil
MkT p q -> MkT p (q+1)
When the parser reads it in, it decides which name space each lexeme comes
from, thus:
data T a = MkT{d} !(a,a) !Int | Nil{d}
f x = case x of
Nil{d} -> Nil{d}
MkT{d} p q -> MkT{d} p (q+1)
Notice that in the Haskell source all data contructors are named via the "source data con" MkT{d},
whether in pattern matching or in expressions.
In the translated source produced by the type checker (-ddump-tc), the program looks like this:
f x = case x of
Nil{d} -> Nil{v}
MkT{d} p q -> $WMkT p (q+1)
Notice that the type checker replaces the occurrence of MkT by the wrapper, but
the occurrence of Nil by the worker. Reason: Nil doesn't have a wrapper because there is
nothing to do in the wrapper (this is the vastly common case).
Though they are not printed out by "-ddump-tc", behind the scenes, there are also the following: the data type declaration and the wrapper function for MkT.
data T a = MkT{d} a a Int# | Nil{d}
$WMkT :: (a,a) -> T a -> T a
$WMkT p t = case p of
(a,b) -> seq t (MkT{v} a b t)
Here, the wrapper $WMkT evaluates and takes apart the argument p,
evaluates the argument t, and builds a three-field data value
with the worker constructor MkT{v}. (There are more notes below
about the unboxing of strict fields.) The worker $WMkT is called an implicit binding,
because it's introduced implicitly by the data type declaration (record selectors
are also implicit bindings, for example). Implicit bindings are injected into the code
just before emitting code or External Core.
After desugaring into Core (-ddump-ds), the definition of f looks like this:
f x = case x of
Nil{d} -> Nil{v}
MkT{d} a b r -> let { p = (a,b); q = I# r } in
$WMkT p (q+1)
Notice the way that pattern matching has been desugared to take account of the fact
that the "real" data constructor MkT has three fields.
By the time the simplifier has had a go at it, f will be transformed to:
f x = case x of
Nil{d} -> Nil{v}
MkT{d} a b r -> MkT{v} a b (r +# 1#)
Which is highly cool.
map MkT xs
then $WMkT will not be inlined (because it is not applied to anything).
That is why we generate real top-level bindings for the wrapper functions,
and generate code for them.
MkT{v} = \ p q r -> MkT{v} p q r
This is a real hack. The occurrence on the RHS is saturated, so the code generator (both the
one that generates abstract C and the byte-code generator) treats it as a special case and
allocates a MkT; it does not make a recursive call! So now there's a top-level curried
version of the worker which is available to anyone who wants it.
This strange definition is not emitted into External Core. Indeed, you might argue that we should instead pass the list of TyCons to the code generator and have it generate magic bindings directly. As it stands, it's a real hack: see the code in CorePrep.mkImplicitBinds.
data T a = MkT a a Int# | Nil{d}
$WMkT :: (a,a) -> T a -> T a
$WMkT p t = case p of
(a,b) -> seq t (MkT a b t)
f x = case x of
Nil -> Nil
MkT a b r -> MkT a b (r +# 1#)
Notice that it makes perfect sense as a program all by itself. Constructors
look like constructors (albeit not identical to the original Haskell ones).
When reading in External Core, the parser is careful to read it back in just as it was before it was spat out, namely:
data T a = MkT{d} a a Int# | Nil{d}
$WMkT :: (a,a) -> T a -> T a
$WMkT p t = case p of
(a,b) -> seq t (MkT{v} a b t)
f x = case x of
Nil{d} -> Nil{v}
MkT{d} a b r -> MkT{v} a b (r +# 1#)
case e of
MkT p t -> ..p..t..
GHC will desugar this to the following Core code:
case e of
MkT a b t -> let p = (a,b) in ..p..t..
The local let-binding reboxes the pair because it may be mentioned in
the case alternative. This may well be a bad idea, which is why
-funbox-strict-fields is an experimental feature.
It's essential that when importing a type T defined in some external module M, GHC knows what representation was used for that type, and that in turn depends on whether module M was compiled with -funbox-strict-fields. So when writing an interface file, GHC therefore records with each data type whether its strict fields (if any) should be unboxed.
Every data constructor Chas two info tables: