Add saturated constructor applications to Core
A long-standing performance bug in GHC is its behaviour on nested tuples.
- The poster-child case is #5642.
- The problem is that when you have a nested tuple application, e.g.
((4,True),('c','d')), the size of the type arguments grows quadratically with the expression size. This bites badly in some approaches to generic programming, which use nested tuples heavily.
- It's explained in detail in Section 2.3 of Scrap your type applications.
The same paper describes a solution, a modification of System F called System IF. It's neat, and I did once implement in in GHC. But it was quite complicated; most of the time the win was not big; and I don't know how it'd interact with casts, coercions, and type dependency in the latest version of GHC's Core.
So here's another idea, which came up in conversation at ICFP'16: add a staturated constructor application to Core. So Core looks like
data Expr v = Var v | App (Expr v) (Expr v) ... | ConApp DataCon [Expr v] -- NEW
Now I hate the idea of adding a new constructor to Core; I often brag about how few constructors it has. But the idea is this:
ConAppis always saturated; think of it as an uncurried application.
- Every data constructor has a wrapper Id. For simple constructors like
(:), the wrapper is just a curried version:
(:) = /\a. \(x:a). \(y:[a]). (:) [x,y]
where the "
(:) [x,y]" is just my concrete syntax for a
- We are used to having an intro and elim form for each type former. So for
(->)the intro form is
\x.eand the elim form is
(e1 e2). For a data type like
Maybe, the elim form is
case, but what's the intro form?
DataConexplicitly, rather than having it buried inside the
Id, which somehow seems more honest.
The proximate reason for doing this in the first place is to allow us to omit type arguments. Consider
Just True. Curently we represent this as
Var "Just" `App` Type boolTy `App` Var "True"
ConApp we can say
ConApp "Just" [ConApp "True" ]
because it's easy to figure out the
boolTy type argument from the argument. (We don't really have strings there, but you get the idea.)
Can we omit all the type arguments? No: we can omit only those that appear free in any of the argument types. That is usally all of them (including existentials) but not always. Consider:
data (,) a b where (,):: forall a b. a -> b -> (a,b) data  a where  :: forall a. [a] (:) :: forall a. a -> [a] -> [a] data Either a b where Left :: forall a b. a -> Either a b Right :: forall a b. b -> Either a b data (:=:) a b where Refl :: forall a b. (a~b) -> :=: a b data Foo where MkFoo :: a -> (a -> Int) -> Foo
For all of these data constructors except
Right we can omit all the type arguments, because we can recover them by simple matching against the types of the arguments. A very concrete way to think about this is how to implement
exprType :: Expr Id -> Type exprType (Var v) = varType v exprType (Lam b e) = mkFunTy (varType b) (exprType e) exprType (App f a) = funResultTy (exprType f) ... exprType (ConApp con args) = mkTyConApp (dataConTyCon con) ???
We know that the result type of type of a
ConApp will be
T t1 ..tn where
T is the parent type constructor of the
DataCon. But what about the (universal) type args
???? We can get them from the types of the arguments
map exprType args:
ConApp "(:)" [e1, e2], the type argument is just
ConApp "(:=:)" [e]", we expect
exprType eto return a type looking like
TyConApp "~" [t1, t2]. Then
t2are the types we want. So matching is required.
- What about an application of
Left?? We need to recover two type args, but
exprType e1gives us only one. So we must retain the other one in the application:
ConApp "Left" [Type ty2, e1]. Similarly for the empty list.
A simple once-and-for-all analysis on the
DataCon will establish how to do the matching, which type args to retain, etc.
- Pro: We can eliminate almost all type args of almost all data constructors; and for nested tuples we can eliminate all of them.
- Pro: it's elegant having the intro/elim duality.
- Pro: in GHC we often ask "is this expression a saturated constructor application?" (see
ConAppmakes it easier to answer that question.
- Pro: we do exactly this in types: we have
TyConApp. (In types a
TyConAppis not required to be saturated, but we could review that choice.)
- Con: adding a constructor is a big deal. In lots of places we'd end up saying
f (App e1 e2) = App (f e1) (f e2) f (ConApp c es) = ConApp c (map f es)
In the olden days GHC's
Apphad multiple arguments and the continual need to work over the list was a bit tiresome. Still
ConAppis very simple and uniform; quite often adding
mapwon't be difficult; and it may well be that constructors need to be treated differently anyway.
- Con: it's not a general solution to the type-argument problem. See #9198 for example. System IF is the only general solution I know, but it seems like too big a sledgehammer.
We'll only know if we try it. I estimate that it would take less than a week to work this change through all of GHC. 90% of it is routine.