Skip to content

Allow nested foralls and contexts in prefix GADT constructors

Currently, prefix GADT constructors are very particular about where foralls and contexts are permitted in their types. In particular, the very beginning of a prefix GADT type can have an optional forall, followed by an optional context, followed by the rest of the type. Any other foralls or contexts that appear between argument types or the result type are rejected. For example, GHC does not permit this:

data T a where
  MkT :: Int -> forall a. T a

This is rather surprising, even to users who aren't me. Fundamentally, there is no reason why this has to be the case. In fact, having nested foralls is an implicit requirement an already accepted GHC proposal, A syntax for visible dependent quantification. Although !378 (merged) implemented most of this proposal, there is one part that it deliberately did not implement, namely:

A data constructor can use forall ... -> in its type (as given in GADT-syntax) or arguments, but any use of such a constructor in terms (as opposed to in a type) will be an error.

Without having the ability to use nested foralls in prefix GADT constructors, however, this feature is of limited utility. This is because you would be able to write this:

data VDQ a where
  MkVDQ :: forall a -> Int -> VDQ a

However, you would not be able to make a simple rearrangement of the arguments like this:

data VDQ' a where
  MkVDQ' :: Int -> forall a -> VDQ a -- Nested forall! Rejected.

So why doesn't today's GHC permit nested foralls/contexts in GADT constructors? Mainly because of engineering concerns. Various data structurs in the internals of GHC assume that all forall'd type variables and constraint arguments all occur upfront. This assumption is baked into the design of ConDeclGADT and DataCon, among other things.

How should we go about relaxing this restriction? While I haven't worked out the details completely yet, at minimum the following will need to happen:

  1. The representation of arguments in prefix ConDeclGADTs will need to change. Currently, the con_args field of ConDeclGADT uses HsConDeclDetails to represent the argument types of the constructor. This currently assumes that each argument is a BangType, however. In order to make this accommodate nested foralls and contexts, however, we will likely need a richer data structure. Something like:

    data FunArg pass
      = AnonArg    (HsScaled pass (LBangType pass))
      | ForallArg  (HsForAllTelescope pass)
      | ContextArg (LHsContext pass)

    If we do this, then we can likely get rid of con_forall and con_qvars, which currently store the leading foralls and context. Well, except that isn't quite right—GADT constructor types obey the forall-or-nothing rule, and as a result, leading foralls are special. Bah. I'll have to think carefully about how to do this right.

    Note that all of this only really applies to prefix GADT constructors. Record GADT constructors, on the other hand, can't really have nested foralls or contexts due to the syntax that they use. I suppose this means we might need a different AST form for record GADT constructors than we do for prefix GADT constructors. Hm.

  2. The way that user-written arguments are represented in DataCons will need to change. Currently, DataCon looks like this:

    data DataCon = MkData
      { ...
      , dcUnivTyVars       :: [TyVar]
      , dcExTyCoVars       :: [TyCoVar]
      , dcUserTyVarBinders :: [InvisTVBinder]
      , dcEqSpec           :: [EqSpec]
      , dcOtherTheta       :: ThetaType
      , dcOrigArgTys       :: [Type]
      , ...
      }

    For the most part, this corresponds exactly to how the data type worker looks. It must have the universally quantified type variables (dcUnivTyVars), followed by the existentially quantified type variables (dcExTyCoVars), followed by the context arguments (dcEqSpec + dcOtherTheta), finally followed by the visible arguments (dcOrigArgTys), in exactly that order.

    The exception to that rule is dcUserTyVarBinders, which remembers the order in which the user quantifies type variables in a GADT constructor. This information is not for the benefit of the worker, but for the wrapper. There is some amount of type variable swizzling performed in GHC.Types.Id.Make.mkDataConRep to ensure that the user-written order in dcUserTyVarBinders is rearranged to the universals-then-existentials order that the worker expects.

    If we permit nested foralls and contexts, the first step is to figure out how to change DataCon to represent it. I imagine that we will need to replace dcUserTyVarBinders with something more general:

      { ...
      , dcUserTyCoBinders :: [TyCoBinder]
      , ...
      }

    We will need TyCoBinders because the foralls and contexts can be intermixed with visible argument types, and moreover, these can happen in basically any order. What's interesting about this change is dcUserTyCoBinders will store all of the arguments to a DataCon, which will technically make dcUnivTyVars, dcExTyCoVars, dcEqSpec, dcOtherTheta, and dcOrigArgTys redundant. Still, we might consider keeping some (or all) of these fields anyway, since it could be more efficient to cache them rather than repeatedly computing them from the dcUserTyCoBinders.

  3. We will need to update Template Haskell's representation of prefix GADTs, which is insufficient to represent nested foralls and contexts:

    data Con
      = ...
      | ForallC [TyVarBndr] Cxt Con
      | GadtC [Name] [BangType] Type
      | ...

    It is unlikely that we will be able to make this work without redesigning Con to some extend. @rae suggests the following design:

    data Dec = ...
      | DataD ... Cons ...
      | NewtypeD ... Cons ...  -- Alternatively, ... (Either H98Con GadtCon) ... if we want to really
                               -- enforce the invariant that a newtype has exactly one data constructor
    
    data Cons
      = H98Cons [H98Con]
      | GadtCons [GadtCon]
    data H98Con = H98Con (Maybe [TyVarBndr]) (Maybe Cxt) Name H98ConBody
    data H98ConBody = NormalC ... | InfixC ... | RecC ...
    data GadtCon = GadtCon [Name] GadtConBody
    data GadtConBody = GadtC [FunArg ()] Type
                     | RecGadtC [FunArg Name] Type
    data FunArg name = FunArg name Multiplicity Bang Type
                     | ForallArg [TyVarBndr]
                     | CxtArg Cxt
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information