Relax Note [Phantom type variables in kinds] to allow for phantom levity vars

added core lint needs triage typechecker labels

mentioned in merge request !2218 (closed)

We spent an hour discussing this in the FC call. Richard will add some more detail, but the TL;DR is

We should check each data constructor separately to check that it has a well-kinded type; if not, reject.

So that would reject your nullary MkT. Sad, but we don't know how to do better yet. And I really don't want this to stand in the way of your lovely patch. Leave it for later!

I think we don't even need to change anything. It turns out that the panic I was experiencing is reproducible on HEAD and due to -ddump-tc-trace: #19495 (closed).

If I don't supply -ddump-tc-trace to !2218 (closed), it works fine and ends compilation with

test10.hs:12:10: error:
    • Quantified type's kind mentions quantified type variable
        type: ‘forall (l :: Levity). T @l’
      where the body of the forall has this kind: ‘TYPE ('BoxedRep l)’
    • In the definition of data constructor ‘MkT’
      In the data type declaration for ‘T’
   |
12 | data T = MkT
   |          ^^^

(Which is the same error as for T16391b, so it is covered by tests.)

I'll leave the ticket open until we have a better idea whether and how to allow such nullary lev-poly constructors.

removed needs triage label

mentioned in issue #19767

In #19767 it would be really nice if we had a levity-polymorphic Bool. We can't have that, as the types of both True and False would not be well-kinded according to this ticket.

added UnliftedDatatypes representation polymorphism labels

I really don't know how to do this.

The key question: when we wish to evaluate (\ y -> ...) x, what do we do? Do we call the function? Or do we evaluate x? Since we've had levity-polymorphism, the answer has been:

Get x's type; let's call it ty
Get ty's kind; let's call it ki
ki must have the form TYPE rep for some rep
Look at rep: if it's an unlifted representation, then evaluate x; otherwise, call the function

This approach is very handy, because it allows us to abstract over unlifted types, which was previously impossible. (Before levity-polymorphism, the second step in the sequence above was essentially to look at ty and check whether it was one of a known list of unlifted types, considering anything else to be lifted. Such an approach clearly doesn't allow abstraction over unlifted types.)

Key consequence of the above approach: the levity of a type must be apparent in its kind. Note: not just the representation, but the levity. (The representation must also be apparent, for code generation. But let's not get distracted.)

Getting back to Bool, what type could a levity-polymorphic True have? Well, it must be the case that True @Lifted :: ty1 :: TYPE (BoxedRep Lifted) and True @Unlifted :: ty2 :: TYPE (BoxedRep Unlifted) for some types ty1 and ty2. This maybe suggests a way forward, using a data family:

data family Bool (l :: Levity) :: TYPE (BoxedRep l)
data instance Bool Lifted = TrueLifted | FalseLifted
data instance Bool Unlifted = TrueUnlifted | FalseUnlifted

Here, we have the following facts:

TrueLifted :: Bool Lifted :: TYPE (BoxedRep Lifted)
FalseLifted :: Bool Lifted :: TYPE (BoxedRep Lifted)
TrueUnlifted :: Bool Unlifted :: TYPE (BoxedRep Unlifted)
FalseUnlifted :: Bool Unlifted :: TYPE (BoxedRep Unlifted)

So far, so good. But can we make it more convenient? I think we can:

class Booly (b :: TYPE (BoxedRep l)) where
  mkTrue :: b
  mkFalse :: b
  checkTrue :: b -> Int#
  checkFalse :: b -> Int#
  checkFalse = (1# -#) . checkTrue

instance Booly (Bool Lifted) where
  mkTrue = TrueLifted
  mkFalse = FalseLifted
  checkTrue TrueLifted = 1#
  checkTrue FalseLifted = 0#

instance Booly (Bool Unlifted) where
  mkTrue = TrueUnlifted
  mkFalse = FalseUnlifted
  checkTrue TrueUnlifted = 1#
  checkTrue FalseUnlifted = 0#

pattern True :: Booly b => b
pattern True <- (checkTrue -> 1#)
  where True = mkTrue
pattern False :: Booly b => b
pattern False <- (checkFalse -> 1#)
  where False = mkFalse

That's quite painful to define, but I think it would work well in practice.

Can we do better? In order to tell, we need to go back and continue understanding more about the status quo. Returning to our ty1 and ty2, the "natural" choice would be to choose ty1 := Bool Lifted and ty2 := Bool Unlifted, where Bool :: forall (l :: Levity) -> TYPE (BoxedRep l). (This is the same kind for Bool as in my data family approach.) But now, if Bool is an ordinary datatype, we have True :: forall (l :: Levity). Bool l. The key problem here is that the kind of a type forall ... . blah is always the same as the kind of blah, because type abstractions are erased before code generation. What's tantalizing here is that I don't think code generation needs levity. It absolutely needs representation information, but perhaps not levity information.

Maybe here's a way forward, then:

Define a meta-function rep(k) that takes a kind and returns the representation of values of types of that kind:

rep(TYPE IntRep) = PrimRep.IntRep
rep(TYPE WordRep) = PrimRep.WordRep
rep(TYPE (BoxedRep Lifted)) = PrimRep.BoxedRep
rep(TYPE (BoxedRep Unlifted)) = PrimRep.BoxedRep
...

The right-hand sides are constructors of the type GHC.Core.TyCon.PrimRep, whereas the similarly-named things on the left are constructors of type RuntimeRep. RuntimeRep is the parameter to TYPE, while PrimRep describes the representation of a type within GHC. (https://gitlab.haskell.org/ghc/ghc/-/blob/d2399a46a01a6e46c831c19e797e656a0b8ca16d/compiler/GHC/Types/RepType.hs#L319 might be helpful to read if you're thrown off at this point.) I have written PrimRep.BoxedRep on the right-hand side, but no such constructor for PrimRep exists today. Instead, we have PrimRep.LiftedRep and PrimRep.UnliftedRep. However, we have the following key observation:

Key observation: GHC never usefully discriminates between PrimRep.LiftedRep and PrimRep.UnliftedRep.

By "usefully", I mean that GHC does serialize the two differently, but that's not interesting. It also (in GHC.Builtin.Types.Prim) uses PrimRep as a convenient abbreviation for a RuntimeRep, but it would be easy to pass in either the more expressive RuntimeRep directly (or perhaps both the RuntimeRep and the PrimRep). The only other place where the discrimination happens is in the unariser, where we need to invent a fresh variable with the right representation -- but note that this requires the right representation, not the right levity. So, once again, the discrimination is not useful.

Having laid out rep, we can begin to imagine how to change our type structure, with this key requirement:

Key requirement for type erasure: We need rep(kind(forall a. blah)) to equal rep(kind(blah)).

Now we are looking good. Because we do not require kind(forall a. blah) to equal kind(blah), we can modify the typing rule for forall to uphold Key requirement without leaving the kind completely unchanged. Here is a concrete proposal:

type family UnknownLevity :: Levity where {}   -- empty

G, a:k |- ty : TYPE r
r /= BoxedRep
a \notin r
------------------------------- OldRule
G |- forall (a:k). ty : TYPE r

G, a:k |- ty : TYPE (BoxedRep l)
----------------------------------------------- NewRule
G |- forall (a:k). ty : TYPE (BoxedRep UnknownLevity)

And now, we even have the way forward for the unariser: it will convert a PrimRep.BoxedRep to a TYPE (BoxedRep UnknownLevity) which (I think) is sufficient.

This all adds yet another twist to a narrow passage. But just perhaps it's a good idea.

I would love @simonpj's thoughts on the above.

Those rules are no good, in fact. They don't support substitution, for example. Let's try again:

G, a:k |- ty : TYPE r
a \notin r
------------------------------- OldRule
G |- forall (a:k). ty : TYPE r

G, a:k |- ty : TYPE (BoxedRep l)
a \in l (after expanding type synonyms)
----------------------------------------------- NewRule
G |- forall (a:k). ty : TYPE (BoxedRep UnknownLevity)

To be clear, both the old and new rule would be in effect.

If a is in r, but r is not BoxedRep, then no rule is matches, and the type is ill-kinded, as today. Example: forall (a::RuntimeRep). (ty :: TYPE a)

That does sound plausible -- definitely worth trying

@simonpj and I agree that the idea in my long #19487 (comment 349133) seems likely to succeed. Would you (for any value of "you") want to take a stab at implementing? I'm very happy to coach you through.

Here is how I would proceed:

Add UnknownLevity to GHC.Types (in the ghc-prim library) and as a wired-in tycon in GHC.Builtin.Types.
Change GHC.Core.TyCon.PrimRep to have just BoxedRep instead of LiftedRep and UnliftedRep. This will require changes to pcPrimTyCon, as a PrimRep no longer has enough information to create the correct kinds of TyCons.
Add new code to GHC.Core.Lint that uses the new rule (the old rule is completely unchanged) to accept more programs. GHC.Core.Lint.lintForAllBody looks like the place to start.
Change GHC.Core.Type.typeKind to mirror the change to the linter.
Change the type validity check (GHC.Tc.Validity.checkEscapingKind looks like a nice starting point) to accept the new form of type.
Change the type kind-checker GHC.Gen.HsType.tc_hs_type to figure out the right kind for ForAllTys. Currently, the HsForAllTy case just passes exp_kind to the recursive call for the body, but the new situation will require some other logic.

I think you could deliver all of these as separate patches. But perhaps it's better to do them in one go -- up to you.

I think I agree completely with your plan! Seems like a nice solution. Although I think we should preserve Levity in BoxedRep when going to STG land, so that codegen can omit tag checks for Unlifted things.

What to do about UnknownLevity in PrimRep? I think we should lower negative occurrences (parameters) to Lifted and positive occurrences (return values) to Unlifted, inspired by variance in subtyping: Given the subtyping relation Unlifted <: Lifted, Lifted -> Lifted -> Unlifted is a subtype of both Unlifted -> Unlifted -> Unlifted and Lifted -> Lifted -> Lifted. Alternatively, stick to levity polymorphism in STG and maybe change NewRule to

G, a:k |- ty : TYPE (BoxedRep l)
a \in l (after expanding type synonyms)
----------------------------------------------- NewRule
G |- forall (a:k). ty : TYPE (BoxedRep l)                  <--------- l instead of UnknownLevity

? I don't see why we need UnknownLevity. Ah, probably because our PrimRep-based type system builds on our regular one with arrows, forall and so on and we can't embed type variables in a PrimRep. Fair enough, then our best bet is the subtyping system, I suppose (hoping, but also doubting, that it is expressive enough).

Although I think we should preserve Levity in BoxedRep when going to STG land, so that codegen can omit tag checks for Unlifted things.

To do this, I think we'll need to retain levity information in PrimRep. So this suggests something like

data PrimLevity = UnliftedPL | UnknownPL
data PrimRep
  = BoxedRep PrimLevity
  | ...

When converting a BoxedRep RuntimeRep into a PrimRep, if it's a BoxedRep Unlifted, we can convert to BoxedRep UnliftedPL. Otherwise (in the Lifted and variable cases), we go to BoxedRep UnknownPL. I think that works, given that Unlifted is just an optimization -- that is, wanting Unlifted but getting Unknown doesn't cause a crash (right?).

We need UnknownLevity because a typing rule like the one you wrote has a major problem: the kind below the line mentions an out-of-scope l. I have no idea what such a rule means. (For example, it is utterly inexpressible with de Bruijn indices.) We need to put something there, and UnknownLevity is that something.

Why can't we do this with RuntimeReps in general? Because there's no conservative choice for how to represent something of unknown representation. Happily, and crucially for our cause here, there is a conservative choice of how to deal with unknown levity: just pretend the thing is lifted.

data PrimLevity = UnliftedPL | UnknownPL Otherwise (in the Lifted and variable cases), we go to BoxedRep UnknownPL.

That should work, I think. But then you could also call UnknownPL LiftedPL instead... Although UnknownPL might indeed be less susceptible to misinterpretation.

We need UnknownLevity because a typing rule like the one you wrote has a major problem: the kind below the line mentions an out-of-scope l. I have no idea what such a rule means.

I was thinking that the l is a meta var denoting an arbitrary expression rather than a variable. An expression in which a occurs free (after type synonym expansion). So we don't need to bind l, we only need to bind every variable free in l. And that we do.

I think the actual problem is in the translation of that kind to PrimRep. You invented UnknownLevity in order not having to track forall-bound levity vars in the translation process (your rep function above). One could just as well maintain an environment TyVar -> PrimLevity for forall-bound levity vars in rep. (Since that environment will be const UnknownPL anyway, we may just as well maintain a VarSet.)

The advantage: No need for a strange UnknownLevity in Core. (Slight) Disadvantage, we need to adjust the

Key requirement for type erasure: We need rep(kind(forall a. blah)) to equal rep(kind(blah)).

to

Key requirement for type erasure: We need rep(env, kind(forall a. blah)) to equal rep(env, kind(blah)), unless a is a levity variable, then we need rep(env, kind(forall a. blah)) to equal rep(env[a->UnknownPL], kind(blah)).

and have an equation rep_lev(env, a) = env(a).

I retract my previous opinion on a subtyping system for STG (centered around Unlifted <: Lifted). It's not expressive enough to type useful higher-order programs such as levity-polymorphic app:

app :: forall l (a :: TYPE (BoxedRep l)). (a -> Int) -> a -> Int
app f x = f x

f_unl :: Array# Int -> Int
f_unl = ...

f_lif :: [Int] -> Int
f_lif = ...

main = print (app f_unl (...) + app f_lif [1,2,3])

This should ultimately be a valid GHC Haskell program. At least I think we can express it in Core.

What (fixed) levity should a get when compiling to STG? Whether you pick Unlifted or Lifted, you won't be able to type-check both app calls. This program is fundamentally untypeable in STG_<: (e.g., STG with subtyping as above) unless we integrate levity type variables in the long-term.

I was thinking that the l is a meta var denoting an arbitrary expression rather than a variable. An expression in which a occurs free (after type synonym expansion). So we don't need to bind l, we only need to bind every variable free in l. And that we do.

My mistake: yes, l is an arbitrary expression. But if a is free in l, then a is not bound in the result: it's not in the G. Here is the relevant lemma:

Lemma (Regularity): If G |- ty : ki, then G |- ki : TYPE r and G |- r : RuntimeRep.

That lemma is not true when considering the rule you propose, but we definitely need it for type safety. Here is actually a simpler version of the key property:

Lemma (Regular Scoping): If G |- ty : ki, then fv(ty) \subset dom(G) and fv(ki) \subset dom(G).

The first lemma implies the second, but perhaps the second is easier to reason about (and often has to be proved before we can prove the first one, anyway).

Sorry for being slow, but can you maybe point out where my typing rule is broken? You said "But if a is free in l, then a is not bound in the result", but I'm not sure which result you mean. a can occur in both mentions of l and both mentions either bind it in G (in the LHS of |-) or in the forall in the RHS of |-.

In fact, my NewRule is like your NewRule, except that I say l where you said UnknownLevity. If the latter is OK, then why isn't the former? It binds a in the forall.

I don't want to put well-scopedness up for debate at all, I just don't see how the new rule violates it.

Here is the rule you propose:

G, a:k |- ty : TYPE (BoxedRep l)
a \in l (after expanding type synonyms)
----------------------------------------------- SebastianRule
G |- forall (a:k). ty : TYPE (BoxedRep l)

We will evaluate this rule against my "Regular Scoping" lemma: If G |- ty : ki, then fv(ty) \subset dom(G) and fv(ki) \subset dom(G).

We assume G, a:k |- ty : TYPE (BoxedRep l) for some context G, variable a, and types ty, k, and l. Furthermore, we assume a \in l -- more formally, a \in fv(l).
By the well-formedness of G, a:k, we further assume a \notin dom(G). That is, the context G, a:k does not bind a twice.
The Regular Scoping lemma says that fv(TYPE (BoxedRep l)) \subset dom(G).
We know that a \in fv(TYPE (BoxedRep l)) because the definition of fv says that fv(TYPE (BoxedRep l)) = fv(l).
Thus, the Regular Scoping lemma says that a \in dom(G).
But this is a contradiction, because we assumed a \notin dom(G).

Maybe more simply: the problem is that, in the result, the forall scopes over the ty, but not the TYPE (BoxedRep l).

Does this help clarify?

Relax Note [Phantom type variables in kinds] to allow for phantom levity vars

Child items ...

Activity