While implementing the unlifted datatypes proposal (!2218 (closed)), I realised that an edge case violates the kinding rules (Note [Phantom type variables in kinds], in particular). Specifically
typeT::forall(l::Levity).TYPE(BoxedRepl)dataT=MkT
Note that
T has a nullary data con MkT
T is levity-polymorphic
That means the type of MkT, forall {l::Levity}. T @l, is ill-kinded! We have T @l :: TYPE (BoxedRep l) and the forall is supposed to bind the l. But the kind rule for forall a. TYPE rep demands that a is not free in rep:
ty : TYPE rep `a` is not free in rep(FORALL1) ----------------------- forall a. ty : TYPE rep
I argue that there is no harm in allowing TYPE (BoxedRep l) here, because l does not affect the representation of T. I'm unaware of any other implications this might have.
We have to be a bit careful with evaluation semantics, but we have better means (#15532 (comment 239845), #17521) to decide whether levity polymorphism is OK at the term level. In any instance, MkT is a data constructor and thus is a value under both call-by-need and call-by-value semantics. I'm reasonably positive that allowing MkT to be well-kinded is OK. You can't do anything with it at the term level unless you apply it to a Levity anyway.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items
...
Show closed items
Linked items
0
Link issues together to show that they're related or that one is blocking others.
Learn more.
We spent an hour discussing this in the FC call. Richard will add some more detail, but the TL;DR is
We should check each data constructor separately to check that it has a well-kinded type; if not, reject.
So that would reject your nullary MkT. Sad, but we don't know how to do better yet. And I really don't want this to stand in the way of your lovely patch. Leave it for later!
I think we don't even need to change anything. It turns out that the panic I was experiencing is reproducible on HEAD and due to -ddump-tc-trace: #19495 (closed).
If I don't supply -ddump-tc-trace to !2218 (closed), it works fine and ends compilation with
test10.hs:12:10: error: • Quantified type's kind mentions quantified type variable type: ‘forall (l :: Levity). T @l’ where the body of the forall has this kind: ‘TYPE ('BoxedRep l)’ • In the definition of data constructor ‘MkT’ In the data type declaration for ‘T’ |12 | data T = MkT | ^^^
(Which is the same error as for T16391b, so it is covered by tests.)
I'll leave the ticket open until we have a better idea whether and how to allow such nullary lev-poly constructors.
In #19767 it would be really nice if we had a levity-polymorphic Bool. We can't have that, as the types of both True and False would not be well-kinded according to this ticket.
The key question: when we wish to evaluate (\ y -> ...) x, what do we do? Do we call the function? Or do we evaluate x? Since we've had levity-polymorphism, the answer has been:
Get x's type; let's call it ty
Get ty's kind; let's call it ki
ki must have the form TYPE rep for some rep
Look at rep: if it's an unlifted representation, then evaluate x; otherwise, call the function
This approach is very handy, because it allows us to abstract over unlifted types, which was previously impossible. (Before levity-polymorphism, the second step in the sequence above was essentially to look at ty and check whether it was one of a known list of unlifted types, considering anything else to be lifted. Such an approach clearly doesn't allow abstraction over unlifted types.)
Key consequence of the above approach: the levity of a type must be apparent in its kind. Note: not just the representation, but the levity. (The representation must also be apparent, for code generation. But let's not get distracted.)
Getting back to Bool, what type could a levity-polymorphic True have? Well, it must be the case that True @Lifted :: ty1 :: TYPE (BoxedRep Lifted) and True @Unlifted :: ty2 :: TYPE (BoxedRep Unlifted) for some types ty1 and ty2. This maybe suggests a way forward, using a data family:
That's quite painful to define, but I think it would work well in practice.
Can we do better? In order to tell, we need to go back and continue understanding more about the status quo. Returning to our ty1 and ty2, the "natural" choice would be to choose ty1 := Bool Lifted and ty2 := Bool Unlifted, where Bool :: forall (l :: Levity) -> TYPE (BoxedRep l). (This is the same kind for Bool as in my data family approach.) But now, if Bool is an ordinary datatype, we have True :: forall (l :: Levity). Bool l. The key problem here is that the kind of a type forall ... . blah is always the same as the kind of blah, because type abstractions are erased before code generation. What's tantalizing here is that I don't think code generation needs levity. It absolutely needs representation information, but perhaps not levity information.
Maybe here's a way forward, then:
Define a meta-function rep(k) that takes a kind and returns the representation of values of types of that kind:
The right-hand sides are constructors of the type GHC.Core.TyCon.PrimRep, whereas the similarly-named things on the left are constructors of type RuntimeRep. RuntimeRep is the parameter to TYPE, while PrimRep describes the representation of a type within GHC. (https://gitlab.haskell.org/ghc/ghc/-/blob/d2399a46a01a6e46c831c19e797e656a0b8ca16d/compiler/GHC/Types/RepType.hs#L319 might be helpful to read if you're thrown off at this point.) I have written PrimRep.BoxedRep on the right-hand side, but no such constructor for PrimRep exists today. Instead, we have PrimRep.LiftedRep and PrimRep.UnliftedRep. However, we have the following key observation:
Key observation: GHC never usefully discriminates between PrimRep.LiftedRep and PrimRep.UnliftedRep.
By "usefully", I mean that GHC does serialize the two differently, but that's not interesting. It also (in GHC.Builtin.Types.Prim) uses PrimRep as a convenient abbreviation for a RuntimeRep, but it would be easy to pass in either the more expressive RuntimeRep directly (or perhaps both the RuntimeRep and the PrimRep). The only other place where the discrimination happens is in the unariser, where we need to invent a fresh variable with the right representation -- but note that this requires the right representation, not the right levity. So, once again, the discrimination is not useful.
Having laid out rep, we can begin to imagine how to change our type structure, with this key requirement:
Key requirement for type erasure: We need rep(kind(forall a. blah)) to equal rep(kind(blah)).
Now we are looking good. Because we do not require kind(forall a. blah) to equal kind(blah), we can modify the typing rule for forall to uphold Key requirement without leaving the kind completely unchanged. Here is a concrete proposal:
type family UnknownLevity :: Levity where {} -- emptyG, a:k |- ty : TYPE rr /= BoxedRepa \notin r------------------------------- OldRuleG |- forall (a:k). ty : TYPE rG, a:k |- ty : TYPE (BoxedRep l)----------------------------------------------- NewRuleG |- forall (a:k). ty : TYPE (BoxedRep UnknownLevity)
And now, we even have the way forward for the unariser: it will convert a PrimRep.BoxedRep to a TYPE (BoxedRep UnknownLevity) which (I think) is sufficient.
This all adds yet another twist to a narrow passage. But just perhaps it's a good idea.
Those rules are no good, in fact. They don't support substitution, for example. Let's try again:
G, a:k |- ty : TYPE ra \notin r------------------------------- OldRuleG |- forall (a:k). ty : TYPE rG, a:k |- ty : TYPE (BoxedRep l)a \in l (after expanding type synonyms)----------------------------------------------- NewRuleG |- forall (a:k). ty : TYPE (BoxedRep UnknownLevity)
To be clear, both the old and new rule would be in effect.
@simonpj and I agree that the idea in my long #19487 (comment 349133) seems likely to succeed. Would you (for any value of "you") want to take a stab at implementing? I'm very happy to coach you through.
Here is how I would proceed:
Add UnknownLevity to GHC.Types (in the ghc-prim library) and as a wired-in tycon in GHC.Builtin.Types.
Change GHC.Core.TyCon.PrimRep to have just BoxedRep instead of LiftedRep and UnliftedRep. This will require changes to pcPrimTyCon, as a PrimRep no longer has enough information to create the correct kinds of TyCons.
Add new code to GHC.Core.Lint that uses the new rule (the old rule is completely unchanged) to accept more programs. GHC.Core.Lint.lintForAllBody looks like the place to start.
Change GHC.Core.Type.typeKind to mirror the change to the linter.
Change the type validity check (GHC.Tc.Validity.checkEscapingKind looks like a nice starting point) to accept the new form of type.
Change the type kind-checker GHC.Gen.HsType.tc_hs_type to figure out the right kind for ForAllTys. Currently, the HsForAllTy case just passes exp_kind to the recursive call for the body, but the new situation will require some other logic.
I think you could deliver all of these as separate patches. But perhaps it's better to do them in one go -- up to you.
I think I agree completely with your plan! Seems like a nice solution. Although I think we should preserve Levity in BoxedRep when going to STG land, so that codegen can omit tag checks for Unlifted things.
What to do about UnknownLevity in PrimRep? I think we should lower negative occurrences (parameters) to Lifted and positive occurrences (return values) to Unlifted, inspired by variance in subtyping: Given the subtyping relation Unlifted <: Lifted, Lifted -> Lifted -> Unlifted is a subtype of both Unlifted -> Unlifted -> Unlifted and Lifted -> Lifted -> Lifted. Alternatively, stick to levity polymorphism in STG and maybe change NewRule to
G, a:k |- ty : TYPE (BoxedRep l)a \in l (after expanding type synonyms)----------------------------------------------- NewRuleG |- forall (a:k). ty : TYPE (BoxedRep l) <--------- l instead of UnknownLevity
? I don't see why we need UnknownLevity. Ah, probably because our PrimRep-based type system builds on our regular one with arrows, forall and so on and we can't embed type variables in a PrimRep. Fair enough, then our best bet is the subtyping system, I suppose (hoping, but also doubting, that it is expressive enough).
When converting a BoxedRepRuntimeRep into a PrimRep, if it's a BoxedRep Unlifted, we can convert to BoxedRep UnliftedPL. Otherwise (in the Lifted and variable cases), we go to BoxedRep UnknownPL. I think that works, given that Unlifted is just an optimization -- that is, wanting Unlifted but getting Unknown doesn't cause a crash (right?).
We need UnknownLevity because a typing rule like the one you wrote has a major problem: the kind below the line mentions an out-of-scope l. I have no idea what such a rule means. (For example, it is utterly inexpressible with de Bruijn indices.) We need to put something there, and UnknownLevity is that something.
Why can't we do this with RuntimeReps in general? Because there's no conservative choice for how to represent something of unknown representation. Happily, and crucially for our cause here, there is a conservative choice of how to deal with unknown levity: just pretend the thing is lifted.
data PrimLevity = UnliftedPL | UnknownPL
Otherwise (in the Lifted and variable cases), we go to BoxedRep UnknownPL.
That should work, I think. But then you could also call UnknownPLLiftedPL instead... Although UnknownPL might indeed be less susceptible to misinterpretation.
We need UnknownLevity because a typing rule like the one you wrote has a major problem: the kind below the line mentions an out-of-scope l. I have no idea what such a rule means.
I was thinking that the l is a meta var denoting an arbitrary expression rather than a variable. An expression in which a occurs free (after type synonym expansion). So we don't need to bind l, we only need to bind every variable free in l. And that we do.
I think the actual problem is in the translation of that kind to PrimRep. You invented UnknownLevity in order not having to track forall-bound levity vars in the translation process (your rep function above). One could just as well maintain an environment TyVar -> PrimLevity for forall-bound levity vars in rep. (Since that environment will be const UnknownPL anyway, we may just as well maintain a VarSet.)
The advantage: No need for a strange UnknownLevity in Core. (Slight) Disadvantage, we need to adjust the
Key requirement for type erasure: We need rep(kind(forall a. blah)) to equal rep(kind(blah)).
to
Key requirement for type erasure: We need rep(env, kind(forall a. blah)) to equal rep(env, kind(blah)), unless a is a levity variable, then we need rep(env, kind(forall a. blah)) to equal rep(env[a->UnknownPL], kind(blah)).
and have an equation rep_lev(env, a) = env(a).
I retract my previous opinion on a subtyping system for STG (centered around Unlifted <: Lifted). It's not expressive enough to type useful higher-order programs such as levity-polymorphic app:
This should ultimately be a valid GHC Haskell program. At least I think we can express it in Core.
What (fixed) levity should a get when compiling to STG? Whether you pick Unlifted or Lifted, you won't be able to type-check both app calls. This program is fundamentally untypeable in STG_<: (e.g., STG with subtyping as above) unless we integrate levity type variables in the long-term.
I was thinking that the l is a meta var denoting an arbitrary expression rather than a variable. An expression in which a occurs free (after type synonym expansion). So we don't need to bind l, we only need to bind every variable free in l. And that we do.
My mistake: yes, l is an arbitrary expression. But if a is free in l, then a is not bound in the result: it's not in the G. Here is the relevant lemma:
Lemma (Regularity): If G |- ty : ki, then G |- ki : TYPE r and G |- r : RuntimeRep.
That lemma is not true when considering the rule you propose, but we definitely need it for type safety. Here is actually a simpler version of the key property:
Lemma (Regular Scoping): If G |- ty : ki, then fv(ty) \subset dom(G) and fv(ki) \subset dom(G).
The first lemma implies the second, but perhaps the second is easier to reason about (and often has to be proved before we can prove the first one, anyway).
Sorry for being slow, but can you maybe point out where my typing rule is broken? You said "But if a is free in l, then a is not bound in the result", but I'm not sure which result you mean. a can occur in both mentions of l and both mentions either bind it in G (in the LHS of |-) or in the forall in the RHS of |-.
In fact, my NewRule is like your NewRule, except that I say l where you said UnknownLevity. If the latter is OK, then why isn't the former? It binds a in the forall.
I don't want to put well-scopedness up for debate at all, I just don't see how the new rule violates it.
G, a:k |- ty : TYPE (BoxedRep l)a \in l (after expanding type synonyms)----------------------------------------------- SebastianRuleG |- forall (a:k). ty : TYPE (BoxedRep l)
We will evaluate this rule against my "Regular Scoping" lemma: If G |- ty : ki, then fv(ty) \subset dom(G) and fv(ki) \subset dom(G).
We assume G, a:k |- ty : TYPE (BoxedRep l) for some context G, variable a, and types ty, k, and l. Furthermore, we assume a \in l -- more formally, a \in fv(l).
By the well-formedness of G, a:k, we further assume a \notin dom(G). That is, the context G, a:k does not bind a twice.
The Regular Scoping lemma says that fv(TYPE (BoxedRep l)) \subset dom(G).
We know that a \in fv(TYPE (BoxedRep l)) because the definition of fv says that fv(TYPE (BoxedRep l)) = fv(l).
Thus, the Regular Scoping lemma says that a \in dom(G).
But this is a contradiction, because we assumed a \notin dom(G).
Maybe more simply: the problem is that, in the result, the forall scopes over the ty, but not the TYPE (BoxedRep l).