Field selector types bungle the order (and specificites) of type variables
A discussion that began in !2465 (comment 263633) revealed that the types which GHC chooses for field selectors are:
- Rather counterintuitive, and
- Not specified anywhere in the GHC User's Guide (at least, not that I can tell)
I will clarify each of the bullet points in order.
As far as (1) goes, here is a pop quiz: what is the type of unN
below?
newtype N :: Type -> Type -> Type where
MkN :: forall b a. { unN :: Either a b } -> N a b
To my surprise, I discovered that this is the answer:
λ> :type +v unN
unN :: forall a b. N a b -> Either a b
Notice the forall a b.
part, despite the fact that these type variables are quantified in the opposite order in the constructor itself:
λ> :type +v MkN
MkN :: forall b a. Either a b -> N a b
I would have expected both the constructor and the field selector to use forall b a.
, since that is the order in which the user declared them originally. But that's not even the only weird thing going on. Here is another example to ponder:
{-# LANGUAGE PolyKinds #-}
import Data.Proxy
newtype P a = MkP { unP :: Proxy a }
The use of PolyKinds
here is important, since that causes GHC to infer that a
is kind-polymorphic. For example, here is the type of MkP
:
λ> :type +v MkN
MkP :: forall {k} (a :: k). Proxy a -> P a
What, then, should be the type of unP
? It turns out that this answer also departs from my expectations:
λ> :type +v unP
unP :: forall k (a :: k). P a -> Proxy a
Unlike the type of MkP
, the type of unP
declares the kind variable k
to be specified, not inferred. One consequence of this discrepancy is that MkP @True
will typecheck, but unP @True
won't!
The culprit is this part of TcTyDecls.mkOneRecordSelector
:
-- Selector type; Note [Polymorphic selectors]
field_ty = conLikeFieldType con1 lbl
data_tvs = tyCoVarsOfTypesWellScoped inst_tys
data_tv_set= mkVarSet data_tvs
is_naughty = not (tyCoVarsOfType field_ty `subVarSet` data_tv_set)
(field_tvs, field_theta, field_tau) = tcSplitSigmaTy field_ty
sel_ty | is_naughty = unitTy -- See Note [Naughty record selectors]
| otherwise = mkSpecForAllTys data_tvs $
mkPhiTy (conLikeStupidTheta con1) $ -- Urgh!
mkVisFunTy data_ty $
mkSpecForAllTys field_tvs $
mkPhiTy field_theta $
-- req_theta is empty for normal DataCon
mkPhiTy req_theta $
field_tau
There are two things wrong with those code, and both of them involve the use of mkSpecForAllTys
:
-
The first use,
mkSpecForAllTys data_tvs
, simply computes the free variables of the return type in a data constructor and quantifies them in that order, all specified. This completely ignores the original order (and specificities) that the user declared when defining the data constructor. This is why theP
andN
examples above behave strangely. -
The second use,
mkSpecForAllTys field_tvs
, takes theforall
s in the field type and forces them to be specified, regardless of the declared specificities. This isn't that big of a problem today, but it will be in a future where explicit specificity is implemented. For example, if you had this type:newtype T = MkT { unT :: forall {a}. a -> a }
Then GHC would pick the type
T -> forall a. a -> a
forunT
, notT -> forall {a}. a -> a
.
Fixing this bit of code will nail part (1). I have a quick-and-dirty patch from !2465 (comment 263718) that mostly does the job:
diff --git a/compiler/typecheck/TcTyDecls.hs b/compiler/typecheck/TcTyDecls.hs
index cff4f3ed00..60fb8405ba 100644
--- a/compiler/typecheck/TcTyDecls.hs
+++ b/compiler/typecheck/TcTyDecls.hs
@@ -48,6 +48,7 @@ import GHC.Driver.Types
import GHC.Core.TyCon
import GHC.Core.ConLike
import GHC.Core.DataCon
+import GHC.Core.PatSyn
import GHC.Types.Name
import GHC.Types.Name.Env
import GHC.Types.Name.Set hiding (unitFV)
@@ -873,15 +874,20 @@ mkOneRecordSelector all_cons idDetails fl
-- Selector type; Note [Polymorphic selectors]
field_ty = conLikeFieldType con1 lbl
- data_tvs = tyCoVarsOfTypesWellScoped inst_tys
- data_tv_set= mkVarSet data_tvs
+ data_tv_set= tyCoVarsOfTypes inst_tys
+ data_tvbs' = case con1 of
+ RealDataCon dc -> dataConUserTyVarBinders dc
+ PatSynCon ps -> patSynUnivTyVarBinders ps ++
+ patSynExTyVarBinders ps
+ data_tvbs = filter (\tvb -> binderVar tvb `elemVarSet` data_tv_set) data_tvbs'
is_naughty = not (tyCoVarsOfType field_ty `subVarSet` data_tv_set)
- (field_tvs, field_theta, field_tau) = tcSplitSigmaTy field_ty
+ (field_tvbs, field_rho) = tcSplitForAllVarBndrs field_ty
+ (field_theta, field_tau) = tcSplitPhiTy field_rho
sel_ty | is_naughty = unitTy -- See Note [Naughty record selectors]
- | otherwise = mkSpecForAllTys data_tvs $
+ | otherwise = mkForAllTys data_tvbs $
mkPhiTy (conLikeStupidTheta con1) $ -- Urgh!
mkVisFunTy data_ty $
- mkSpecForAllTys field_tvs $
+ mkForAllTys field_tvbs $
mkPhiTy field_theta $
-- req_theta is empty for normal DataCon
mkPhiTy req_theta $
@simonpj offers some further suggestions in !2465 (comment 263883):
- Let's not decompose
field_ty
at all. There is no need to do so.- Ryan's solution about using
dataConUserTyVarBinders
is good. But define a new functionConLike.conLikeUserTyVarBinders
and use it here anddsExpr
.- Add the invariant in
DataCon
that onlydataConUserTyVarBinders
appear in the argument types; the variables in the domain ofdcEqSpec
to not. Hencece, here no need to applyeq_subst
to field_ty.
As far as part (2) goes, I think the best approach would be to add a new section to the GHC User's Manual specifically about the types of record selectors. There are some brief mentions of how record selectors interact with GADTs here:
- As mentioned in :ref:`gadt-style`, record syntax is supported. For
example:
::
data Term a where
Lit :: { val :: Int } -> Term Int
Succ :: { num :: Term Int } -> Term Int
Pred :: { num :: Term Int } -> Term Int
IsZero :: { arg :: Term Int } -> Term Bool
Pair :: { arg1 :: Term a
, arg2 :: Term b
} -> Term (a,b)
If :: { cnd :: Term Bool
, tru :: Term a
, fls :: Term a
} -> Term a
However, for GADTs there is the following additional constraint:
every constructor that has a field ``f`` must have the same result
type (modulo alpha conversion) Hence, in the above example, we cannot
merge the ``num`` and ``arg`` fields above into a single name.
Although their field types are both ``Term Int``, their selector
functions actually have different types:
::
num :: Term Int -> Term Int
arg :: Term Bool -> Term Int
And here:
- As in the case of existentials declared using the Haskell-98-like
record syntax (:ref:`existential-records`), record-selector functions
are generated only for those fields that have well-typed selectors.
Here is the example of that section, in GADT-style syntax: ::
data Counter a where
NewCounter :: { _this :: self
, _inc :: self -> self
, _display :: self -> IO ()
, tag :: a
} -> Counter a
As before, only one selector function is generated here, that for
``tag``. Nevertheless, you can still use all the field names in
pattern matching and record construction.
But these sections don't really paint a complete picture of how the types of record selectors work. Better to consolidate them into a single section that tells the whole story, including the order of type variables and specificity.