Field selector types bungle the order (and specificites) of type variables

A discussion that began in !2465 (comment 263633) revealed that the types which GHC chooses for field selectors are:

Rather counterintuitive, and
Not specified anywhere in the GHC User's Guide (at least, not that I can tell)

I will clarify each of the bullet points in order.

As far as (1) goes, here is a pop quiz: what is the type of unN below?

newtype N :: Type -> Type -> Type where
  MkN :: forall b a. { unN :: Either a b } -> N a b

To my surprise, I discovered that this is the answer:

λ> :type +v unN
unN :: forall a b. N a b -> Either a b

Notice the forall a b. part, despite the fact that these type variables are quantified in the opposite order in the constructor itself:

λ> :type +v MkN
MkN :: forall b a. Either a b -> N a b

I would have expected both the constructor and the field selector to use forall b a., since that is the order in which the user declared them originally. But that's not even the only weird thing going on. Here is another example to ponder:

{-# LANGUAGE PolyKinds #-}

import Data.Proxy

newtype P a = MkP { unP :: Proxy a }

The use of PolyKinds here is important, since that causes GHC to infer that a is kind-polymorphic. For example, here is the type of MkP:

λ> :type +v MkN
MkP :: forall {k} (a :: k). Proxy a -> P a

What, then, should be the type of unP? It turns out that this answer also departs from my expectations:

λ> :type +v unP
unP :: forall k (a :: k). P a -> Proxy a

Unlike the type of MkP, the type of unP declares the kind variable k to be specified, not inferred. One consequence of this discrepancy is that MkP @True will typecheck, but unP @True won't!

The culprit is this part of TcTyDecls.mkOneRecordSelector:

    -- Selector type; Note [Polymorphic selectors]
    field_ty   = conLikeFieldType con1 lbl
    data_tvs   = tyCoVarsOfTypesWellScoped inst_tys
    data_tv_set= mkVarSet data_tvs
    is_naughty = not (tyCoVarsOfType field_ty `subVarSet` data_tv_set)
    (field_tvs, field_theta, field_tau) = tcSplitSigmaTy field_ty
    sel_ty | is_naughty = unitTy  -- See Note [Naughty record selectors]
           | otherwise  = mkSpecForAllTys data_tvs          $
                          mkPhiTy (conLikeStupidTheta con1) $   -- Urgh!
                          mkVisFunTy data_ty                $
                          mkSpecForAllTys field_tvs         $
                          mkPhiTy field_theta               $
                          -- req_theta is empty for normal DataCon
                          mkPhiTy req_theta                 $
                          field_tau

There are two things wrong with those code, and both of them involve the use of mkSpecForAllTys:

The first use, mkSpecForAllTys data_tvs, simply computes the free variables of the return type in a data constructor and quantifies them in that order, all specified. This completely ignores the original order (and specificities) that the user declared when defining the data constructor. This is why the P and N examples above behave strangely.
The second use, mkSpecForAllTys field_tvs, takes the foralls in the field type and forces them to be specified, regardless of the declared specificities. This isn't that big of a problem today, but it will be in a future where explicit specificity is implemented. For example, if you had this type:
```
newtype T = MkT { unT :: forall {a}. a -> a }
```
Then GHC would pick the type T -> forall a. a -> a for unT, not T -> forall {a}. a -> a.

Fixing this bit of code will nail part (1). I have a quick-and-dirty patch from !2465 (comment 263718) that mostly does the job:

diff --git a/compiler/typecheck/TcTyDecls.hs b/compiler/typecheck/TcTyDecls.hs
index cff4f3ed00..60fb8405ba 100644
--- a/compiler/typecheck/TcTyDecls.hs
+++ b/compiler/typecheck/TcTyDecls.hs
@@ -48,6 +48,7 @@ import GHC.Driver.Types
 import GHC.Core.TyCon
 import GHC.Core.ConLike
 import GHC.Core.DataCon
+import GHC.Core.PatSyn
 import GHC.Types.Name
 import GHC.Types.Name.Env
 import GHC.Types.Name.Set hiding (unitFV)
@@ -873,15 +874,20 @@ mkOneRecordSelector all_cons idDetails fl
 
     -- Selector type; Note [Polymorphic selectors]
     field_ty   = conLikeFieldType con1 lbl
-    data_tvs   = tyCoVarsOfTypesWellScoped inst_tys
-    data_tv_set= mkVarSet data_tvs
+    data_tv_set= tyCoVarsOfTypes inst_tys
+    data_tvbs' = case con1 of
+                   RealDataCon dc -> dataConUserTyVarBinders dc
+                   PatSynCon ps   -> patSynUnivTyVarBinders ps ++
+                                     patSynExTyVarBinders ps
+    data_tvbs  = filter (\tvb -> binderVar tvb `elemVarSet` data_tv_set) data_tvbs'
     is_naughty = not (tyCoVarsOfType field_ty `subVarSet` data_tv_set)
-    (field_tvs, field_theta, field_tau) = tcSplitSigmaTy field_ty
+    (field_tvbs, field_rho)  = tcSplitForAllVarBndrs field_ty
+    (field_theta, field_tau) = tcSplitPhiTy field_rho
     sel_ty | is_naughty = unitTy  -- See Note [Naughty record selectors]
-           | otherwise  = mkSpecForAllTys data_tvs          $
+           | otherwise  = mkForAllTys data_tvbs             $
                           mkPhiTy (conLikeStupidTheta con1) $   -- Urgh!
                           mkVisFunTy data_ty                $
-                          mkSpecForAllTys field_tvs         $
+                          mkForAllTys field_tvbs            $
                           mkPhiTy field_theta               $
                           -- req_theta is empty for normal DataCon
                           mkPhiTy req_theta                 $

@simonpj offers some further suggestions in !2465 (comment 263883):

Let's not decompose field_ty at all. There is no need to do so.

Ryan's solution about using dataConUserTyVarBinders is good. But define a new function ConLike.conLikeUserTyVarBinders and use it here and dsExpr.

Add the invariant in DataCon that only dataConUserTyVarBinders appear in the argument types; the variables in the domain of dcEqSpec to not. Hencece, here no need to apply eq_subst to field_ty.

As far as part (2) goes, I think the best approach would be to add a new section to the GHC User's Manual specifically about the types of record selectors. There are some brief mentions of how record selectors interact with GADTs here:

-  As mentioned in :ref:`gadt-style`, record syntax is supported. For
   example:

   ::

         data Term a where
             Lit    :: { val  :: Int }      -> Term Int
             Succ   :: { num  :: Term Int } -> Term Int
             Pred   :: { num  :: Term Int } -> Term Int
             IsZero :: { arg  :: Term Int } -> Term Bool
             Pair   :: { arg1 :: Term a
                       , arg2 :: Term b
                       }                    -> Term (a,b)
             If     :: { cnd  :: Term Bool
                       , tru  :: Term a
                       , fls  :: Term a
                       }                    -> Term a

   However, for GADTs there is the following additional constraint:
   every constructor that has a field ``f`` must have the same result
   type (modulo alpha conversion) Hence, in the above example, we cannot
   merge the ``num`` and ``arg`` fields above into a single name.
   Although their field types are both ``Term Int``, their selector
   functions actually have different types:

   ::

         num :: Term Int -> Term Int
         arg :: Term Bool -> Term Int

And here:

-  As in the case of existentials declared using the Haskell-98-like
   record syntax (:ref:`existential-records`), record-selector functions
   are generated only for those fields that have well-typed selectors.
   Here is the example of that section, in GADT-style syntax: ::

       data Counter a where
           NewCounter :: { _this    :: self
                         , _inc     :: self -> self
                         , _display :: self -> IO ()
                         , tag      :: a
                         } -> Counter a

   As before, only one selector function is generated here, that for
   ``tag``. Nevertheless, you can still use all the field names in
   pattern matching and record construction.

But these sections don't really paint a complete picture of how the types of record selectors work. Better to consolidate them into a single section that tells the whole story, including the order of type variables and specificity.

Edited Apr 06, 2020 by Ryan Scott

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information