Perhaps harmful interaction between the "ClassOp" rules and specialization.

While working on !4764 I compiled some code with -Wall-missed-specializations.

The core problem

There are a few failed specialisations for known issues like elem not specializing (see #18285, #19587).

But there were also some odd ones. In particular, we fail to specialize methods of the Instruction typeclass, for example:

compiler\GHC\CmmToAsm\X86.hs: warning: [-Wall-missed-specialisations]
    Could not specialise imported function `isJumpishInstr'
      when specialising `GHC.CmmToAsm.Reg.Linear.genRaInsn'
      when specialising `GHC.CmmToAsm.Reg.Linear.$wraInsn'
      when specialising `GHC.CmmToAsm.Reg.Linear.raInsn'
      when specialising `GHC.CmmToAsm.Reg.Linear.linearRA'
      when specialising `GHC.CmmToAsm.Reg.Linear.processBlock'
      when specialising `GHC.CmmToAsm.Reg.Linear.process'
      when specialising `GHC.CmmToAsm.Reg.Linear.linearRA_SCCs'
      when specialising `GHC.CmmToAsm.Reg.Linear.$wlinearRegAlloc'
      when specialising `GHC.CmmToAsm.Reg.Linear.linearRegAlloc'
      when specialising `GHC.CmmToAsm.Reg.Linear.linearRegAlloc'
    Probable fix: add INLINABLE pragma on `isJumpishInstr'

It's defined like this:

-- | Instruction instance for x86 instruction set.
instance Instruction X86.Instr where
   regUsageOfInstr         = X86.regUsageOfInstr
   patchRegsOfInstr        = X86.patchRegsOfInstr
   isJumpishInstr          = X86.isJumpishInstr
   ...

Adding INLINEABLE to both X86.isJumpishInstr as well as the instance method didn't change anything.

Looking at the output of the specialiser for uses of the function, the result is quite surprising. The functions are actually used at a concrete type:

case isJumpishInstr @X86.Instr $dInstruction_s7f8 eta3_a78C of { ...

Going further, however, it turned out that what we apply isn't actually a type class dictionary. Rather, $dInstruction_s7f8 is a thunk at this point:

-- RHS size: {terms: 2, types: 6, coercions: 0, joins: 0/0}
$dInstruction_s7f8 :: Instruction X86.Instr
[LclId]
$dInstruction_s7f8
  = ghc-prim:GHC.Classes.$p2(%,,%)
      @(GHC.CmmToAsm.Reg.Linear.FreeRegs.FR
          GHC.CmmToAsm.Reg.Linear.X86.FreeRegs)
      @(GHC.Utils.Outputable.Outputable
          GHC.CmmToAsm.Reg.Linear.X86.FreeRegs)
      @(Instruction X86.Instr)
      w_s6Fa

The method it applies (constraint-tuple selector) is defined like this:

-- RHS size: {terms: 8, types: 14, coercions: 0, joins: 0/0}
GHC.Classes.$p2(%,,%)
  :: forall (a :: Constraint) (b :: Constraint) (c :: Constraint).
     (a, b, c) =>
     c
[GblId[ClassOp],
 Arity=1,
 Caf=NoCafRefs,
 Str=<SP(A,A,SL)>,
 RULES: Built in rule for GHC.Classes.$p2(%,,%): "Class op $p2(%,,%)"]
GHC.Classes.$p2(%,,%)
  = \ (@(a_11 :: Constraint))
      (@(b_12 :: Constraint))
      (@(c_13 :: Constraint))
      (v_B1 :: (a_11, b_12, c_13)) ->
      case v_B1 of v_B1 { (v_B2, v_B3, v_B4) -> v_B4 }

This selector itself also fails to specialize: Could not specialise imported function ghc-prim:GHC.Classes.$p2(%,,%). That is not too surprising, as it doesn't have an unfolding.

The reasoning for that is explained in Note [ClassOp/DFun selection].
The gist of it is that we never inline ClassOp methods. Instead we generate rules for them, which evaluate them when they are applied to a dictionary (DFun id). What we apply $p2(%,,%) to is w_s6Fa, which is defined as:

-- RHS size: {terms: 4, types: 6, coercions: 10, joins: 0/0}
w_s6Fa
  :: GHC.CmmToAsm.Reg.Linear.OutputableRegConstraint
       GHC.CmmToAsm.Reg.Linear.X86.FreeRegs X86.Instr
[LclId]
w_s6Fa
  = (GHC.CmmToAsm.Reg.Linear.FreeRegs.$fFRFreeRegs2,
     GHC.Utils.Outputable.$fOutputableWord3
     `cast` (Sym (GHC.CmmToAsm.Reg.Linear.X86.N:FreeRegs[0])
             %<'Many>_N ->_R Sym (GHC.Utils.Outputable.N:SDoc[0])
             ; Sym (GHC.Utils.Outputable.N:Outputable[0]
                        <GHC.CmmToAsm.Reg.Linear.X86.FreeRegs>_N)
             :: (GHC.Word.Word32
                 -> GHC.Utils.Outputable.SDocContext -> GHC.Utils.Ppr.Doc)
                ~R# GHC.Utils.Outputable.Outputable
                      GHC.CmmToAsm.Reg.Linear.X86.FreeRegs),
     GHC.CmmToAsm.X86.$fInstructionInstr)

This comes from solving the following constraint:

type OutputableRegConstraint freeRegs instr =
        (FR freeRegs, Outputable freeRegs, Instruction instr)
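
A minimal sketch of the same shape, for experimenting outside of GHC's own source tree. All class, type and function names here (FreeRegsLike, Describe, InstrLike, RegConstraint, countJumps) are hypothetical stand-ins and not the real GHC definitions. I have not checked that this toy triggers the same warning; whether the constraint synonym ends up being passed as a single tuple dictionary may depend on the GHC version and on how the code is split across modules.

```haskell
{-# LANGUAGE ConstraintKinds #-}

-- Hypothetical stand-ins for FR, Outputable and Instruction.
class FreeRegsLike r where
  freeCount :: r -> Int

class Describe r where
  describe :: r -> String

class InstrLike i where
  isJumpish :: i -> Bool

-- Constraint synonym with the same shape as OutputableRegConstraint.
type RegConstraint r i = (FreeRegsLike r, Describe r, InstrLike i)

newtype Regs = Regs Int
data Instr = Jmp | Mov

instance FreeRegsLike Regs where
  freeCount (Regs n) = n

instance Describe Regs where
  describe (Regs n) = "Regs " ++ show n

instance InstrLike Instr where
  isJumpish Jmp = True
  isJumpish Mov = False
  {-# INLINABLE isJumpish #-}

-- Overloaded worker that uses a method of each class in the synonym.
-- In the real code this would live in a different module from the call site.
countJumps :: RegConstraint r i => r -> [i] -> (String, Int, Int)
countJumps regs instrs =
  (describe regs, freeCount regs, length (filter isJumpish instrs))
{-# INLINABLE countJumps #-}

main :: IO ()
main = print (countJumps (Regs 3) [Jmp, Mov, Jmp])
-- prints ("Regs 3",3,2)
```

To get closer to the situation above, one would presumably put countJumps and the instances in separate modules and compile the caller with -O -Wall-missed-specialisations -ddump-spec.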

Now in the final Core for this case it doesn't seem to affect the outcome. The functions either appear in their monomorphic variants or disappear (because of inlining) in the final code.

I assume this is because of inlining/rules and simplifications down the line.

Open Questions

Why isn't the application of ghc-prim:GHC.Classes.$p2(%,,%) optimized away?

The first thing I wonder about is why ghc-prim:GHC.Classes.$p2(%,,%) <types> w_s6Fa isn't evaluated by the gentle simplifier pass. It's run before specialisation and should catch this case I believe.

Based on Note [ClassOp/DFun selection] we "give the ClassOp ... a BuiltinRule that extracts the right piece iff its argument satisfies exprIsConApp_maybe." w_s6Fa certainly looks like a ConApp to me!
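
To make the expectation concrete, here is a toy model of such a rule. It is only a sketch: Expr, Unfoldings, isConApp and classOpRule are made-up names, and this is not how GHC implements the built-in rule or exprIsConApp_maybe. The one assumption it encodes is that the selector can only be reduced if its argument is a constructor application, either directly or via a known unfolding of a variable.

```haskell
-- Toy Core-like expressions (hypothetical, not GHC's Core).
data Expr
  = Var String
  | ConApp String [Expr]  -- saturated constructor application, e.g. a dictionary tuple
  | Sel Int Expr          -- selector picking the n-th field (0-based)
  deriving (Eq, Show)

-- Known unfoldings of variables in scope (toy version).
-- Assumed to be acyclic, otherwise isConApp would loop.
type Unfoldings = [(String, Expr)]

-- Toy analogue of exprIsConApp_maybe: succeeds on a literal constructor
-- application, or on a variable whose unfolding is one.
isConApp :: Unfoldings -> Expr -> Maybe (String, [Expr])
isConApp _   (ConApp c args) = Just (c, args)
isConApp env (Var v)         = lookup v env >>= isConApp env
isConApp _   (Sel _ _)       = Nothing

-- Toy "class op" rule: reduce the selector iff the argument is a ConApp.
classOpRule :: Unfoldings -> Expr -> Maybe Expr
classOpRule env (Sel n arg)
  | n >= 0 = do
      (_, fields) <- isConApp env arg
      case drop n fields of
        (f : _) -> Just f
        []      -> Nothing
classOpRule _ _ = Nothing

main :: IO ()
main = do
  let dict = ConApp "(%,,%)" [Var "$fFR", Var "$fOutputable", Var "$fInstruction"]
  print (classOpRule [] (Sel 2 dict))                   -- rule fires
  print (classOpRule [] (Sel 2 (Var "w_s6Fa")))         -- no unfolding known: rule does not fire
  print (classOpRule [("w_s6Fa", dict)] (Sel 2 (Var "w_s6Fa")))  -- unfolding known: rule fires
```

In this model the rule only fails when the unfolding of w_s6Fa is not visible at the point where the rule is tried. Whether something like that is what happens in the real pipeline is exactly the open question.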

The other thing is how much this matters. Yes, we get warnings about "foo" failing to specialize, but in the end it does get specialised through simplification. It's possible to verify this by looking at the final Core.
But for other programs the same pattern might not always be resolved by simplification.

Edited Mar 25, 2021 by Andreas Klebinger
