Code for derived Eq instance produces surprisingly bad assembly.

Motivation

We have this simple enum type:

data Format
        = II8
        | II16
        | II32
        | II64
        | FF32
        | FF64
        deriving (Eq)

We get this reasonable looking STG code:

$fEqFormat_$c== :: Format -> Format -> Bool
[GblId,
 Arity=2,
 Caf=NoCafRefs,
 Str=<S,1*U><S,1*U>,
 Unf=OtherCon []] =
    {} \r [ds_s57A ds1_s57B]
        case ds_s57A of {
          II8 ->
              case ds1_s57B of {
                __DEFAULT -> False [];
                II8 -> True [];
              };
          -- More of the same ...
          FF64 ->
              case ds1_s57B of {
                __DEFAULT -> False [];
                FF64 -> True [];
              };
        };

So far so good, but if we look at the actual Cmm code generated things are a lot uglier.

Click to expand Cmm code

     {offset
       c58I: // global
           I64[Sp - 16] = c58v;
           R1 = R2;
           P64[Sp - 8] = R3;
           Sp = Sp - 16;
           if (R1 & 7 != 0) goto c58v; else goto c58w;
       c58w: // global
           call (I64[R1])(R1) returns to c58v, args: 8, res: 8, upd: 8;
       c58v: // global
           _s57B::P64 = P64[Sp + 8];
           switch [1 .. 6] (R1 & 7) {
               case 1 : goto c58z;
               case 2 : goto c58A;
               case 3 : goto c58B;
               case 4 : goto c58C;
               case 5 : goto c58D;
               case 6 : goto c58E;
           }
       c58E: // global
           I64[Sp + 8] = c59Y;
           R1 = _s57B::P64;
           Sp = Sp + 8;
           if (R1 & 7 != 0) goto c59Y; else goto c5a0;
       c5a0: // global
           call (I64[R1])(R1) returns to c59Y, args: 8, res: 8, upd: 8;
       c59Y: // global
           if (R1 & 7 != 6) goto c58T; else goto c58X;
       c58D: // global
           I64[Sp + 8] = c59J;
           R1 = _s57B::P64;
           Sp = Sp + 8;
           if (R1 & 7 != 0) goto c59J; else goto c59L;
       c59L: // global
           call (I64[R1])(R1) returns to c59J, args: 8, res: 8, upd: 8;
       c59J: // global
           if (R1 & 7 != 5) goto c58T; else goto c58X;
       c58C: // global
           I64[Sp + 8] = c59u;
           R1 = _s57B::P64;
           Sp = Sp + 8;
           if (R1 & 7 != 0) goto c59u; else goto c59w;
       c59w: // global
           call (I64[R1])(R1) returns to c59u, args: 8, res: 8, upd: 8;
       c59u: // global
           if (R1 & 7 != 4) goto c58T; else goto c58X;
       c58B: // global
           I64[Sp + 8] = c59f;
           R1 = _s57B::P64;
           Sp = Sp + 8;
           if (R1 & 7 != 0) goto c59f; else goto c59h;
       c59h: // global
           call (I64[R1])(R1) returns to c59f, args: 8, res: 8, upd: 8;
       c59f: // global
           if (R1 & 7 != 3) goto c58T; else goto c58X;
       c58A: // global
           I64[Sp + 8] = c590;
           R1 = _s57B::P64;
           Sp = Sp + 8;
           if (R1 & 7 != 0) goto c590; else goto c592;
       c592: // global
           call (I64[R1])(R1) returns to c590, args: 8, res: 8, upd: 8;
       c590: // global
           if (R1 & 7 != 2) goto c58T; else goto c58X;
       c58z: // global
           I64[Sp + 8] = c58L;
           R1 = _s57B::P64;
           Sp = Sp + 8;
           if (R1 & 7 != 0) goto c58L; else goto c58N;
       c58N: // global
           call (I64[R1])(R1) returns to c58L, args: 8, res: 8, upd: 8;
       c58L: // global
           if (R1 & 7 != 1) goto c58T; else goto c58X;
       c58T: // global
           R1 = False_closure+1;
           Sp = Sp + 8;
           call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
       c58X: // global
           R1 = True_closure+2;
           Sp = Sp + 8;
           call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
     }

So what's the issue? In the Cmm code we get:

Code to evaluate the first argument
Branches for each possible constructor
Inside each branch:
- Code to evaluate the second argument
- A comparison between the tag of the second argument and the one associated with the branch
- A branch to true/false based on the result.

We could just use dataToTag#, and I think we should.

However TcGenDeriv says:

* If there are a lot of (more than ten) nullary constructors, we emit a
  catch-all clause of the form:

      (==) a b  = case (con2tag_Foo a) of { a# ->
                  case (con2tag_Foo b) of { b# ->
                  case (a# ==# b#)     of {
                    r -> r }}}

  If con2tag gets inlined this leads to join point stuff, so
  it's better to use regular pattern matching if there aren't too
  many nullary constructors.  "Ten" is arbitrary, of course

Either way what we really want is code of some form that:

Evaluates the first argument, gets tag.
Evaluates the second argument, gets tag.
Compare tags and branch on the result.

I fail to see why "join point stuff" is an issue in that regard.

Proposal:

I think we should just use dataToTag# in the derived instance.

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information