Code for derived Eq instance produces surprisingly bad assembly.
Motivation
We have this simple enum type:
data Format
    = II8
    | II16
    | II32
    | II64
    | FF32
    | FF64
    deriving (Eq)
We get this reasonable-looking STG code:
$fEqFormat_$c== :: Format -> Format -> Bool
[GblId,
 Arity=2,
 Caf=NoCafRefs,
 Str=<S,1*U><S,1*U>,
 Unf=OtherCon []] =
    {} \r [ds_s57A ds1_s57B]
        case ds_s57A of {
          II8 ->
              case ds1_s57B of {
                __DEFAULT -> False [];
                II8 -> True [];
              };
          -- More of the same ...
          FF64 ->
              case ds1_s57B of {
                __DEFAULT -> False [];
                FF64 -> True [];
              };
        };
So far so good, but if we look at the Cmm code that is actually generated, things are a lot uglier.
{offset
c58I: // global
I64[Sp - 16] = c58v;
R1 = R2;
P64[Sp - 8] = R3;
Sp = Sp - 16;
if (R1 & 7 != 0) goto c58v; else goto c58w;
c58w: // global
call (I64[R1])(R1) returns to c58v, args: 8, res: 8, upd: 8;
c58v: // global
_s57B::P64 = P64[Sp + 8];
switch [1 .. 6] (R1 & 7) {
case 1 : goto c58z;
case 2 : goto c58A;
case 3 : goto c58B;
case 4 : goto c58C;
case 5 : goto c58D;
case 6 : goto c58E;
}
c58E: // global
I64[Sp + 8] = c59Y;
R1 = _s57B::P64;
Sp = Sp + 8;
if (R1 & 7 != 0) goto c59Y; else goto c5a0;
c5a0: // global
call (I64[R1])(R1) returns to c59Y, args: 8, res: 8, upd: 8;
c59Y: // global
if (R1 & 7 != 6) goto c58T; else goto c58X;
c58D: // global
I64[Sp + 8] = c59J;
R1 = _s57B::P64;
Sp = Sp + 8;
if (R1 & 7 != 0) goto c59J; else goto c59L;
c59L: // global
call (I64[R1])(R1) returns to c59J, args: 8, res: 8, upd: 8;
c59J: // global
if (R1 & 7 != 5) goto c58T; else goto c58X;
c58C: // global
I64[Sp + 8] = c59u;
R1 = _s57B::P64;
Sp = Sp + 8;
if (R1 & 7 != 0) goto c59u; else goto c59w;
c59w: // global
call (I64[R1])(R1) returns to c59u, args: 8, res: 8, upd: 8;
c59u: // global
if (R1 & 7 != 4) goto c58T; else goto c58X;
c58B: // global
I64[Sp + 8] = c59f;
R1 = _s57B::P64;
Sp = Sp + 8;
if (R1 & 7 != 0) goto c59f; else goto c59h;
c59h: // global
call (I64[R1])(R1) returns to c59f, args: 8, res: 8, upd: 8;
c59f: // global
if (R1 & 7 != 3) goto c58T; else goto c58X;
c58A: // global
I64[Sp + 8] = c590;
R1 = _s57B::P64;
Sp = Sp + 8;
if (R1 & 7 != 0) goto c590; else goto c592;
c592: // global
call (I64[R1])(R1) returns to c590, args: 8, res: 8, upd: 8;
c590: // global
if (R1 & 7 != 2) goto c58T; else goto c58X;
c58z: // global
I64[Sp + 8] = c58L;
R1 = _s57B::P64;
Sp = Sp + 8;
if (R1 & 7 != 0) goto c58L; else goto c58N;
c58N: // global
call (I64[R1])(R1) returns to c58L, args: 8, res: 8, upd: 8;
c58L: // global
if (R1 & 7 != 1) goto c58T; else goto c58X;
c58T: // global
R1 = False_closure+1;
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
c58X: // global
R1 = True_closure+2;
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
}
So what's the issue? In the Cmm code we get:
- Code to evaluate the first argument
- Branches for each possible constructor
- Inside each branch:
  - Code to evaluate the second argument
  - A comparison between the tag of the second argument and the one associated with the branch
  - A branch to true/false based on the result
We could just use dataToTag#, and I think we should.
However, a comment in TcGenDeriv says:
* If there are a lot of (more than ten) nullary constructors, we emit a
  catch-all clause of the form:
      (==) a b = case (con2tag_Foo a) of { a# ->
                 case (con2tag_Foo b) of { b# ->
                 case (a# ==# b#) of {
                 r -> r }}}
  If con2tag gets inlined this leads to join point stuff, so
  it's better to use regular pattern matching if there aren't too
  many nullary constructors. "Ten" is arbitrary, of course
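For concreteness, here is roughly what that catch-all clause would look like if written out by hand for Format. This is only a sketch: con2tagFormat is a hypothetical stand-in for the generated con2tag_Foo helper, eqViaCon2tag is a name made up for this example, and I wrap the comparison in isTrue# because ==# returns an Int# in current GHC. The bang pattern is defensive, so the sketch does not depend on how a given GHC version treats an unevaluated argument to dataToTag#.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Int#, dataToTag#, isTrue#, (==#))

data Format = II8 | II16 | II32 | II64 | FF32 | FF64

-- Hypothetical stand-in for the generated con2tag_Foo helper.
-- The bang forces the argument before its constructor tag is read.
con2tagFormat :: Format -> Int#
con2tagFormat !x = dataToTag# x

-- Same shape as the catch-all clause quoted above: get both tags,
-- then do a single primitive comparison.
eqViaCon2tag :: Format -> Format -> Bool
eqViaCon2tag a b =
  case con2tagFormat a of
    a# -> case con2tagFormat b of
      b# -> isTrue# (a# ==# b#)

main :: IO ()
main = print (eqViaCon2tag II32 II32, eqViaCon2tag II32 FF32)
-- prints (True,False)
```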
Either way, what we really want is code that:
- Evaluates the first argument and gets its tag.
- Evaluates the second argument and gets its tag.
- Compares the two tags and branches on the result.
I fail to see why "join point stuff" is an issue in that regard.
Proposal:
I think we should just use dataToTag# in the derived instance.
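To illustrate, a hand-written instance along these lines might look like the sketch below. It is not what the deriving machinery would literally emit, just the shape I have in mind. The bang patterns are an assumption on my part to make sure both arguments are evaluated before their tags are read, and the main function is only a sanity check that compares the result against constructor indices from a derived Enum instance.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (dataToTag#, isTrue#, (==#))

data Format = II8 | II16 | II32 | II64 | FF32 | FF64
  deriving (Enum, Bounded)

-- Evaluate both arguments, read their tags, compare once.
instance Eq Format where
  (==) !a !b = isTrue# (dataToTag# a ==# dataToTag# b)

-- Sanity check: the tag-based (==) agrees with comparing
-- constructor indices for all 36 pairs of constructors.
main :: IO ()
main = do
  let allFormats = [minBound .. maxBound] :: [Format]
      agrees = and [ (x == y) == (fromEnum x == fromEnum y)
                   | x <- allFormats, y <- allFormats ]
  print agrees
-- prints True
```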