unnecessary indirect jump when returning a case scrutinee
I happened to be looking at the Cmm for this code (GHC 7.8.3, -O2):
    f :: Int -> Int
    f x = if x < 0 then x else x+1
and I noticed something a bit funny about it:
    c12e:
        if ((Sp + -8) < SpLim) goto c12z; else goto c12A;
    c12z:
        R2 = R2;
        R1 = Test.f_closure;
        call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8;
    c12A:
        I64[Sp - 8] = c12b;
        R1 = R2;
        Sp = Sp - 8;
        if (R1 & 7 != 0) goto c12b; else goto c12c;
    c12c:
        call (I64[R1])(R1) returns to c12b, args: 8, res: 8, upd: 8;
    c12b:
        Hp = Hp + 16;
        if (Hp > HpLim) goto c12y; else goto c12x;
    c12y:
        HpAlloc = 16;
        R1 = R1;
        call stg_gc_unpt_r1(R1) returns to c12b, args: 8, res: 8, upd: 8;
    c12x:
        _s11Q::I64 = I64[R1 + 7];
        if (%MO_S_Lt_W64(_s11Q::I64, 0)) goto c12u; else goto c12v;
    c12u:
        Hp = Hp - 16;
        R1 = R1 & (-8);                                /* <--- */
        Sp = Sp + 8;
        call (I64[R1])(R1) args: 8, res: 0, upd: 8;    /* <--- */
    c12v:
        I64[Hp - 8] = GHC.Types.I#_con_info;
        I64[Hp] = _s11Q::I64 + 1;
        R1 = Hp - 7;
        Sp = Sp + 8;
        call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
On the two marked lines, we untag R1 (which is x) and enter it. However, we know at this point that x is already in WHNF so we could simply return it by replacing the two lines with call (P64[Sp])(R1), if I'm not mistaken. That will save a load and an indirect jump (which we actually know is to I#_con_info, which would just retag R1 and return to the address on the stack anyways).
I think the same optimization should be available any time we do an algebraic
case and in a branch simply return the scrutinee.
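To make the general pattern concrete, here is a small sketch alongside the original f. The Shape type and the normalise function are hypothetical, made up for illustration; I have not inspected the Cmm GHC generates for them. In normalise, two of the three right-hand sides return the scrutinee s itself, which the case has already forced to WHNF, so those are the branches where the extra enter would be redundant.

```haskell
module Main (main) where

-- The function from the ticket: the "then" branch returns the scrutinee x.
f :: Int -> Int
f x = if x < 0 then x else x + 1

-- Hypothetical type with more than one constructor (not from the ticket).
data Shape
  = Circle Double
  | Square Double
  deriving (Eq, Show)

-- The case forces s to WHNF. One right-hand side allocates a new
-- constructor; the other two return the already-evaluated scrutinee.
normalise :: Shape -> Shape
normalise s = case s of
  Circle r
    | r < 0     -> Circle (negate r)  -- builds a fresh value
    | otherwise -> s                  -- returns the scrutinee unchanged
  Square _      -> s                  -- returns the scrutinee unchanged

main :: IO ()
main = do
  print (map f [-2, 3])
  print (normalise (Circle (-1.5)))
  print (normalise (Square 2))
```

Compiling this with -O2 -ddump-cmm should show whether the scrutinee-returning branches untag and enter R1 as in the dump above.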
I looked at what it would take to fix this. It looks almost easy: if we add a new LambdaFormInfo constructor LFUnknownCon, meaning that we know the identifier is bound to a saturated application of an unknown constructor, then we could set the cg_lf of the case binder variable of an algebraic case statement to LFUnknownCon, and have getCallMethod return LFUnknownCon variables directly instead of entering them. I think that would do it. Does that sound right? Is there a better way?
(In my original example we actually know the constructor has to be I#. But if the case was on a type with more than one constructor we wouldn't know statically which one we got, just that it has to be one of them.)
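The proposal can be sketched as a toy model. This is not GHC's code: the types below are drastically simplified stand-ins that only borrow names from the code generator (LambdaFormInfo, CallMethod, getCallMethod, AltType), the real getCallMethod takes several more arguments, and caseBinderLF is a hypothetical helper standing in for wherever the case binder's cg_lf gets set.

```haskell
module Main (main) where

-- Simplified stand-in for the code generator's lambda-form info.
data LambdaFormInfo
  = LFThunk        -- possibly unevaluated: must be entered
  | LFCon String   -- saturated application of a known constructor
  | LFUnknownCon   -- proposed: evaluated, but constructor not known statically
  | LFUnknown      -- nothing known about the closure
  deriving (Eq, Show)

-- Simplified stand-in for the kind of case expression being compiled.
data AltType
  = AlgAlt    -- case on an algebraic data type
  | PrimAlt   -- case on an unboxed primitive
  | PolyAlt   -- case on a value of polymorphic type
  deriving (Eq, Show)

data CallMethod
  = EnterIt    -- untag R1 and jump through its info pointer
  | ReturnIt   -- already in WHNF: return to the continuation on the stack
  deriving (Eq, Show)

-- Hypothetical helper: the info recorded for a case binder. After an
-- algebraic case the binder is known to be some saturated constructor.
caseBinderLF :: AltType -> LambdaFormInfo
caseBinderLF AlgAlt = LFUnknownCon
caseBinderLF _      = LFUnknown

-- Toy version of the decision: values known to be constructors are
-- returned directly; everything else is entered.
getCallMethod :: LambdaFormInfo -> CallMethod
getCallMethod (LFCon _)    = ReturnIt
getCallMethod LFUnknownCon = ReturnIt
getCallMethod LFThunk      = EnterIt
getCallMethod LFUnknown    = EnterIt

main :: IO ()
main = do
  print (getCallMethod (caseBinderLF AlgAlt))
  print (getCallMethod (caseBinderLF PolyAlt))
```

Under this model the case binder of an algebraic case gets ReturnIt, which corresponds to emitting call (P64[Sp])(R1) in block c12u instead of the untag and enter.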