Consider CmmFloat - Code motion to move shared subexpressions into the common path.
Consider this motivating code snippet:
maybeFlipCond :: Cond -> Maybe Cond
maybeFlipCond cond = case cond of
    EQQ    -> Just EQQ
    NE     -> Just NE
    ...
    LE     -> Just GE
    GE     -> Just LE
    _other -> Nothing
This results in the following Cmm code, with some unrelated parts removed:
c2uc: // global
    I64[Sp - 8] = c2tU;
    R1 = R2;
    Sp = Sp - 8;
    if (R1 & 7 != 0) goto c2tU; else goto c2tV;
c2tV: // global
    call (I64[R1])(R1) returns to c2tU, args: 8, res: 8, upd: 8;
c2tU: // global
    _c2u9::I64 = %MO_UU_Conv_W32_W64(I32[I64[R1 - 1] - 4]);
    if (_c2u9::I64 >= 11) goto c2tY; else goto u2uK;
u2uK: // global
    if (_c2u9::I64 < 1) goto c2tY; else goto u2uL;
c2tY: // global
    R1 = Nothing_closure+1;
    Sp = Sp + 8;
    call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
u2uL: // global
    switch [1 .. 10] _c2u9::I64 {
        case 1 : goto c2tZ;
        case 2 : goto c2u0;
        case 3 : goto c2u1;
        case 4 : goto c2u2;
        case 5 : goto c2u3;
        case 6 : goto c2u4;
        case 7 : goto c2u5;
        case 8 : goto c2u6;
        case 9 : goto c2u7;
        case 10 : goto c2u8;
    }
c2u8: // global
    R1 = maybeFlipCond1_closure+2;
    Sp = Sp + 8;
    call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
c2u7: // global
    R1 = maybeFlipCond2_closure+2;
    Sp = Sp + 8;
    call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
...: // global
    repeat for maybeFlipCond [3..9] _closure
c2tZ: // global
    R1 = maybeFlipCond10_closure+2;
    Sp = Sp + 8;
    call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
}
Every path out of c2tU performs the same Sp = Sp + 8 right before its final call, so there is no reason why the Sp modification couldn't be pulled out into the common path.
This would have the following benefits:
- Reduced code size.
- Possibly better performance, since there would be more distance between the Sp modification and the indirect jump through Sp (the call), which might reduce latency.
As a knock-on effect, we could also transform the jump table into a lookup table, improving performance further (see #17238).
The prime use case for this would be Sp modifications, as this pattern is somewhat common. But it could easily be generalized into a CmmFloat pass that does this for arbitrary shared expressions.
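To make the idea concrete, here is a minimal sketch of the hoisting step on a toy IR. Everything in it (Expr, Stmt, commutes, floatable, hoistCommon) is hypothetical and is not GHC's Cmm representation or API. It only models register assignments, and it assumes that the branch condition and any memory accesses do not touch the hoisted registers, which a real pass would have to check.

```haskell
import Data.List (delete)

-- Toy expressions: a stand-in for Cmm expressions (hypothetical, not GHC's CmmExpr).
data Expr
  = Reg String      -- read a register
  | Lit String      -- a constant such as a closure label
  | Add Expr Int    -- an expression plus a constant offset
  deriving (Eq, Show)

-- Toy statement: assign an expression to a register.
data Stmt = Assign String Expr
  deriving (Eq, Show)

-- Registers read by an expression.
regsRead :: Expr -> [String]
regsRead (Reg r)   = [r]
regsRead (Lit _)   = []
regsRead (Add e _) = regsRead e

-- Two assignments can be swapped if they write different registers and
-- neither one writes a register that the other one reads.
commutes :: Stmt -> Stmt -> Bool
commutes (Assign d1 e1) (Assign d2 e2) =
  d1 /= d2 && d1 `notElem` regsRead e2 && d2 `notElem` regsRead e1

-- A statement can float to the top of a block if it occurs in the block
-- and commutes with every statement before its first occurrence.
floatable :: Stmt -> [Stmt] -> Bool
floatable s blk = case break (== s) blk of
  (before, _ : _) -> all (commutes s) before
  _               -> False

-- Repeatedly pick a statement that is floatable in every branch, remove it
-- from each branch, and collect it for the common path.
hoistCommon :: [[Stmt]] -> ([Stmt], [[Stmt]])
hoistCommon branches =
  case [ s | s <- concat (take 1 branches), all (floatable s) branches ] of
    []      -> ([], branches)
    (s : _) ->
      let (hoisted, rest) = hoistCommon (map (delete s) branches)
      in  (s : hoisted, rest)

-- Three of the return blocks from the dump above, without their final call.
example :: [[Stmt]]
example =
  [ [ Assign "R1" (Add (Lit "Nothing_closure") 1),        Assign "Sp" (Add (Reg "Sp") 8) ]
  , [ Assign "R1" (Add (Lit "maybeFlipCond1_closure") 2), Assign "Sp" (Add (Reg "Sp") 8) ]
  , [ Assign "R1" (Add (Lit "maybeFlipCond2_closure") 2), Assign "Sp" (Add (Reg "Sp") 8) ]
  ]
```

Evaluating hoistCommon example in GHCi yields the single shared statement Assign "Sp" (Add (Reg "Sp") 8) as the hoisted part, and leaves each branch with only its R1 assignment. The R1 assignments stay put because they differ between branches. A branch that read Sp before adjusting it would block the hoist, since the two statements would no longer commute.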