Generate branchless code for two-alternative cases with default alternative if they return literals or nullary constructors.
While reviewing a patch it came up that for these two ways to test a bitmap:
foo :: Int -> Bool
foo x = x < (1 `shiftL` 5)
bar :: Int -> Bool
bar x = (x `shiftR` 6) == 0
We generate different assembly.
At the heart of it foo becomes branchless:
c1xT: // global
R1 = I64[(%MO_S_Lt_W64(I64[R1 + 7],
32) << 3) + GHC.Types.Bool_closure_tbl];
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
while bar
get's compiled to branchy code:
c1yd: // global
if (%MO_S_Shr_W64(I64[R1 + 7], 6) != 0) goto c1yr; else goto c1yx;
c1yr: // global
R1 = GHC.Types.False_closure+1;
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
c1yx: // global
R1 = GHC.Types.True_closure+2;
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
In general I would expect foo
to perform better here.
Going a level up in STG we see this difference:
A.foo :: GHC.Types.Int -> GHC.Types.Bool
\r [x_s1xC]
case x_s1xC of {
GHC.Types.I# x1_s1xE [Occ=Once] ->
case <# [x1_s1xE 32#] of sat_s1xF [Occ=Once] {
__DEFAULT -> tagToEnum# [sat_s1xF];
};
};
A.bar :: GHC.Types.Int -> GHC.Types.Bool
\r [x_s1xG]
case x_s1xG of {
GHC.Types.I# ww1_s1xI [Occ=Once] ->
case uncheckedIShiftRA# [ww1_s1xI 6#] of {
__DEFAULT -> GHC.Types.False [];
0# -> GHC.Types.True [];
};
};
I think for this specific case it would be reasonable to try and catch the more general pattern of:
case e of {
DEFAULT -> GHC.Types.False/True [];
<constant> -> GHC.Types.True/False [];
};
This could then either be translated to straight line code directly during the STG->Cmm translation or with the help of some kind of primop. doesn't even need to be a numeric literal, so something like this:
case $ccompare1_r3gM x_s3s6 y_s3s7 of {
__DEFAULT -> GHC.Types.True [];
GHC.Types.LT -> GHC.Types.False [];
};
Could also be captured.
The code is fairly simple (in Cmm). We compare the case binder either directly (for literals) or by tag (otherwise). We can then use setCC and the same mechanic as tagToEnum# to turn this into a Boolean value.
But we can still go futher. We don't have to restrict us to True/False on the RHS either. Literals and other data types with tags could be looked at at the same time. For nullary constructors and literals we could then generate straight line code via cmove instead of setCC.
The only thing I'm unsure is what the cleanest way to implement this would be. This could be expressed well as a cmov-like primop.
-- Compare the first two arguments (by tag, or as literals). If they are equal return the first argument, if not the second.
cselect# :: a -> a -> b -> b -> b
Which for literals as selectors could guarantee that it results in branchless code. So we would simply translate the original STG into:
case uncheckedIShiftRA# [ww1_s1xI 6#] of r {
__DEFAULT -> cselect# r 0# True False
}
The only wrinkle is that it's hard to guarantee that we stay branchless if the arguments to cselect# are lifted. If we can't prove they are already tagged we have to emit code to get their tag. But maybe that wouldn't be a big issue.
I don't think anything would prevent us from doing this at the Core level either. But I can imagine that cselect# might inhibit other optimizations so as an optimization we might only want to introduce it at the STG stage.
Sady I don't see myself implementing any of this anytime soon.