Data.Bits.testBit should use tagToEnum#
Motivation
One would except these two definitions to have similar performance.
twiddleA, twiddleB :: Word8-> Bool
twiddleA w = w < 0x80
twiddleB w = not (testBit w 7)
They don't look that different at the STG level:
M.twiddleA :: GHC.Word.Word8 -> GHC.Types.Bool
[GblId, Arity=1, Caf=NoCafRefs, Str=<S,1*U(U)>, Unf=OtherCon []] =
\r [w_s1Dd]
case w_s1Dd of {
GHC.Word.W8# x_s1Df [Occ=Once] ->
case ltWord# [x_s1Df 128##] of sat_s1Dg [Occ=Once] {
__DEFAULT -> tagToEnum# [sat_s1Dg];
};
};
M.twiddleB :: GHC.Word.Word8 -> GHC.Types.Bool
[GblId, Arity=1, Caf=NoCafRefs, Str=<S,1*U(U)>, Unf=OtherCon []] =
\r [w_s1Dh]
case w_s1Dh of {
GHC.Word.W8# x#_s1Dj [Occ=Once] ->
case and# [x#_s1Dj 128##] of {
__DEFAULT -> GHC.Types.False [];
0## -> GHC.Types.True [];
};
};
However in Cmm their core (ignoring unboxing overhead and some details) is:
//Variant A:
R1 = I64[((I64[R1 + 7] < 128) << 3) + GHC.Types.Bool_closure_tbl];
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
//Variant B:
c1DJ: // global
if (I64[R1 + 7] & 128 != 0) goto c1DX; else goto c1E3;
c1DX: // global
R1 = GHC.Types.False_closure+1;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
c1E3: // global
R1 = GHC.Types.True_closure+2;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
But we would really like to always get the first variant. Which I will confidently say is better even without benchmarking.
Proposal
The issue is that we can use tagToEnem# to go from Int# to Bool if the range is [0..1]. But the current testBit implementation uses .&. so there is no guarantee that the value will be in this range, so we end up with a regular comparison.
If we rewrite testBit to produce the Bool via tagToEnum# in haskell then we can fix this without touching codeGen at all.
Describe your proposed feature here.