Better assembly when testing if byte array is equal to constant
Consider the following example. It's a function that tests whether or not the content of the ByteArray#
, starting at the given offset, is the string threat
:
{-# language MagicHash #-}
{-# OPTIONS_GHC -O2 -Wall -ddump-simpl -ddump-to-file -dsuppress-all -ddump-cmm -ddump-asm #-}
module StatusQuo
( isDangerous
) where
import GHC.Exts
isDangerous :: ByteArray# -> Int# -> Int#
{-# noinline isDangerous #-}
isDangerous b off = equal6 b off 't' 'h' 'r' 'e' 'a' 't'
equal6 :: ByteArray# -> Int# -> Char -> Char -> Char -> Char -> Char -> Char -> Int#
equal6 arr off (C# a) (C# b) (C# c) (C# d) (C# e) (C# f) =
eqChar# (indexCharArray# arr off) a
`andI#`
eqChar# (indexCharArray# arr (off +# 1#)) b
`andI#`
eqChar# (indexCharArray# arr (off +# 2#)) c
`andI#`
eqChar# (indexCharArray# arr (off +# 3#)) d
`andI#`
eqChar# (indexCharArray# arr (off +# 4#)) e
`andI#`
eqChar# (indexCharArray# arr (off +# 5#)) f
The generated assembly includes six repetitive sections. Three of them are shown here:
_s1jj:
leaq 16(%r14),%rcx
leaq 3(%rsi),%rdx
movzbl (%rcx,%rdx,1),%ecx
cmpq $101,%rcx
jne _c1k8
_c1ke:
leaq 16(%r14),%rcx
leaq 2(%rsi),%rdx
movzbl (%rcx,%rdx,1),%ecx
cmpq $114,%rcx
jne _c1k8
_c1kw:
leaq 16(%r14),%rcx
leaq 1(%rsi),%rdx
movzbl (%rcx,%rdx,1),%ecx
cmpq $104,%rcx
jne _c1k8
Each character costs us five instructions, the last of which is a seldom taken jump. Also, the first leaq
instruction is redundant. I don't understand the redundant leading leaq
instructions, but the jne
(although unnecessary) does make sense. In GHC Core, we can see that the compiler undoes my attempt to avoid casing:
isDangerous
= \ b_a10M off_a10N ->
case indexCharArray# b_a10M (+# off_a10N 5#) of wild_Xl
{ __DEFAULT ->
join {
$j_s13E wild1_Xx
= case indexCharArray# b_a10M (+# off_a10N 4#) of wild2_Xm
{ __DEFAULT ->
join {
$j1_s13C wild3_XD
= case indexCharArray# b_a10M (+# off_a10N 3#) of {
__DEFAULT -> 0#;
'e'# ->
case indexCharArray# b_a10M (+# off_a10N 2#) of {
__DEFAULT -> 0#;
'r'# ->
case indexCharArray# b_a10M (+# off_a10N 1#) of {
__DEFAULT -> 0#;
'h'# ->
case indexCharArray# b_a10M off_a10N of {
__DEFAULT -> 0#;
't'# -> andI# (andI# 1# wild3_XD) wild1_Xx
}
}
}
} } in
case wild2_Xm of {
__DEFAULT -> jump $j1_s13C 0#;
'a'# -> jump $j1_s13C 1#
}
} } in
case wild_Xl of {
__DEFAULT -> jump $j_s13E 0#;
't'# -> jump $j_s13E 1#
}
}
For the particular problem I'm trying to solve, I can just sidestep the issue with Core entirely. I'm going to add a compareByteArrayAddr#
primop to complement the compareByteArrays#
primop. In Core, this primop will be left alone, but when compiling STG to cmm, I think I should be able to recognize that the Addr#
argument is a constant and then unroll the equality check. I'm working on this on a local branch right now. But, someone still may be interested in getting the compiler to compile the original code in this issue better.