Skip to content

Better assembly when testing if byte array is equal to constant

Consider the following example. It's a function that tests whether or not the content of the ByteArray#, starting at the given offset, is the string threat:

{-# language MagicHash #-}

{-# OPTIONS_GHC -O2 -Wall -ddump-simpl -ddump-to-file -dsuppress-all -ddump-cmm -ddump-asm #-}

module StatusQuo
  ( isDangerous
  ) where

import GHC.Exts

isDangerous :: ByteArray# -> Int# -> Int#
{-# noinline isDangerous #-}
isDangerous b off = equal6 b off 't' 'h' 'r' 'e' 'a' 't'

equal6 :: ByteArray# -> Int# -> Char -> Char -> Char -> Char -> Char -> Char -> Int#
equal6 arr off (C# a) (C# b) (C# c) (C# d) (C# e) (C# f) =
  eqChar# (indexCharArray# arr off) a
  `andI#`
  eqChar# (indexCharArray# arr (off +# 1#)) b
  `andI#`
  eqChar# (indexCharArray# arr (off +# 2#)) c
  `andI#`
  eqChar# (indexCharArray# arr (off +# 3#)) d
  `andI#`
  eqChar# (indexCharArray# arr (off +# 4#)) e
  `andI#`
  eqChar# (indexCharArray# arr (off +# 5#)) f

The generated assembly includes six repetitive sections. Three of them are shown here:

_s1jj:
	leaq 16(%r14),%rcx
	leaq 3(%rsi),%rdx
	movzbl (%rcx,%rdx,1),%ecx
	cmpq $101,%rcx
	jne _c1k8
_c1ke:
	leaq 16(%r14),%rcx
	leaq 2(%rsi),%rdx
	movzbl (%rcx,%rdx,1),%ecx
	cmpq $114,%rcx
	jne _c1k8
_c1kw:
	leaq 16(%r14),%rcx
	leaq 1(%rsi),%rdx
	movzbl (%rcx,%rdx,1),%ecx
	cmpq $104,%rcx
	jne _c1k8

Each character costs us five instructions, the last of which is a seldom taken jump. Also, the first leaq instruction is redundant. I don't understand the redundant leading leaq instructions, but the jne (although unnecessary) does make sense. In GHC Core, we can see that the compiler undoes my attempt to avoid casing:

isDangerous
  = \ b_a10M off_a10N ->
      case indexCharArray# b_a10M (+# off_a10N 5#) of wild_Xl
      { __DEFAULT ->
      join {
        $j_s13E wild1_Xx
          = case indexCharArray# b_a10M (+# off_a10N 4#) of wild2_Xm
            { __DEFAULT ->
            join {
              $j1_s13C wild3_XD
                = case indexCharArray# b_a10M (+# off_a10N 3#) of {
                    __DEFAULT -> 0#;
                    'e'# ->
                      case indexCharArray# b_a10M (+# off_a10N 2#) of {
                        __DEFAULT -> 0#;
                        'r'# ->
                          case indexCharArray# b_a10M (+# off_a10N 1#) of {
                            __DEFAULT -> 0#;
                            'h'# ->
                              case indexCharArray# b_a10M off_a10N of {
                                __DEFAULT -> 0#;
                                't'# -> andI# (andI# 1# wild3_XD) wild1_Xx
                              }
                          }
                      }
                  } } in
            case wild2_Xm of {
              __DEFAULT -> jump $j1_s13C 0#;
              'a'# -> jump $j1_s13C 1#
            }
            } } in
      case wild_Xl of {
        __DEFAULT -> jump $j_s13E 0#;
        't'# -> jump $j_s13E 1#
      }
      }

For the particular problem I'm trying to solve, I can just sidestep the issue with Core entirely. I'm going to add a compareByteArrayAddr# primop to complement the compareByteArrays# primop. In Core, this primop will be left alone, but when compiling STG to cmm, I think I should be able to recognize that the Addr# argument is a constant and then unroll the equality check. I'm working on this on a local branch right now. But, someone still may be interested in getting the compiler to compile the original code in this issue better.

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information