Skip to content

Optimize Data.List.elem

The use case

I looked at this function here:

isGenDelims :: Char -> Bool
isGenDelims c = c `elem` ":/?#[]@"

We get this core:

Network.URI.isGenDelims :: GHC.Types.Char -> GHC.Types.Bool
[GblId, Arity=1, Str=<L,U>, Unf=OtherCon []] =
    \r [c1_smmE]
        GHC.List.elem
            GHC.Classes.$fEqChar c1_smmE Network.URI.isGenDelims1;

It's obvious we don't specialization here.

The reason for this is the code for elem (in core):

Rec {
-- RHS size: {terms: 21, types: 15, coercions: 0, joins: 0/0}
elem [InlPrag=NOINLINE[1], Occ=LoopBreaker]
  :: forall a. Eq a => a -> [a] -> Bool
[GblId,
 Arity=3,
 Caf=NoCafRefs,
 Str=<L,U(C(C1(U)),A)><L,U><S,1*U>,
 Unf=OtherCon []]
elem
  = \ (@a_a299)
      ($dEq_a29b :: Eq a_a299)
      (ds_d2Jp :: a_a299)
      (ds1_d2Jq :: [a_a299]) ->
      case ds1_d2Jq of {
        [] -> GHC.Types.False;
        : y_a1kG ys_a1kH ->
          case == @a_a299 $dEq_a29b ds_d2Jp y_a1kG of {
            False -> elem @a_a299 $dEq_a29b ds_d2Jp ys_a1kH;
            True -> GHC.Types.True
          }
      }
end Rec }

We won't get specialization since we don't have a unfolding. Which is the result of marking it as NOINLINE[1] and the fact that it's recursive.

This means in the generated code for each list element we:

  • Call GHC.Classes.== @Char which returns Chars instance for ==
  • call stg_ap_0 to evaluate the function.
  • return to stg_ap_pp, putting the arguments into regs.
  • call the actual == function.
  • return to elem, check the result of the comparison and potentially
    start over with the next element.

Ideally we could cache some of this work, but this seems to cause more overhead than it removes.

Proposals

I propose these changes purely for performance reasons.

I looked at these possibilities.

Provide unfoldings so use sites can specialize (#18285).

The downside is that we will generate more code. The upside is an extreme improvement in performance when specialization triggers.

To some degree code bloat could be reduced by providing specializations for some of the types in base.

Rewrite using a local recursive function

This allows elem to be inlined and specialized without changing the existing pragmas.

Improves performance slightly over simply adding unfoldings.

Force the dictionary selector.

Rewrite elem like this:

{-# NOINLINE myElem #-}
myElem :: forall a. Eq a => a -> [a] -> Bool
myElem x !ys' =
    go ys'
    where
        !eq = (==) :: a -> a -> Bool
        go [] = False
        go (y:ys) = eq x y || go ys

This has all the same properties as the existing implementation, except for better performance. It's big upside is that we don't risk code bloat (as we still don't provide a unfolding).

Overview

Results are for a list argument of length 7 and a frequent result of False.

Data.List.elem Forced class selector + NOINLINE elem specialized elem with local go recursion
Run Time(s) 3s 2.1s 1.3s 1.1s
Run Time (%) 100% -30% -56% -63%
#no changes no downsides code bloat? code bloat?

The forced class selector variant get's better as lists get longer. But it's always better than elem.

Edited by Andreas Klebinger
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information