Optimize Data.List.elem
The use case
I looked at this function here:
isGenDelims :: Char -> Bool
isGenDelims c = c `elem` ":/?#[]@"
We get this core:
Network.URI.isGenDelims :: GHC.Types.Char -> GHC.Types.Bool
[GblId, Arity=1, Str=<L,U>, Unf=OtherCon []] =
\r [c1_smmE]
GHC.List.elem
GHC.Classes.$fEqChar c1_smmE Network.URI.isGenDelims1;
It's obvious we don't specialization here.
The reason for this is the code for elem
(in core):
Rec {
-- RHS size: {terms: 21, types: 15, coercions: 0, joins: 0/0}
elem [InlPrag=NOINLINE[1], Occ=LoopBreaker]
:: forall a. Eq a => a -> [a] -> Bool
[GblId,
Arity=3,
Caf=NoCafRefs,
Str=<L,U(C(C1(U)),A)><L,U><S,1*U>,
Unf=OtherCon []]
elem
= \ (@a_a299)
($dEq_a29b :: Eq a_a299)
(ds_d2Jp :: a_a299)
(ds1_d2Jq :: [a_a299]) ->
case ds1_d2Jq of {
[] -> GHC.Types.False;
: y_a1kG ys_a1kH ->
case == @a_a299 $dEq_a29b ds_d2Jp y_a1kG of {
False -> elem @a_a299 $dEq_a29b ds_d2Jp ys_a1kH;
True -> GHC.Types.True
}
}
end Rec }
We won't get specialization since we don't have a unfolding. Which is the result of marking it as NOINLINE[1] and the fact that it's recursive.
This means in the generated code for each list element we:
- Call GHC.Classes.== @Char which returns Chars instance for
==
- call stg_ap_0 to evaluate the function.
- return to stg_ap_pp, putting the arguments into regs.
- call the actual
==
function. - return to elem, check the result of the comparison and potentially
start over with the next element.
Ideally we could cache some of this work, but this seems to cause more overhead than it removes.
Proposals
I propose these changes purely for performance reasons.
I looked at these possibilities.
#18285).
Provide unfoldings so use sites can specialize (The downside is that we will generate more code. The upside is an extreme improvement in performance when specialization triggers.
To some degree code bloat could be reduced by providing specializations for some of the types in base.
Rewrite using a local recursive function
This allows elem to be inlined and specialized without changing the existing pragmas.
Improves performance slightly over simply adding unfoldings.
Force the dictionary selector.
Rewrite elem like this:
{-# NOINLINE myElem #-}
myElem :: forall a. Eq a => a -> [a] -> Bool
myElem x !ys' =
go ys'
where
!eq = (==) :: a -> a -> Bool
go [] = False
go (y:ys) = eq x y || go ys
This has all the same properties as the existing implementation, except for better performance. It's big upside is that we don't risk code bloat (as we still don't provide a unfolding).
Overview
Results are for a list argument of length 7 and a frequent result of False.
Data.List.elem | Forced class selector + NOINLINE |
elem specialized |
elem with local go recursion |
|
---|---|---|---|---|
Run Time(s) | 3s | 2.1s | 1.3s | 1.1s |
Run Time (%) | 100% | -30% | -56% | -63% |
#no changes | no downsides | code bloat? | code bloat? |
The forced class selector variant get's better as lists get longer. But it's always better than elem.