dropWhile fusion in GHC 9.6 is harmful
!9468 (closed) (#18964) introduced fusion rules for Data.List.dropWhile
. They cause performance regressions in many practical scenarios. Consider the following program:
{-# OPTIONS_GHC -O #-}
dropWhile2 :: Int -> [Int] -> [Int]
dropWhile2 n = dropWhile (< n) . dropWhile (< n)
main :: IO ()
main = do
let xs = [0..9999999]
print $ last $ dropWhile2 0 xs
print $ last $ dropWhile2 1 xs
print $ last $ dropWhile2 2 xs
print $ last $ dropWhile2 3 xs
print $ last $ dropWhile2 4 xs
print $ last $ dropWhile2 5 xs
In GHC 9.4 there are no fusion rules for dropWhile
, so the program does what it says on the tin: allocate xs
, drop short prefixes, print last elements of essentially the same list:
$ ghc-9.4 -fforce-recomp DropWhile.hs && ./DropWhile +RTS -s
[1 of 2] Compiling Main ( DropWhile.hs, DropWhile.o )
[2 of 2] Linking DropWhile [Objects changed]
9999999
9999999
9999999
9999999
9999999
9999999
640,060,952 bytes allocated in the heap
993,491,968 bytes copied during GC
272,648,736 bytes maximum residency (8 sample(s))
46,704,096 bytes maximum slop
620 MiB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 145 colls, 0 par 0.139s 0.141s 0.0010s 0.0048s
Gen 1 8 colls, 0 par 0.213s 0.270s 0.0337s 0.1076s
INIT time 0.000s ( 0.003s elapsed)
MUT time 0.169s ( 0.173s elapsed)
GC time 0.353s ( 0.410s elapsed)
EXIT time 0.000s ( 0.012s elapsed)
Total time 0.521s ( 0.599s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 3,798,400,977 bytes per MUT second
Productivity 32.3% of total user, 28.9% of total elapsed
But in GHC 9.6 fusion rules for dropWhile
kick in, and now every call rebuilds the entire list:
$ ghc-9.6 -fforce-recomp DropWhile.hs && ./DropWhile +RTS -s
[1 of 2] Compiling Main ( DropWhile.hs, DropWhile.o )
[2 of 2] Linking DropWhile [Objects changed]
9999999
9999999
9999999
9999999
9999999
9999999
4,000,060,160 bytes allocated in the heap
2,700,281,672 bytes copied during GC
400,024,248 bytes maximum residency (12 sample(s))
68,513,096 bytes maximum slop
1234 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 945 colls, 0 par 0.393s 0.395s 0.0004s 0.0024s
Gen 1 12 colls, 0 par 0.515s 0.615s 0.0512s 0.1482s
INIT time 0.002s ( 0.002s elapsed)
MUT time 0.464s ( 0.374s elapsed)
GC time 0.908s ( 1.010s elapsed)
EXIT time 0.013s ( 0.002s elapsed)
Total time 1.387s ( 1.388s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 8,618,831,778 bytes per MUT second
Productivity 33.5% of total user, 27.0% of total elapsed
There are 6x more allocations and the program is 3x slower.
The fundamental problem is that dropWhile
is a paramorphism, but dropWhileFB
is a catamorphism. A similar issue used to plague text
, see A tale of two tails.
There are multiple control environment issues with !9468 (closed):
- The change is not reflected in the changelog.
- There was no impact assessment or performance analysis.
- There was no CLC proposal...
- ...even while the approver was explicitly reminded shortly before that one is due.
I suggest to revert the change before GHC 9.6 is out.