text benchmarks are wildly slower in 9.0.1
Reproduction
- See also #19701 (closed), which reports a further regression from 9.0 to 9.2.
- Checkout
text
:
git clone https://github.com/haskell/text
cd text
git checkout ca73ae38ed3744ec930cbfdb706147ec29d5a185
git clone https://github.com/haskell/text-test-data benchmarks/text-test-data
make -Cbenchmarks/text-test-data
- Run
ascii-small
benchmarks with GHC 8.10.6 (it will take around five minutes):
cabal run -w ghc-8.10.6 benchmarks -- -p '/Text+ascii-small/' --csv 8.10.csv --hide-successes
- Compare again benchmarks with GHC 9.0.1 (it will take around five minutes):
cabal run -w ghc-9.0.1 benchmarks -- -p '/Text+ascii-small/' --baseline 8.10.csv --fail-if-slower 50 --hide-successes
Expected results
No benchmarks became slower.
Actual results
All
Pure
cons
LazyText+ascii-small: FAIL (1.87s)
914 μs ± 62 μs, 4464167% slower than baseline
intersperse
Text+ascii-small: FAIL (9.14s)
560 μs ± 31 μs, 99% slower than baseline
isInfixOf
Text+ascii-small: FAIL (5.67s)
2.8 ms ± 234 μs, 224% slower than baseline
mapAccumL
Text+ascii-small: FAIL (2.16s)
263 μs ± 17 μs, 157% slower than baseline
LazyText+ascii-small: FAIL (4.21s)
258 μs ± 7.6 μs, 160% slower than baseline
mapAccumR
Text+ascii-small: FAIL (5.18s)
633 μs ± 37 μs, 211% slower than baseline
LazyText+ascii-small: FAIL (5.34s)
659 μs ± 46 μs, 228% slower than baseline
reverse
Text+ascii-small: FAIL (2.23s)
273 μs ± 26 μs, 241% slower than baseline
LazyText+ascii-small: FAIL (2.18s)
269 μs ± 21 μs, 248% slower than baseline
zipWith
Text+ascii-small: FAIL (3.67s)
1.8 ms ± 113 μs, 91% slower than baseline
length
decode
Text+ascii-small: FAIL (1.57s)
190 μs ± 15 μs, 249% slower than baseline
intersperse
Text+ascii-small: FAIL (1.54s)
376 μs ± 36 μs, 139% slower than baseline
replicate char
LazyText+ascii-small: FAIL (2.09s)
509 μs ± 36 μs, 1037% slower than baseline
zipWith
Text+ascii-small: FAIL (2.30s)
1.1 ms ± 86 μs, 101% slower than baseline
- Show closed items
Relates to
- #198229.0.2
Activity
-
Newest first Oldest first
-
Show all activity Show comments only Show history only
added core libraries needs triage potential runtime perf regression labels
- Developer
I suspect that this is mostly due to
text
still usingwithForeignPtr
instead of the newunsafeWithForeignPtr
. Compare #19474 / https://github.com/haskell/vector/pull/372.!5268 (closed) will probably also help, as it did in #19539 (closed).
- Developer
This looks high priority to me -- serious perf regression on an important package.
Don't our pre-release hackage smoke-tests show these problems?
- Ben Gamari added Phigh runtime perf labels and removed needs triage label
added Phigh runtime perf labels and removed needs triage label
- Maintainer
Don't our pre-release hackage smoke-tests show these problems?
No, we do not currently run benchmarks in our
head.hackage
testing. This is something that I have proposed changing in my HF technical track proposal. - Developer
I suspect that this is mostly due to
text
still usingwithForeignPtr
instead of the newunsafeWithForeignPtr
. Compare #19474 / https://github.com/haskell/vector/pull/372.It would be very nice if a fixed version of
text
would be included with GHC 9.0.2. - Maintainer
Yes, I intend on fixing this for %9.0.2; I have a patch moving
text
tounsafeWithForeignPtr
but I'll admit that I'm not certain that this is really the culprit and consequently would like to benchmark. - Ben Gamari changed milestone to %9.0.2
changed milestone to %9.0.2
- Bodigrim changed the description
Compare with previous version changed the description
- Maintainer
Returning to this again, I have a extracted a minimal reproducer which seems to show the root of the problem:
module Hi where import qualified Data.Text as T test :: T.Text -> () test t = T.zipWith min (T.pack "adfsdf") t `seq` ()
When compiled with %8.10.4, this produces a single top-level worker-wrapper'd binding:
-- RHS size: {terms: 641, types: 508, coercions: 23, joins: 10/21} Hi.$whi :: GHC.Prim.ByteArray# -> GHC.Prim.Int# -> GHC.Prim.Int# -> ()
By contrast, %9.0.1 appears to fail to inline the fusion step function into the string traversal:
-- RHS size: {terms: 243, types: 181, coercions: 0, joins: 1/4} $j :: forall {s}. (s -> Data.Text.Internal.Fusion.Types.Step s Char) -> s -> Data.Text.Internal.Fusion.Size.Size -> () -- RHS size: {terms: 255, types: 161, coercions: 0, joins: 6/9} Hi.$whi :: GHC.Prim.ByteArray# -> GHC.Prim.Int# -> GHC.Prim.Int# -> ()
where
$whi
contains many occurrences of$j
called with alet
-bound step function. - Developer
This look like good insight. Maybe you are on a roll... but if you want me to look, can you write down carefully how to repro?
- Maintainer
The root(?) of the problem appears to be the unfolding of
Data.Text.Internal.Fusion.Common.zipWith
, which is defined as follows:data Zip a b m = Z1 !a !b | Z2 !a !b !m zipWith :: forall a b. (a -> a -> b) -> Stream a -> Stream a -> Stream b zipWith f (Stream next0 (sa0 :: sa) len1) (Stream next1 (sb0 :: sb) len2) = Stream next (Z1 sa0 sb0) (smaller len1 len2) where next :: Zip sa sb a -> Step (Zip sa sb a) b next (Z1 sa sb) = case next0 sa of Done -> Done Skip sa' -> Skip (Z1 sa' sb) Yield a sa' -> Skip (Z2 sa' sb a) next (Z2 sa' sb a) = case next1 sb of Done -> Done Skip sb' -> Skip (Z2 sa' sb' a) Yield b sb' -> Yield (f a b) (Z1 sa' sb') {-# INLINE [0] zipWith #-}
Note the
INLINE [0]
as this appears to be important.With 8.10.4, this binding gets the following unfolding:
-- RHS size: {terms: 159, types: 195, coercions: 0, joins: 0/1} zipWith [InlPrag=INLINE[0] (sat-args=3)] :: forall a b. (a -> a -> b) -> Stream a -> Stream a -> Stream b [GblId, Arity=3, Caf=NoCafRefs, Str=<L,C(C1(U))><S,1*U><S,1*U>, Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=ALWAYS_IF(arity=3,unsat_ok=False,boring_ok=False) Tmpl= \ (@ a) (@ b) (f [Occ=OnceL!] :: a -> a -> b) (ds [Occ=Once!] :: Stream a) (ds1 [Occ=Once!] :: Stream a) -> case ds of { Stream @ s next0 [Occ=OnceL!] ds2 [Occ=Once] len1 [Occ=Once!] -> case ds1 of { Stream @ s1 next1 [Occ=OnceL!] ds3 [Occ=Once] len2 [Occ=Once*!] -> join { $j [Occ=Once*!T[1]] :: Size -> Stream b [LclId[JoinId(1)], Arity=1, Unf=OtherCon []] $j (dt [Occ=Once, OS=OneShot] :: Size Unf=OtherCon []) = Data.Text.Internal.Fusion.Types.Stream @ b @ (Zip s s1 a) (\ (ds4 [Occ=Once!] :: Zip s s1 a) -> case ds4 of { Z1 sa [Occ=Once] sb [Occ=Once*] -> case next0 sa of { Done -> Data.Text.Internal.Fusion.Types.Done @ (Zip s s1 a) @ b; Skip sa' [Occ=Once] -> Data.Text.Internal.Fusion.Types.Skip @ (Zip s s1 a) @ b (Data.Text.Internal.Fusion.Common.Z1 @ s @ s1 @ a sa' sb); Yield a1 [Occ=Once] sa' [Occ=Once] -> Data.Text.Internal.Fusion.Types.Skip @ (Zip s s1 a) @ b (Data.Text.Internal.Fusion.Common.Z2 @ s @ s1 @ a sa' sb a1) }; Z2 sa' [Occ=Once*] sb [Occ=Once] a1 [Occ=Once*] -> case next1 sb of { Done -> Data.Text.Internal.Fusion.Types.Done @ (Zip s s1 a) @ b; Skip sb' [Occ=Once] -> Data.Text.Internal.Fusion.Types.Skip @ (Zip s s1 a) @ b (Data.Text.Internal.Fusion.Common.Z2 @ s @ s1 @ a sa' sb' a1); Yield b1 [Occ=Once] sb' [Occ=Once] -> case f a1 b1 of dt1 [Occ=Once] { __DEFAULT -> Data.Text.Internal.Fusion.Types.Yield @ (Zip s s1 a) @ b dt1 (Data.Text.Internal.Fusion.Common.Z1 @ s @ s1 @ a sa' sb') } } }) (Data.Text.Internal.Fusion.Common.Z1 @ s @ s1 @ a ds2 ds3) dt } in case len1 of { Data.Text.Internal.Fusion.Size.Between dt dt1 -> case len2 of { Data.Text.Internal.Fusion.Size.Between dt2 dt3 -> case GHC.Prim.<=# dt1 dt2 of { __DEFAULT -> case GHC.Prim.<=# dt3 dt of { __DEFAULT -> case GHC.Prim.<=# dt dt2 of { __DEFAULT -> case GHC.Prim.<=# dt1 dt3 of { __DEFAULT -> jump $j (Data.Text.Internal.Fusion.Size.Between dt2 dt3); 1# -> jump $j (Data.Text.Internal.Fusion.Size.Between dt2 dt1) }; 1# -> case GHC.Prim.<=# dt1 dt3 of { __DEFAULT -> jump $j (Data.Text.Internal.Fusion.Size.Between dt dt3); 1# -> jump $j (Data.Text.Internal.Fusion.Size.Between dt dt1) } }; 1# -> jump $j (Data.Text.Internal.Fusion.Size.Between dt2 dt3) }; 1# -> jump $j (Data.Text.Internal.Fusion.Size.Between dt dt1) }; Data.Text.Internal.Fusion.Size.Unknown -> jump $j (Data.Text.Internal.Fusion.Size.Between 0# dt1) }; Data.Text.Internal.Fusion.Size.Unknown -> case len2 of { Data.Text.Internal.Fusion.Size.Between _ [Occ=Dead] dt1 [Occ=Once] -> jump $j (Data.Text.Internal.Fusion.Size.Between 0# dt1); Data.Text.Internal.Fusion.Size.Unknown -> jump $j Data.Text.Internal.Fusion.Size.Unknown } } } }}]
Note how, despite the fact that
zipWith
's unfolding is stable, it appears that some amount of simplification has happened here (e.g. the mentions ofzip
in$j
are gone).By contrast, in 9.0.1 we clearly have an unsimplified unfolding:
-- RHS size: {terms: 159, types: 195, coercions: 0, joins: 0/1} zipWith [InlPrag=INLINE[0] (sat-args=3)] :: forall a b. (a -> a -> b) -> Stream a -> Stream a -> Stream b [GblId, Arity=3, Str=<L,C(C1(U))><S,1*U><S,1*U>, Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=ALWAYS_IF(arity=3,unsat_ok=False,boring_ok=False) Tmpl= \ (@a) (@b) (f [Occ=OnceL1!] :: a -> a -> b) (ds [Occ=Once1!] :: Stream a) (ds1 [Occ=Once1!] :: Stream a) -> case ds of { Stream @s next0 [Occ=OnceL1!] ds2 [Occ=Once11] len1 [Occ=Once1!] -> case ds1 of { Stream @s1 next1 [Occ=OnceL1!] ds3 [Occ=Once11] len2 [Occ=Once2!] -> let { arg [Occ=Once11] :: Zip s s1 a -> Step (Zip s s1 a) b [LclId, Arity=1, Unf=OtherCon []] arg = \ (ds4 [Occ=Once1!] :: Zip s s1 a) -> case ds4 of { Z1 sa [Occ=Once1] sb [Occ=Once2] -> case next0 sa of { Done -> Data.Text.Internal.Fusion.Types.Done @(Zip s s1 a) @b; Skip sa' [Occ=Once1] -> Data.Text.Internal.Fusion.Types.$WSkip @(Zip s s1 a) @b (Data.Text.Internal.Fusion.Common.$WZ1 @s @s1 @a sa' sb); Yield a1 [Occ=Once1] sa' [Occ=Once1] -> Data.Text.Internal.Fusion.Types.$WSkip @(Zip s s1 a) @b (Data.Text.Internal.Fusion.Common.$WZ2 @s @s1 @a sa' sb a1) }; Z2 sa' [Occ=Once2] sb [Occ=Once1] a1 [Occ=Once2] -> case next1 sb of { Done -> Data.Text.Internal.Fusion.Types.Done @(Zip s s1 a) @b; Skip sb' [Occ=Once1] -> Data.Text.Internal.Fusion.Types.$WSkip @(Zip s s1 a) @b (Data.Text.Internal.Fusion.Common.$WZ2 @s @s1 @a sa' sb' a1); Yield b1 [Occ=Once1] sb' [Occ=Once1] -> Data.Text.Internal.Fusion.Types.$WYield @(Zip s s1 a) @b (f a1 b1) (Data.Text.Internal.Fusion.Common.$WZ1 @s @s1 @a sa' sb') } } } in case len1 of wild2 [Occ=Once2] { Data.Text.Internal.Fusion.Size.Between dt dt1 -> case len2 of wild3 [Occ=Once1] { Data.Text.Internal.Fusion.Size.Between dt2 dt3 -> case GHC.Prim.<=# dt1 dt2 of { __DEFAULT -> case GHC.Prim.<=# dt3 dt of { __DEFAULT -> case GHC.Prim.<=# dt1 dt3 of { __DEFAULT -> case GHC.Prim.<=# dt dt2 of { __DEFAULT -> Data.Text.Internal.Fusion.Types.$WStream @b @(Zip s s1 a) arg (Data.Text.Internal.Fusion.Common.$WZ1 @s @s1 @a ds2 ds3) (Data.Text.Internal.Fusion.Size.$WBetween (GHC.Types.I# dt2) (GHC.Types.I# dt3)); 1# -> Data.Text.Internal.Fusion.Types.$WStream @b @(Zip s s1 a) arg (Data.Text.Internal.Fusion.Common.$WZ1 @s @s1 @a ds2 ds3) (Data.Text.Internal.Fusion.Size.$WBetween (GHC.Types.I# dt) (GHC.Types.I# dt3)) }; 1# -> case GHC.Prim.<=# dt dt2 of { __DEFAULT -> Data.Text.Internal.Fusion.Types.$WStream @b @(Zip s s1 a) arg (Data.Text.Internal.Fusion.Common.$WZ1 @s @s1 @a ds2 ds3) (Data.Text.Internal.Fusion.Size.$WBetween (GHC.Types.I# dt2) (GHC.Types.I# dt1)); 1# -> Data.Text.Internal.Fusion.Types.$WStream @b @(Zip s s1 a) arg (Data.Text.Internal.Fusion.Common.$WZ1 @s @s1 @a ds2 ds3) (Data.Text.Internal.Fusion.Size.$WBetween (GHC.Types.I# dt) (GHC.Types.I# dt1)) } }; 1# -> Data.Text.Internal.Fusion.Types.$WStream @b @(Zip s s1 a) arg (Data.Text.Internal.Fusion.Common.$WZ1 @s @s1 @a ds2 ds3) wild3 }; 1# -> Data.Text.Internal.Fusion.Types.$WStream @b @(Zip s s1 a) arg (Data.Text.Internal.Fusion.Common.$WZ1 @s @s1 @a ds2 ds3) wild2 }; Data.Text.Internal.Fusion.Size.Unknown -> case dt of { __DEFAULT -> Data.Text.Internal.Fusion.Types.$WStream @b @(Zip s s1 a) arg (Data.Text.Internal.Fusion.Common.$WZ1 @s @s1 @a ds2 ds3) (Data.Text.Internal.Fusion.Size.$WBetween (GHC.Types.I# 0#) (GHC.Types.I# dt1)); 0# -> Data.Text.Internal.Fusion.Types.$WStream @b @(Zip s s1 a) arg (Data.Text.Internal.Fusion.Common.$WZ1 @s @s1 @a ds2 ds3) wild2 } }; Data.Text.Internal.Fusion.Size.Unknown -> case len2 of wild3 [Occ=Once1] { Data.Text.Internal.Fusion.Size.Between dt [Occ=Once1!] dt1 [Occ=Once1] -> case dt of { __DEFAULT -> Data.Text.Internal.Fusion.Types.$WStream @b @(Zip s s1 a) arg (Data.Text.Internal.Fusion.Common.$WZ1 @s @s1 @a ds2 ds3) (Data.Text.Internal.Fusion.Size.$WBetween (GHC.Types.I# 0#) (GHC.Types.I# dt1)); 0# -> Data.Text.Internal.Fusion.Types.$WStream @b @(Zip s s1 a) arg (Data.Text.Internal.Fusion.Common.$WZ1 @s @s1 @a ds2 ds3) wild3 }; Data.Text.Internal.Fusion.Size.Unknown -> Data.Text.Internal.Fusion.Types.$WStream @b @(Zip s s1 a) arg (Data.Text.Internal.Fusion.Common.$WZ1 @s @s1 @a ds2 ds3) Data.Text.Internal.Fusion.Size.Unknown } } } }}]
It's not clear to me that this is the root cause of the performance difference, but it is certainly a difference which I cannot explain.
Intriguingly, removing the
[0]
phasing fromzipWith
'sINLINE
pragma causes 8.10.4 to produce an unsimplified unfolding, similar to 9.0.1. - Maintainer
For the record, I am reproducing this with:
$ git clone git@github.com:haskell/text $ cd text $ cat <<EOF > Test.hs module Test where import qualified Data.Text as T flub :: T.Text -> T.Text flub t = T.zipWith min (T.pack "asdfg") t EOF $ ghc-9.0.1 -O -isrc -Iinclude -fforce-recomp Test.hs \ -ddump-simpl -dsuppress-uniques -dverbose-core2core -ddump-inlinings \ -ddump-to-file -dumpdir out-9.0.1
Edited by Ben Gamari - Maintainer
The fact that the unfolding of
zipWith
was simplified at all in 8.10.4 surprised me. I have opened #19670 to track this. - Maintainer
I just noticed here that one of main differences between the unfolding in 8.10.4 and 9.0.1 is the fact that the former contains applications of
Data.Text.Internal.Fusion.Types.Stream
whereas the latter containsData.Text.Internal.Fusion.Types.$WStream
, a wrapper. I wonder if this regression is related to the fact that we now always introduce data constructor wrappers due to LinearTypes. - Maintainer
The fact that we now always introduce a data constructor wrapper appears not to be relevant as
Stream
also had a wrapper in 8.10.4.However, the wrapper itself is relevant as it is the reason why the phase of
zipWith
'sINLINE
pragma affects simplification. In particular, in GHC 8.10 data constructor wrappers inlined in phase 0, meaning that they would be inlined intozipWith
's unfolding (which will also be simplified in phase 0 as this is the activation of itsINLINE
pragma).By contrast, in 9.0 data constructor wrappers are inlined in the (later)
final
phase (see b78cc64e). Consequently, the wrapper does not inline into the unfolding.This difference in inlining behavior means that in 9.0.1 the
Stream
application fails to float in through a number ofcase
s.Edited by Ben Gamari - Maintainer
Here is a somewhat minimal testcase demonstrating the difference in unfoldings:
{-# LANGUAGE RankNTypes #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE BangPatterns #-} module Test where import Data.Text.Internal.Fusion.Types (Stream(..), Step(..)) import Data.Text.Internal.Fusion.Size (smaller) data Zip a b m = Z1 !a !b | Z2 !a !b !m zipWith :: forall a b. (a -> a -> b) -> Stream a -> Stream a -> Stream b zipWith f (Stream next0 (sa0 :: sa) len1) (Stream next1 (sb0 :: sb) len2) = Stream next (Z1 sa0 sb0) s where s = smaller len1 len2 next :: Zip sa sb a -> Step (Zip sa sb a) b next (Z1 sa sb) = case next0 sa of Done -> Done Skip sa' -> Skip (Z1 sa' sb) Yield a sa' -> Skip (Z2 sa' sb a) next (Z2 sa' sb a) = case next1 sb of Done -> Done Skip sb' -> Skip (Z2 sa' sb' a) Yield b sb' -> Yield (f a b) (Z1 sa' sb') {-# INLINE [0] zipWith #-}
- Maintainer
Indeed it appears that making
s
in the test above more strict (with a bang pattern) avoids the difference in code generation. This essentially confirms that the phasing difference described above (#19557 (comment 345500)) is the root cause.Edited by Ben Gamari