2-fold memory and 2.5-fold time usage regression from GHC 8.10.4 to 9.01 in the mmark package
Summary
Building mmark-0.0.7.2
with GHC 9.0.1 takes 2 times more memory and 2.5 times longer than with GHC 8.10.4.
Reproduction steps
$ cabal get mmark-0.0.7.2 && cd mmark-0.0.7.2
$ cat > cabal.project
packages: .
package mmark
ghc-options: -Rghc-timing
^D
$ cabal build
Results
GHC version | Peak megabytes allocated | Mutator CPU time in seconds | GC CPU time in seconds |
---|---|---|---|
8.4.4 | 2839 | 66.720 | 30.528 |
8.6.5 | 2050 | 65.112 | 27.428 |
8.8.4 | 2662 | 63.661 | 29.302 |
8.10.4 | 2581 | 64.875 | 30.612 |
9.0.1 | 5170 | 196.415 | 56.772 |
As in #14974 (closed), GHC appears to spend most of the time on the Text.MMark.Parser
module.
Note that mmark
uses -O2
.
- Show closed items
Activity
-
Newest first Oldest first
-
Show all activity Show comments only Show history only
- Simon Jakobi added compiler perf needs triage labels
added compiler perf needs triage labels
- Simon Jakobi changed the description
Compare with previous version changed the description
- Ben Gamari added Pnormal Tbug labels and removed needs triage label
added Pnormal Tbug labels and removed needs triage label
- Maintainer
I'm looking at this in conjunction with #19478 (closed) . I suspect they are related.
- Author Developer
I'm looking at this in conjunction with #19478 (closed) . I suspect they are related.
Since the investigation in #19478 (closed) seems to be stuck, maybe this issue should now be investigated on its own?!
Also, for reference, here's an earlier issue reporting a compiler perf regression for
mmark
: #14974 (closed) Collapse replies - Maintainer
Indeed @mpickering will try to do a bit of sleuthing on this.
- Matthew Pickering assigned to @mpickering
assigned to @mpickering
- Developer
- Developer
Core sizes before code generation are slightly larger. It doesn't help that code generation is slower and also happens twice because of -dynamic-too.
9.0.1
Result size of CorePrep = {terms: 272,694, types: 326,612, coercions: 41,251, joins: 3,315/17,316}
8.10.4
Result size of CorePrep = {terms: 212,262, types: 260,529, coercions: 39,628, joins: 1,908/12,622}
- Developer
- Developer
- Matthew Pickering mentioned in commit d157900d
mentioned in commit d157900d
- Matthew Pickering mentioned in merge request !6523 (closed)
mentioned in merge request !6523 (closed)
- Matthew Pickering mentioned in commit fc37f813
mentioned in commit fc37f813
- Matthew Pickering mentioned in commit 339be02c
mentioned in commit 339be02c
- Matthew Pickering mentioned in commit 4025abc2
mentioned in commit 4025abc2
- Matthew Pickering mentioned in commit 8ced9ee0
mentioned in commit 8ced9ee0
- Matthew Pickering mentioned in commit 2a400dbf
mentioned in commit 2a400dbf
- Matthew Pickering mentioned in commit fad22fa4
mentioned in commit fad22fa4
- Matthew Pickering mentioned in commit fb7be28a
mentioned in commit fb7be28a
- Matthew Pickering mentioned in commit 06316bc4
mentioned in commit 06316bc4
- Matthew Pickering mentioned in commit a9f9a6ba
mentioned in commit a9f9a6ba
- Matthew Pickering mentioned in commit 9eff805a
mentioned in commit 9eff805a
- Matthew Pickering mentioned in commit 53dc8e41
mentioned in commit 53dc8e41
- Matthew Pickering mentioned in commit b041ea77
mentioned in commit b041ea77
- Matthew Pickering mentioned in commit c480f8f2
mentioned in commit c480f8f2
- Developer
Fixed by !6523 (closed)
- Matthew Pickering closed
closed
- Developer
Well done Matthew!
Is it worth a regression test of some kind?
Not that it really matters, but I'm curious about how it was introduced in the first place, since 8.10 was ok.
- Author Developer
@mpickering Thanks for the great improvements in !6523 (closed)! Could you possibly summarize the new memory and time usage numbers here?
My understanding was that the time usage didn't show much improvement yet, so maybe this issue shouldn't be closed?!
- Developer
Version Maximum Residency (MB) Allocations (GB) Mutator Time (s) GC Time (s) 8.10.7 2333 57.7 26 14.1 9.0.1 3958 113.5 77.3 27.2 9.2.1 4279 112 74.5 27.3 HEAD (446ca8b9) 2215 64.6 44.9 17.4 @sjakobi The runtime appears to be greatly improved on HEAD but still regressing relative to 8.10.7.
@simonpj The regression was due to the new code layout algorithm in the code generator (which was introduced in 9.0).
- Developer
57bcf4de24d4a04aedc2462f9b3475a20c79f60608991e8 -package-id yaml-0.11.5.0-dcf19e4e51d9d2f63a01a73740fcb8a0e3d6698ce20b1d8088856b1c06bf5a12 -XHaskell2010 Text.MMark Text.MMark.Extension Text.MMark.Parser Text.MMark.Parser.Internal Text.MMark.Parser.Internal.Type Text.MMark.Render Text.MMark.Trans Text.MMark.Type Text.MMark.Util -O2 -Wall -hide-all-packages total time = 61.05 secs (61055 ticks @ 1000 us, 1 processor) total alloc = 48,846,900,208 bytes (excludes profiling overheads) COST CENTRE MODULE SRC %time %alloc RegAlloc-linear GHC.CmmToAsm compiler/GHC/CmmToAsm.hs:(586,27)-(588,55) 31.8 2.2 simplIdF GHC.Core.Opt.Simplify compiler/GHC/Core/Opt/Simplify.hs:1148:61-79 15.8 27.0 deSugar GHC.Driver.Main compiler/GHC/Driver/Main.hs:623:7-44 4.8 0.7 OccAnal GHC.Core.Opt.Pipeline compiler/GHC/Core/Opt/Pipeline.hs:(715,22)-(716,42) 4.0 6.3 occAnalBind.assoc GHC.Core.Opt.OccurAnal compiler/GHC/Core/Opt/OccurAnal.hs:812:13-64 3.1 4.8 GHC.CmmToAsm.CFG.mkGlobalWeights GHC.CmmToAsm.CFG compiler/GHC/CmmToAsm/CFG.hs:975:1-15 2.9 3.1 simplNonRecJoinPoint GHC.Core.Opt.Simplify compiler/GHC/Core/Opt/Simplify.hs:1221:39-83 2.4 3.3 simplRecOrTopPair-normal GHC.Core.Opt.Simplify compiler/GHC/Core/Opt/Simplify.hs:(306,5)-(307,62) 2.3 2.8 simplNonRecE GHC.Core.Opt.Simplify compiler/GHC/Core/Opt/Simplify.hs:1224:31-78 2.3 2.4 StgToCmm GHC.Driver.Main compiler/GHC/Driver/Main.hs:1762:13-97 1.6 2.4 rebuild GHC.Core.Opt.Simplify compiler/GHC/Core/Opt/Simplify.hs:1149:60-85 1.6 2.7
- Developer
If I comment out
findPrefRealReg
then mutator time drops to 32.7s. - Matthew Pickering mentioned in commit 3872cc56
mentioned in commit 3872cc56
- Matthew Pickering mentioned in merge request !6591 (closed)
mentioned in merge request !6591 (closed)
- Matthew Pickering mentioned in commit 4f6be71d
mentioned in commit 4f6be71d
- Sylvain Henry mentioned in merge request !6594 (closed)
mentioned in merge request !6594 (closed)
- Author Developer
Reopening until the remaining compile time regression is fixed.
I'm very happy to see the progress you're making in !6591 (closed) and !6594 (closed)! :)
- Simon Jakobi reopened
reopened
- Matthew Pickering mentioned in commit 7440da56
mentioned in commit 7440da56
- Matthew Pickering mentioned in commit 0019cd9c
mentioned in commit 0019cd9c
- Matthew Pickering mentioned in commit ec3436e6
mentioned in commit ec3436e6
- Matthew Pickering mentioned in commit 9600a5fb
mentioned in commit 9600a5fb
- Matthew Pickering mentioned in commit 5a616ce2
mentioned in commit 5a616ce2
- Developer
!6591 (closed) and !6523 (closed) are now merged which brings performance to a comparable level to before (slightly slower still) but the obvious regressions I think have been fixed.
- Matthew Pickering closed
closed
- Author Developer
Awesome! :)
Could you possibly give another update to the perf number table from #19471 (comment 378363), @mpickering?
Collapse replies - Developer
Version Maximum Residency (MB) Allocations (GB) Mutator Time (s) GC Time (s) 8.10.7 2333 57.7 26 14.1 9.0.1 3958 113.5 77.3 27.2 9.2.1 4279 112 74.5 27.3 HEAD (446ca8b9) 2215 64.6 44.9 17.4 HEAD (new) 2327 63.8 33.9 19.1 1 - Developer
So things are still a bit worse than 8.10.7 but nowhere near as bad as before.
- Developer
So things are still a bit worse than 8.10.7 but nowhere near as bad as before.
It's mysterious how 8.10 is so much better, despite all the perf work we have done since.
My first bet would be that we are simply generating more code, perhaps as a result of more inlining. And that might make the compiled program run faster, which might be a good tradeoff.
- Author Developer
Also, is there some kind of regression test?
Collapse replies - Developer
There isn't a regression test because unfortunately our testing infrastructure only tracks allocations rather than runtime, and the latest regression didn't cause an increase in allocations.
- Matthew Pickering mentioned in commit 5a96b725
mentioned in commit 5a96b725