This is where we track various efforts to characterize and improve the performance of the compiler itself. If you are interested in the performance of code generated by GHC, see Performance/Runtime.
Identify tickets by using the compiler perf label.
Simon's hit-list (June 2020)
MRs in flight
- !3548 (closed) Perf tweaks in Coercion
- !3497 (closed) StrictArg
- !3503 (merged) Eta expand in Unify monad
- !3531 (closed) (a) eta expansion for join points #18328 (closed) and (b) eta expansion and one-shot-ness #18355
- !3426 (merged) conSize discount. Depends on !3548 (closed), !3531 (closed), !3503 (merged)
- !3558 (closed) Perf hole in demand analysis #18304, #18349
- !3309 (closed) More eta-expansion in GHC compiler monads DO NOT LAND. This is WIP.
- #18354. Compiler perf tweaks
- #13253 (closed). Fixed by !3497 (closed) (StrictArg). But I think there are some unfixed SpecConstr variants in the comment stream.
- #10421 (closed). Fixed by !3497 (closed) (StrictArg)
- #15630 (closed). Fixed, I think, by the ufKeenessFactor fix. Regression test added !3509 (closed). See this comment for an analysis of what is happening.
- #17516 (closed). Fixed (apparently by accident) by #17901/!2843 (closed). Regression test added !3519 (closed)
- #18140 (closed). Fixed by !3497 (closed) (StrictArg)
- #18282 (closed). Fixed by !3426 (merged), which includes a regression test.
- #18223 (closed). The specialiser causes a big blow-up. But even with -fno-specialise, something non-linear seems to be happening.
- Vector tests: see this comment.
- #15488. f calls g twice; g calls h twice; etc. Simple unfolding blows up. Not sure what to do here. Not fixed by !3497 (closed), for example.
#15751. Something about zipWith in the
jsaddle-dom which is hard to build
pandoc which is hard to build
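The call shape described under #15488 ("f calls g twice; g calls h twice; etc.") can be sketched as follows. This is a minimal illustration only: the functions `f`, `g`, `h` and the helper `inlinedLeafCopies` are hypothetical and are not taken from the ticket, and the sketch assumes that every level is unconditionally unfolded at each call site.

```haskell
-- Hypothetical three-level chain; the ticket's real program is not reproduced here.
h :: Int -> Int
h x = x * 2 + 1

g :: Int -> Int
g x = h x + h (x + 1)   -- two call sites of h

f :: Int -> Int
f x = g x + g (x + 1)   -- two call sites of g

-- Assumed model: if each of n levels calls the next level twice and every
-- call is unfolded, the leaf body ends up duplicated 2^n times.
inlinedLeafCopies :: Int -> Integer
inlinedLeafCopies n = 2 ^ n
```

Loaded into GHCi, `inlinedLeafCopies 2` gives the number of copies of `h`'s body inside a fully unfolded `f` above; a chain only ten levels deep would already yield over a thousand copies under this model, which is why even simple unfolding can blow up.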
Some programs can produce very deeply nested types of non-linear size. See Scrap your type applications for a way to improve these bad cases
#9198: large performance regression in type checker speed in 7.8
- Types in Core blowing up quadratically (as seen in
One theme that seems to pop up rather often is the production of Core with long strings of coercions, with the size scaling non-linearly with the size of the types in the source program. These may or may not be due to similar root-causes.
#8095: TypeFamilies painfully slow
- Here a recursive type family instance leads to quadratic blow-up of coercions
This ticket has a discussion about a way to snip off coercions when not using
#7428: GHC compile times are seriously non-linear in program size
- Here a CPS'd State monad is leading to a quadratic blowup in Core size over successive simplifier iterations
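For readers unfamiliar with the term, a CPS'd State monad looks roughly like the sketch below. All names (`StateCPS`, `modifyCPS`, `execCPS`, `bumps`) are hypothetical and this is not the program from #7428; it only shows the general shape, where each bind nests another continuation lambda that the simplifier may inline into.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A state monad in continuation-passing style (hypothetical names).
newtype StateCPS s a = StateCPS
  { runStateCPS :: forall r. s -> (a -> s -> r) -> r }

instance Functor (StateCPS s) where
  fmap f (StateCPS m) = StateCPS (\s k -> m s (\a s' -> k (f a) s'))

instance Applicative (StateCPS s) where
  pure a = StateCPS (\s k -> k a s)
  StateCPS mf <*> StateCPS ma =
    StateCPS (\s k -> mf s (\f s' -> ma s' (\a s'' -> k (f a) s'')))

instance Monad (StateCPS s) where
  -- Each bind wraps the rest of the computation in a new continuation.
  StateCPS m >>= f =
    StateCPS (\s k -> m s (\a s' -> runStateCPS (f a) s' k))

-- Apply a function to the state.
modifyCPS :: (s -> s) -> StateCPS s ()
modifyCPS f = StateCPS (\s k -> k () (f s))

-- Run a computation and return only the final state.
execCPS :: StateCPS s a -> s -> s
execCPS m s = runStateCPS m s (\_ s' -> s')

-- A straight-line sequence of binds; longer sequences of this kind are
-- the sort of input where Core size is worth watching.
bumps :: StateCPS Int ()
bumps = do
  modifyCPS (+ 1)
  modifyCPS (+ 1)
  modifyCPS (+ 1)
  modifyCPS (+ 1)
```

In GHCi, `execCPS bumps 0` runs the four increments. To observe how Core size evolves across simplifier iterations for a module like this, one option is to compile it with `-ddump-simpl-iterations` and compare the dumps as the number of binds grows.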
#5642: Deriving Generic of a big type takes a long time and lots of space
#14338 (closed): Simplifier fails with "Simplifier ticks exhausted"
- Specialised dictionaries parametrized on a type-level list produce very large coercions.
One possible solution (proposed in #8095) is to eliminate coercions from the Core AST during usual compilation, instead only including them when we want to lint the Core.
Another theme often seen is issues characterized by perceived slowness during compilation of code deriving instances. This could be due to a number of reasons:
1. the implementation of the logic responsible for producing the instance code is inefficient
2. the instance itself is large but could be expressed more concisely
3. the instance itself is large but irreducibly so
While it's possible to fix (1) and (2), (3) is inherent.
Uncategorised compiler performance issues
#2346: desugaring let-bindings
#10228 (closed): increase in compiler memory usage, regression from 7.8.4 to 7.10.1
#10289 (closed): 2.5k static HashSet takes too much memory to compile
- Significantly improved in memory usage from #10370 (closed), but worse at overall wall-clock time!
#7450: Regression in optimisation time of functions with many patterns (6.12 to 7.4)?
#10800 (closed): vector-0.11 compile time increased substantially with 7.10.1
- Regression in vector testsuite, perhaps due to a change in inlinings
#13639 (closed): Skylighting package compilation is glacial
- See this run comparing GHC 7.4.2 through 8.0.1
7.6 vs 7.8
- A bit difficult to decipher, since a lot of the stats/surrounding numbers were totally rewritten due to some Testsuite API overhauls.
- The results are a mix; there are things like peak_megabytes_allocated being bumped up a lot, but a lot of them also had bytes_allocated go down as well. This one seems pretty mixed.
7.8 vs 7.10
- Things mostly got better according to these, not worse!
- Many of them had drops in bytes_allocated, for example,
- The average improvement range is something like 1-3%.
- But one got much worse; bytes_allocated jumped from 45520936 to 115905208, 2.5x worse!
7.10 vs HEAD
Most results actually got better, not worse!
Silent superclasses made HEAD drop in several places, some noticeably over 2x
max_bytes_used increased in some cases, but not much, probably GC wibbles.
No major regressions, mostly wibbles.
(NB: Sporadically updated)
As of April 22nd, 2016:
- GHC HEAD: 14m9s (via 7.8.3) (because of Joachim's call-arity improvements)
- GHC 7.10: 15m43s (via 7.8.3)
- GHC 7.8: 12m54s (via 7.8.3)
- GHC 7.6: 8m19s (via 7.4.1)
Random note: GHC 7.10's build system actually disabled DPH (half a dozen more packages and probably a hundred extra modules), yet things *still* got slower over time!
Interesting third-party library numbers
- Compile time of some example program (
fltkhs library increased from about 15 seconds to more than a minute (original message).
- GHC takes significantly more memory compiling the
-j1 (1GB vs 150MB). See #9370.
- haskell-src-exts takes many tens of seconds to compile. However, this may not be surprising: it consists of roughly 70 data definitions, some with many constructors, deriving (Eq,Ord,Show,Typeable,Data,Foldable,Traversable) on most of them as well as defining
- vector-algorithms may be a nice test and reportedly got slower to compile and run in recent GHC releases.
GHC 7.10 to GHC 8.0
- The performance effect of the TypeInType merge shows up in b5d5d83122c93c2a25839127edfd6b2df7ed6928 ("Revert .gitmodules changes from 67465497") due to various broken intermediate commits.
- 91c6b1f54aea658b0056caec45655475897f1972 is a refactoring of the Typeable implementation which moves Typeable dictionary generation from evidence generation time to the point where the represented type is defined. This tends to regress compile allocations by a few percent for programs defining lots of types (although programs which make large use of Typeable may see improvement).
- Improvements in code generation (#7450, b29633f5cf310824f3e34716e9261162ced779d3) and simplification (4681f55970cabc6e33591d7e698621580818f9a2)
GHC 8.0 to GHC 8.2
f53d761df9762232b54ec57a950d301011cd21f8 ("TysWiredIn: Use UniqFM lookup for built-in OccNames") improves the efficiency of built-in OccName lookup, resulting in a 2-5% improvement in compiler allocations on nofib. This was noticed due to unexpectedly large allocation regressions in dd3080fe0263082f65bf2570f49189c277b12e28 (#12357 (closed)).
ed4809813fa51524ae73a4475afe33018a67f87d ("InstEnv: Ensure that instance visibility check is lazy") fixes a bug, introduced by an earlier change (4c834fdddf4d44d12039da4d6a2c63a660975b95) to the instance visibility check, which broke laziness of instance resolution. This reduces compiler allocations by roughly 5-10% on nofib. The underlying bug was noticed through unexpectedly large allocation regressions due to 673efccb3b348e9daf23d9e65460691bbea8586e and 4e6bcc2c8134f9c1ba7d715b3206130f23c529fb, which added instances to various base modules (#12367 (closed)).
eb3d6595735671605c5d6294a796dc0f16f784a4 ("OccName: Avoid re-encoding derived OccNames") is a refactoring which reduces allocations in the computation of derived OccNames by eliminating