Huge increase of compile time and memory use from 8.0.2 to 8.2.2 or 8.4.2
I am the author of the cl3 library on Hackage. I have noticed a huge increase of compile time and memory use when testing 8.2.2 and 8.4.2. ghc-8.0.2 compiled in 4:17.33 using 3.5 GB. ghc-8.2.2 compiled in 26:40.15 using 32.8 GB. This is an increase of 6x in time and 9x in memory. This is not all bad, my nbody benchmark has improved about 35% between ghc-8.0.2 and ghc-8.4.2 so the increased compilation time and memory usage are producing much better runtime performance. I am interested if you could suggest some workarounds to help others compile on systems with less resources. I have 64GB memory in my system and would like to test out some -fno-* GHC Options. Could you point me in the right direction? The library is almost entirely pure functions. I am also interested in other options, like if there are ways to rewrite things to make it easier on the compiler or using NOINLINE on a trouble spot and how to find that trouble spot.
Trac metadata
Trac field | Value |
---|---|
Version | 8.4.2 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture |
- Show closed items
Activity
-
Newest first Oldest first
-
Show all activity Show comments only Show history only
- NathanWaivio changed milestone to %8.6.1
changed milestone to %8.6.1
- NathanWaivio changed weight to 5
changed weight to 5
- NathanWaivio added Tbug Trac import labels
added Tbug Trac import labels
- Developer
A quick first thing to do is to run
ghc
with-v
. It will print statistics about each core-to-core pass (size of the AST, and in recent versions memory consumption), and maybe you can spot one pass where the size of the AST drastically increases. - Author
Attached file
ghc-8.2.2-v.txt
(download).verbose output of ghc-8.2.2
- Author
All of the numbers look pretty big to me.
- Author
Attached file
ghc-8.0.2-v.txt
(download).verbose output of ghc-8.0.2
- Author
I have now been able to analyze the differences between ghc-8.0.2 to ghc-8.2.2.
Here is a table of the growth of the number of terms per iteration of the Simplifier for the module Algebra.Geometric.Cl3:
Simplifier Iteration 1 2 3 4 ghc-8.0.2 118,835 138,330 172,291 516,767 ghc-8.2.2 149,046 185,066 15,190,006 12,310,166 Apparently the largest difference occurs in Simplifier Iteration 3 between the Worker Wrapper and Float Out.
What is going on with Simplifier Iteration 3?
The other module, Algebra.Geometric.Cl3.JonesCalculus, actually has fewer terms in ghc-8.2.2 than ghc-8.0.2.
- Author
Attached file
ghc-8.4.2-v.txt
(download).Attached verbose output of ghc-8.4.2. Noticed that while compiling the memory usage increased from 10GB to 32GB during CodeGen.
- Ben Gamari changed weight to 7
changed weight to 7
- Maintainer
Trac metadata
Trac field Value Priority normal → high - Tobias Dammers assigned to @tdammers
assigned to @tdammers
- Developer
Steps to reproduce:
- Check out from git:
git clone https://github.com/waivio/cl3.git; cd cl3
- Install the
random
package (e.g.cabal install random
) cd cl3
ghc -c Algebra/Geometric/Cl3.hs -fforce-recomp -O2
Without
-O2
, things compile in under .1 seconds on my machine; adding-O2
makes it take minutes. - Check out from git:
- Developer
It seems that the
Eq
instance is at the core of the problem: replacing it with the following allows GHC to compile the module in a timely fashion, and compilation succeeds even with a heap limit of only 128M:instance Eq Cl3 where (==) = undefined
This instance also compiles fast:
instance Eq Cl3 where (R a0) == (R b0) = True _ == _ = undefined
But this one seems to blow up:
instance Eq Cl3 where (R a0) == (R b0) = a0 == b0 _ == _ = undefined
- Developer
Further investigation reveals that it's not necessarily the
Eq
instance itself that causes trouble; commenting out various parts of the module allows me to make it blow up or not blow up even with the full originalEq
in place. It seems that multiple factors contribute to the blowup, and removing enough of them "fixes" the problem. - Developer
Here's
-dshow-passes
with HEAD-dshow-passes output
simonpj@cam-05-unx:~/tmp/cl3$ ~/5builds/HEAD-5/inplace/bin/ghc-stage2 -c -dshow-passes src/Algebra/Geometric/Cl3.hs -O2 Glasgow Haskell Compiler, Version 8.7.20180710, stage 2 booted by GHC version 8.2.2 Using binary package database: /home/simonpj/5builds/HEAD-5/inplace/lib/package.conf.d/package.cache Using binary package database: /home/simonpj/.ghc/x86_64-linux-8.7.20180710/package.conf.d/package.cache package flags [] loading package database /home/simonpj/5builds/HEAD-5/inplace/lib/package.conf.d loading package database /home/simonpj/.ghc/x86_64-linux-8.7.20180710/package.conf.d wired-in package ghc-prim mapped to ghc-prim-0.5.3 wired-in package integer-gmp mapped to integer-gmp-1.0.2.0 wired-in package base mapped to base-4.12.0.0 wired-in package rts mapped to rts wired-in package template-haskell mapped to template-haskell-2.14.0.0 wired-in package ghc mapped to ghc-8.7 *** Checking old interface for Algebra.Geometric.Cl3 (use -ddump-hi-diffs for more details): *** Parser [Algebra.Geometric.Cl3]: !!! Parser [Algebra.Geometric.Cl3]: finished in 281.59 milliseconds, allocated 124.657 megabytes *** Renamer/typechecker [Algebra.Geometric.Cl3]: !!! Renamer/typechecker [Algebra.Geometric.Cl3]: finished in 1729.18 milliseconds, allocated 582.499 megabytes *** Desugar [Algebra.Geometric.Cl3]: Result size of Desugar (before optimization) = {terms: 46,524, types: 55,087, coercions: 2,210, joins: 0/9,539} Result size of Desugar (after optimization) = {terms: 26,825, types: 34,697, coercions: 4,390, joins: 1/660} !!! Desugar [Algebra.Geometric.Cl3]: finished in 463.17 milliseconds, allocated 198.217 megabytes *** Simplifier [Algebra.Geometric.Cl3]: Result size of Simplifier iteration=1 = {terms: 29,453, types: 35,248, coercions: 7,271, joins: 1/982} Result size of Simplifier iteration=2 = {terms: 26,430, types: 32,412, coercions: 5,036, joins: 1/207} Result size of Simplifier iteration=3 = {terms: 26,370, types: 32,315, coercions: 4,924, joins: 1/198} Result size of Simplifier = {terms: 26,370, types: 32,315, coercions: 4,924, joins: 1/198} !!! Simplifier [Algebra.Geometric.Cl3]: finished in 1478.89 milliseconds, allocated 532.541 megabytes *** Specialise [Algebra.Geometric.Cl3]: Result size of Specialise = {terms: 27,077, types: 33,084, coercions: 4,924, joins: 1/226} !!! Specialise [Algebra.Geometric.Cl3]: finished in 34.41 milliseconds, allocated 20.874 megabytes *** Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}) [Algebra.Geometric.Cl3]: Result size of Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}) = {terms: 29,888, types: 35,057, coercions: 4,924, joins: 1/217} !!! Float out(FOS {Lam = Just 0, Consts = True, OverSatApps = False}) [Algebra.Geometric.Cl3]: finished in 303.86 milliseconds, allocated 133.719 megabytes *** Simplifier [Algebra.Geometric.Cl3]: Result size of Simplifier iteration=1 = {terms: 109,632, types: 50,922, coercions: 4,826, joins: 183/7,875} Result size of Simplifier iteration=2 = {terms: 93,026, types: 52,819, coercions: 4,899, joins: 185/1,646} Result size of Simplifier iteration=3 = {terms: 135,959, types: 55,173, coercions: 4,892, joins: 99/2,772} Result size of Simplifier iteration=4 = {terms: 131,354, types: 52,485, coercions: 4,892, joins: 53/529} Result size of Simplifier = {terms: 131,354, types: 52,485, coercions: 4,892, joins: 53/529} !!! Simplifier [Algebra.Geometric.Cl3]: finished in 4415.46 milliseconds, allocated 1573.215 megabytes *** Simplifier [Algebra.Geometric.Cl3]: Result size of Simplifier iteration=1 = {terms: 130,205, types: 52,159, coercions: 4,892, joins: 37/519} Result size of Simplifier iteration=2 = {terms: 128,591, types: 51,440, coercions: 4,892, joins: 37/513} Result size of Simplifier = {terms: 128,591, types: 51,440, coercions: 4,892, joins: 37/513} !!! Simplifier [Algebra.Geometric.Cl3]: finished in 3285.00 milliseconds, allocated 1248.401 megabytes *** Simplifier [Algebra.Geometric.Cl3]: Result size of Simplifier iteration=1 = {terms: 129,119, types: 51,615, coercions: 4,892, joins: 37/538} Result size of Simplifier iteration=2 = {terms: 129,068, types: 51,555, coercions: 4,892, joins: 37/533} Result size of Simplifier = {terms: 129,068, types: 51,555, coercions: 4,892, joins: 37/533} !!! Simplifier [Algebra.Geometric.Cl3]: finished in 3415.70 milliseconds, allocated 1218.423 megabytes *** Float inwards [Algebra.Geometric.Cl3]: Result size of Float inwards = {terms: 129,068, types: 51,555, coercions: 4,892, joins: 37/533} !!! Float inwards [Algebra.Geometric.Cl3]: finished in 218.06 milliseconds, allocated 143.267 megabytes *** Called arity analysis [Algebra.Geometric.Cl3]: Result size of Called arity analysis = {terms: 129,068, types: 51,555, coercions: 4,892, joins: 37/533} !!! Called arity analysis [Algebra.Geometric.Cl3]: finished in 160.66 milliseconds, allocated 86.793 megabytes *** Simplifier [Algebra.Geometric.Cl3]: Result size of Simplifier = {terms: 129,068, types: 51,555, coercions: 4,892, joins: 37/533} !!! Simplifier [Algebra.Geometric.Cl3]: finished in 943.43 milliseconds, allocated 405.961 megabytes *** Demand analysis [Algebra.Geometric.Cl3]: Result size of Demand analysis = {terms: 129,068, types: 51,555, coercions: 4,892, joins: 37/533} !!! Demand analysis [Algebra.Geometric.Cl3]: finished in 1513.03 milliseconds, allocated 552.224 megabytes *** Worker Wrapper binds [Algebra.Geometric.Cl3]: Result size of Worker Wrapper binds = {terms: 130,487, types: 54,247, coercions: 4,892, joins: 38/701} !!! Worker Wrapper binds [Algebra.Geometric.Cl3]: finished in 36.37 milliseconds, allocated 10.774 megabytes *** Simplifier [Algebra.Geometric.Cl3]: Result size of Simplifier iteration=1 = {terms: 143,685, types: 58,766, coercions: 4,815, joins: 96/1,750} Result size of Simplifier iteration=2 = {terms: 175,579, types: 63,298, coercions: 4,815, joins: 173/1,293} C-c C-c*** Deleting temp files: *** Deleting temp dirs:
I had to stop it with ctrl-C
Edited by Ömer Sinan Ağacan - Developer
Attached file
Cl3.hs
(download). - Developer
Based on ticket:15304#comment:156492, I attached a bit smaller version without dependency on random.
On my machine:
-
ghc-8.0.2 -O -c Cl3.hs -fforce-recomp
takes 2 secs -
ghc-8.2.1 -O -c Cl3.hs -fforce-recomp
takes 57 secs -
ghc-8.4.3 -O -c Cl3.hs -fforce-recomp
takes 95 secs
If I don't add
-O
, every version takes 2 secs. -
- Developer
Great, thanks a lot for that. 95 seconds is a lot more manageable.
- Developer
You can make the process faster by removing methods from
instance Floating Cl3
- it seems that every one of the 12 methods slows down the compilation by few seconds. For example, if you keepexp
andlog
only it's 22 sec (and the root cause should still be present).