... | ... | @@ -2,13 +2,7 @@ |
|
|
Notes about running demand analysis a second time, late in the pipeline.
|
|
|
|
|
|
|
|
|
Commit [c080f727ba5f83921b842fcff71e9066adbdc250](/trac/ghc/changeset/c080f727ba5f83921b842fcff71e9066adbdc250/ghc)
|
|
|
|
|
|
|
|
|
The numbers quoted on this wiki page used [ef017944600cf4e153aad686a6a78bfb48dea67a](/trac/ghc/changeset/ef017944600cf4e153aad686a6a78bfb48dea67a/ghc) as the base commit. After measuring, I rebased my patch onto [33c880b43ed72d77f6b1d95d5ccefbd376c78c78](/trac/ghc/changeset/33c880b43ed72d77f6b1d95d5ccefbd376c78c78/ghc)
|
|
|
|
|
|
|
|
|
The corresponding testsuite commit is [\[a7920ef6eefa5578c89b7cda0d6be207ee38c502/testsuite\]](/trac/ghc/changeset/a7920ef6eefa5578c89b7cda0d6be207ee38c502/testsuite)
|
|
|
Commits [c080f727ba5f83921b842fcff71e9066adbdc250](/trac/ghc/changeset/c080f727ba5f83921b842fcff71e9066adbdc250/ghc)[\[a7920ef6eefa5578c89b7cda0d6be207ee38c502/testsuite\]](/trac/ghc/changeset/a7920ef6eefa5578c89b7cda0d6be207ee38c502/testsuite)
|
|
|
|
|
|
## Commit notes
|
|
|
|
... | ... | @@ -22,7 +16,11 @@ The bulk of this patch merely simplifies the treatment of wrappers in interface |
|
|
|
|
|
- Update the documentation to explain -flate-dmd-anal.
|
|
|
|
|
|
- Ask the community for help in determining if we should make -O2 imply -flate-dmd-anal.
|
|
|
- Ask the performance czars and community for help in determining if we should make -O2 imply -flate-dmd-anal.
|
|
|
|
|
|
- That might involve investigating the more reliable-looking slowdowns in the "New performance numbers" section. No slowdown appeared on both platforms (so far), but a couple looked reliable on a given platform. For example, typecheck on the big server showed the same slowdown regardless of -flate-dmd-anal on the nofib tests (i.e. the same in 10 and 11) and regardless of mode=norm or mode=slow. This suggests that a library function used consistently by typecheck's main loop has slowed down. But it is very hard to tell from the numbers, and investigating that sort of thing takes a lot of time.
|
|
|
|
|
|
- To proceed: perhaps measure mode=slow on the MacBook Pro. Also build the libraries with ticky on the big server to search for the hypothetical library function that is slowing down typecheck.
|
|
|
|
|
|
## Relation to other tickets
|
|
|
|
... | ... | @@ -49,6 +47,9 @@ Simplifying the .hi scheme was the easiest way to enable `-flate-dmd-anal` and m |
|
|
### Effect on .hi file size
|
|
|
|
|
|
|
|
|
The comparison in this section uses [ef017944600cf4e153aad686a6a78bfb48dea67a](/trac/ghc/changeset/ef017944600cf4e153aad686a6a78bfb48dea67a/ghc) as the base commit. After measuring, I rebased my patch onto [33c880b43ed72d77f6b1d95d5ccefbd376c78c78](/trac/ghc/changeset/33c880b43ed72d77f6b1d95d5ccefbd376c78c78/ghc)
|
|
|
|
|
|
|
|
|
Removing the clever .hi file scheme for wrappers results, as expected, in an increase in .hi file size.
|
|
|
|
|
|
|
... | ... | @@ -82,7 +83,7 @@ Here's the files with a growth \>10%. |
|
|
(0.21422422135168143,"ghc-prim/dist-install/build/GHC/Classes.hi")
|
|
|
```
|
|
|
|
|
|
### Accommodation of -flate-dmd-anal and -ffun-to-thunk --
|
|
|
### Main Benefit of Removal
|
|
|
|
|
|
|
|
|
The clever .hi scheme caused CoreLint errors when combined with -flate-dmd-anal. I irresponsibly cannot remember the recipe for this bug. It was triggered in one of three ways: building GHC, running nofib, or running ./validate.
|
... | ... | @@ -107,6 +108,9 @@ If demand analysis removes all the value arguments from a function f in A.hs and |
|
|
### Effect on .hi file size and .a file size
|
|
|
|
|
|
|
|
|
The comparison in this section uses [ef017944600cf4e153aad686a6a78bfb48dea67a](/trac/ghc/changeset/ef017944600cf4e153aad686a6a78bfb48dea67a/ghc) as the base commit. After measuring, I rebased my patch onto [33c880b43ed72d77f6b1d95d5ccefbd376c78c78](/trac/ghc/changeset/33c880b43ed72d77f6b1d95d5ccefbd376c78c78/ghc)
|
|
|
|
|
|
|
|
|
The second demand analysis generates more worker/wrapper splits, so it also generates larger .hi files and larger .o files. The numbers in this section measure the difference between `-O2 -flate-dmd-anal` and `-O2 -fno-late-dmd-anal` on my 64-bit Mac OS X machine.
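To make the size effect concrete, here is a hand-written sketch of what a worker/wrapper split looks like. It is only an illustration: the names `sumTo`, `sumToWorker` and `sumToWrapped` are hypothetical, and the code GHC actually generates differs in naming and detail. The point is that each split adds a second binding (the worker) and a small wrapper whose body is meant to be inlined at call sites, so both object code and interface files grow.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Int (I#), Int#, isTrue#, (+#), (-#), (==#))

-- Original function. It is strict in both arguments, so demand analysis
-- can discover that the boxes around the Ints are never needed.
sumTo :: Int -> Int -> Int
sumTo acc 0 = acc
sumTo acc n = sumTo (acc + n) (n - 1)

-- Hand-written "worker": the loop itself, operating on unboxed Int# only.
sumToWorker :: Int# -> Int# -> Int#
sumToWorker acc n
  | isTrue# (n ==# 0#) = acc
  | otherwise          = sumToWorker (acc +# n) (n -# 1#)

-- Hand-written "wrapper": unboxes the arguments, calls the worker, and
-- reboxes the result. It is small and intended to be inlined by callers.
sumToWrapped :: Int -> Int -> Int
sumToWrapped (I# acc) (I# n) = I# (sumToWorker acc n)
{-# INLINE sumToWrapped #-}
```

Loading this into GHCi and comparing `sumTo 0 10` with `sumToWrapped 0 10` shows that the two versions agree; only the calling convention differs.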
|
|
|
|
|
|
|
... | ... | @@ -206,6 +210,279 @@ These are the big .a changes over 10K. |
|
|
<th>libHSCabal-1.17.0.a
|
|
|
</th></tr></table>
|
|
|
|
|
|
### New performance numbers
|
|
|
|
|
|
|
|
|
The numbers in this section come from [c080f727ba5f83921b842fcff71e9066adbdc250](/trac/ghc/changeset/c080f727ba5f83921b842fcff71e9066adbdc250/ghc), building the libraries/nofib tests with various combinations of -fno-late-dmd-anal and -flate-dmd-anal.
|
|
|
|
|
|
|
|
|
I use these abbreviations in the following tables:
|
|
|
|
|
|
```wiki
|
|
|
00 - no late dmd analysis on either libs or nofib tests
|
|
|
10 - late demand analysis on libs, but not on nofib tests
|
|
|
11 - late demand analysis on both libs and nofib tests
|
|
|
```
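The abbreviations can be read as a pair of switches, the first digit for the libraries and the second for the nofib tests. A minimal sketch of that mapping follows; the type `Config` and the function `lateDmdFlags` are hypothetical names used only for illustration, while the two flags are the ones named on this page.

```haskell
-- The three build configurations used in the tables.
data Config = C00 | C10 | C11
  deriving (Eq, Show)

-- The flag that enables the late demand analysis pass, and its negation.
on, off :: String
on  = "-flate-dmd-anal"
off = "-fno-late-dmd-anal"

-- Flags used for (libraries, nofib tests) in each configuration.
lateDmdFlags :: Config -> (String, String)
lateDmdFlags C00 = (off, off)  -- neither libs nor tests
lateDmdFlags C10 = (on,  off)  -- libs only
lateDmdFlags C11 = (on,  on)   -- both libs and tests
```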
|
|
|
|
|
|
`build.mk` included
|
|
|
|
|
|
```wiki
|
|
|
SRC_HC_OPTS = -O -H64m
|
|
|
GhcStage1HcOpts = -O -fasm
|
|
|
GhcStage2HcOpts = -O2 -fasm
|
|
|
GhcHcOpts = -Rghc-timing
|
|
|
GhcLibHcOpts = -O2
|
|
|
|
|
|
SplitObjs = NO
|
|
|
|
|
|
DYNAMIC_BY_DEFAULT = NO
|
|
|
DYNAMIC_GHC_PROGRAMS = NO
|
|
|
```
|
|
|
|
|
|
|
|
|
The changes in binary size were the same on my two test platforms so far (both 64-bit). Essentially, we appear to be seeing the effect of an increase in the size of the base library. The smallest programs increased by +1.1% in both 10 and 11. Other programs usually differed by about 0.1% between 10 and 11. nucleic2 grows by about +1% from 10 to 11, but that is a known anomaly; cf. the discussion in "Old performance numbers" below.
|
|
|
|
|
|
```wiki
|
|
|
Binary Sizes
|
|
|
|
|
|
-------------------------------------------------------------------------------
|
|
|
Program 00 10 11
|
|
|
-------------------------------------------------------------------------------
|
|
|
-1 s.d. ----- +0.4% +0.4%
|
|
|
+1 s.d. ----- +0.7% +0.7%
|
|
|
Average ----- +0.6% +0.6%
|
|
|
```
|
|
|
|
|
|
#### 2.7 GHz Core i7 MacBook Pro, 16 GB, 64-bit
|
|
|
|
|
|
##### mode=norm NoFibRuns=30
|
|
|
|
|
|
```wiki
|
|
|
Allocations
|
|
|
|
|
|
-- NB nucleic2 and cryptarithm2 are explained in the "Old performance numbers" section below.
|
|
|
|
|
|
-------------------------------------------------------------------------------
|
|
|
Program 00 10 11
|
|
|
-------------------------------------------------------------------------------
|
|
|
cichelli 80307264 +0.0% -22.9%
|
|
|
mandel2 1041544 +0.0% -21.4%
|
|
|
reverse-complem 150153040 -13.2% -13.2%
|
|
|
fasta 401153024 -9.1% -9.1%
|
|
|
integrate 474063360 +0.0% -5.1%
|
|
|
k-nucleotide 4125099504 -0.0% -4.8%
|
|
|
knights 1968072 +0.0% -3.8%
|
|
|
fulsom 323486224 +0.0% -2.6%
|
|
|
transform 696343224 +0.0% -2.4%
|
|
|
|
|
|
-- everything else changed less
|
|
|
|
|
|
nucleic2 87567072 +0.0% +3.4%
|
|
|
cryptarithm2 24028936 +0.0% +4.2%
|
|
|
|
|
|
-1 s.d. ----- -1.9% -4.8%
|
|
|
+1 s.d. ----- +1.5% +3.1%
|
|
|
Average ----- -0.2% -0.9%
|
|
|
```
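The percentage columns in tables like the one above can be read as a relative change against the 00 baseline. The sketch below assumes the simple definition (new - base) / base; I have not checked that this is exactly how the reporting tool rounds or aggregates its figures, and the helper names `pctChange` and `applyPct` are hypothetical.

```haskell
-- Relative change of a measurement against its baseline, in percent.
pctChange :: Double -> Double -> Double
pctChange base new = (new - base) / base * 100

-- Absolute value implied by a baseline and a percentage entry.
applyPct :: Double -> Double -> Double
applyPct base pct = base * (1 + pct / 100)

-- Baseline allocation for cichelli in the 00 column (bytes), from the table.
cichelliBase :: Double
cichelliBase = 80307264

-- Approximate allocation implied by the -22.9% entry in the 11 column.
cichelli11 :: Double
cichelli11 = applyPct cichelliBase (-22.9)
```

Under that assumption, `cichelli11` gives the approximate number of bytes cichelli allocates with late demand analysis enabled on both the libraries and the tests.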
|
|
|
|
|
|
```wiki
|
|
|
Run Time
|
|
|
|
|
|
-------------------------------------------------------------------------------
|
|
|
Program 00 10 11
|
|
|
-------------------------------------------------------------------------------
|
|
|
life 0.23 -13.0% -13.0%
|
|
|
|
|
|
-- everything else changed less
|
|
|
|
|
|
binary-trees 0.61 +6.3% +5.9%
|
|
|
|
|
|
-1 s.d. ----- -3.5% -4.1%
|
|
|
+1 s.d. ----- +2.9% +2.3%
|
|
|
Average ----- -0.4% -0.9%
|
|
|
```
|
|
|
|
|
|
```wiki
|
|
|
Elapsed Time
|
|
|
|
|
|
-------------------------------------------------------------------------------
|
|
|
Program 00 10 11
|
|
|
-------------------------------------------------------------------------------
|
|
|
compress2 0.23 -14.2% -17.7%
|
|
|
typecheck 0.20 +2.0% -8.9%
|
|
|
life 0.26 -12.3% -6.2%
|
|
|
simple 0.24 -9.0% -4.9%
|
|
|
|
|
|
-- everything else changed less
|
|
|
|
|
|
hpg 0.21 -1.9% +6.7%
|
|
|
reverse-complem 0.27 +13.5% +12.8%
|
|
|
|
|
|
-1 s.d. ----- -5.7% -5.6%
|
|
|
+1 s.d. ----- +4.2% +4.3%
|
|
|
Average ----- -0.9% -0.8%
|
|
|
```
|
|
|
|
|
|
#### really big many-core server, 48 GB, 64-bit
|
|
|
|
|
|
##### mode=norm NoFibRuns=30
|
|
|
|
|
|
```wiki
|
|
|
Allocations
|
|
|
|
|
|
-- NB nucleic2 and cryptarithm2 are explained in the "Old performance numbers" section below.
|
|
|
|
|
|
-------------------------------------------------------------------------------
|
|
|
Program 00 10 11
|
|
|
-------------------------------------------------------------------------------
|
|
|
cichelli 80307264 +0.0% -22.9%
|
|
|
mandel2 1041544 +0.0% -21.4%
|
|
|
reverse-complem 150153040 -13.2% -13.2%
|
|
|
fasta 401153024 -9.1% -9.1%
|
|
|
integrate 474063360 +0.0% -5.1%
|
|
|
k-nucleotide 4125099504 -0.0% -4.8%
|
|
|
knights 1968072 +0.0% -3.8%
|
|
|
fulsom 323486224 +0.0% -2.6%
|
|
|
transform 696343224 +0.0% -2.4%
|
|
|
ida 128551480 +0.0% -1.2%
|
|
|
parstof 3102544 +0.0% -1.4%
|
|
|
simple 226411568 -0.0% -1.0%
|
|
|
|
|
|
-- everything else changed less
|
|
|
|
|
|
bspt 12285840 +0.0% +1.2%
|
|
|
nucleic2 87567496 +0.0% +3.4%
|
|
|
cryptarithm2 24028936 +0.0% +4.2%
|
|
|
|
|
|
-1 s.d. ----- -1.9% -4.8%
|
|
|
+1 s.d. ----- +1.5% +3.1%
|
|
|
Average ----- -0.2% -0.9%
|
|
|
```
|
|
|
|
|
|
```wiki
|
|
|
Run Time
|
|
|
|
|
|
|
|
|
-------------------------------------------------------------------------------
|
|
|
Program 00 10 11
|
|
|
-------------------------------------------------------------------------------
|
|
|
simple 0.27 -2.6% -6.4%
|
|
|
transform 0.39 -1.3% -5.1%
|
|
|
fasta 0.59 -2.5% -4.7%
|
|
|
|
|
|
-- everything else changed less
|
|
|
|
|
|
kahan 0.30 +3.6% +3.9%
|
|
|
binary-trees 0.88 +7.2% +6.9%
|
|
|
typecheck 0.24 +8.3% +8.3%
|
|
|
hidden 0.49 +4.1% +10.2%
|
|
|
|
|
|
-1 s.d. ----- -1.7% -3.0%
|
|
|
+1 s.d. ----- +2.9% +3.5%
|
|
|
Average ----- +0.6% +0.2%
|
|
|
```
|
|
|
|
|
|
```wiki
|
|
|
Elapsed Time
|
|
|
|
|
|
-------------------------------------------------------------------------------
|
|
|
Program 00 10 11
|
|
|
-------------------------------------------------------------------------------
|
|
|
simple 0.27 -2.6% -6.8%
|
|
|
transform 0.39 -1.3% -5.1%
|
|
|
fasta 0.59 -2.7% -3.7%
|
|
|
|
|
|
-- everything else changed less
|
|
|
|
|
|
binary-trees 0.88 +7.3% +6.9%
|
|
|
typecheck 0.24 +8.3% +8.3%
|
|
|
hidden 0.49 +4.1% +10.1%
|
|
|
|
|
|
-1 s.d. ----- -1.6% -2.9%
|
|
|
+1 s.d. ----- +3.1% +3.6%
|
|
|
Average ----- +0.7% +0.3%
|
|
|
```
|
|
|
|
|
|
##### mode=slow NoFibRuns=30
|
|
|
|
|
|
```wiki
|
|
|
Allocations
|
|
|
|
|
|
-------------------------------------------------------------------------------
|
|
|
Program 00 10 11
|
|
|
-------------------------------------------------------------------------------
|
|
|
cichelli 80307264 +0.0% -22.9%
|
|
|
mandel2 1041544 +0.0% -21.4%
|
|
|
reverse-complem 1500677840 -13.2% -13.2%
|
|
|
fasta 4005660304 -9.1% -9.1%
|
|
|
integrate 948063920 +0.0% -5.1%
|
|
|
k-nucleotide 41144014840 +0.0% -4.9%
|
|
|
fulsom 323486224 +0.0% -2.6%
|
|
|
transform 1389145136 +0.0% -2.4%
|
|
|
genfft 1796463848 +0.0% -1.2%
|
|
|
ida 733628984 +0.0% -1.0%
|
|
|
parstof 3102544 +0.0% -1.4%
|
|
|
simple 226411568 -0.0% -1.0%
|
|
|
|
|
|
-- everything else changed less
|
|
|
|
|
|
bspt 12285840 +0.0% +1.2%
|
|
|
nucleic2 87567496 +0.0% +3.4%
|
|
|
cryptarithm2 24028936 +0.0% +4.2%
|
|
|
|
|
|
-1 s.d. ----- -1.9% -4.7%
|
|
|
+1 s.d. ----- +1.5% +3.1%
|
|
|
Average ----- -0.2% -0.9%
|
|
|
```
|
|
|
|
|
|
```wiki
|
|
|
Run Time
|
|
|
|
|
|
-------------------------------------------------------------------------------
|
|
|
Program 00 10 11
|
|
|
-------------------------------------------------------------------------------
|
|
|
mandel 0.22 -9.1% -9.1%
|
|
|
transform 0.80 -0.3% -8.7%
|
|
|
reverse-complem 1.39 -5.9% -6.1%
|
|
|
simple 0.26 -1.4% -5.2%
|
|
|
fasta 5.84 -3.9% -4.2%
|
|
|
gen_regexps 1.01 -4.6% -4.7%
|
|
|
|
|
|
-- everything else changed less
|
|
|
|
|
|
paraffins 1.00 +0.2% +3.4%
|
|
|
typecheck 0.49 +10.2% +8.2%
|
|
|
hidden 0.49 +4.1% +10.2%
|
|
|
|
|
|
-1 s.d. ----- -2.6% -3.3%
|
|
|
+1 s.d. ----- +2.9% +2.7%
|
|
|
Average ----- +0.1% -0.3%
|
|
|
```
|
|
|
|
|
|
```wiki
|
|
|
Elapsed Time
|
|
|
|
|
|
-------------------------------------------------------------------------------
|
|
|
Program 00 10 11
|
|
|
-------------------------------------------------------------------------------
|
|
|
mandel 0.22 -9.1% -9.1%
|
|
|
transform 0.80 +0.0% -8.5%
|
|
|
reverse-complem 1.39 -5.9% -5.8%
|
|
|
simple 0.27 -2.1% -5.2%
|
|
|
fasta 5.86 -3.9% -4.2%
|
|
|
gen_regexps 1.01 -4.5% -4.6%
|
|
|
|
|
|
-- everything else changed less
|
|
|
|
|
|
paraffins 1.00 +0.2% +3.7%
|
|
|
typecheck 0.49 +10.2% +8.2%
|
|
|
hidden 0.49 +4.5% +10.2%
|
|
|
|
|
|
-1 s.d. ----- -2.6% -3.2%
|
|
|
+1 s.d. ----- +2.9% +2.8%
|
|
|
Average ----- +0.1% -0.3%
|
|
|
```
|
|
|
|
|
|
### Old performance numbers
|
|
|
|
|
|
|
... | ... | @@ -223,6 +500,8 @@ Allocations |
|
|
cryptarithm2 25078168 +0.0% +8.0%
|
|
|
nucleic2 98331744 +0.0% +3.2%
|
|
|
|
|
|
-- everything else changed less
|
|
|
|
|
|
cichelli 80310632 +0.0% -22.9%
|
|
|
fasta 401159024 -9.1% -9.1%
|
|
|
fulsom 321427240 +0.0% -2.6%
|
... | ... | |