...
 

{
"project.name" : "nofib",
"repository.callsign" : "NOFIB",
"phabricator.uri" : "https://phabricator.haskell.org"
}
# Generated file patterns
*.exe
*.o
*.hi
.depend
.depend.bak
cachegrind.out.*
cachegrind.out.summary
perf.data
perf.data.*
dist-newstyle/
.ghc.environment.*
# Specific generated files
nofib-analyse/nofib-analyse
runstdtest/runstdtest
imaginary/bernouilli/bernouilli
imaginary/digits-of-e1/digits-of-e1
imaginary/digits-of-e2/digits-of-e2
imaginary/exp3_8/exp3_8
imaginary/gen_regexps/gen_regexps
imaginary/integrate/integrate
......@@ -29,6 +36,13 @@ real/bspt/bspt
real/cacheprof/cacheprof
real/compress/compress
real/compress2/compress2
real/eff/CS/CS
real/eff/CSD/CSD
real/eff/FS/FS
real/eff/S/S
real/eff/VS/VS
real/eff/VSD/VSD
real/eff/VSM/VSM
real/fem/fem
real/fluid/fluid
real/fulsom/fulsom
......@@ -39,7 +53,8 @@ real/hidden/hidden
real/hpg/hpg
real/infer/infer
real/lift/lift
real/maillist/addresses.tex
real/linear/linear
real/maillist/runtime_files/*.tex
real/maillist/maillist
real/mkhprog/mkhprog
real/parser/parser
......@@ -60,16 +75,16 @@ shootout/fasta/fasta.slowstdout
shootout/fasta/fasta.stdout
shootout/k-nucleotide/fasta-c
shootout/k-nucleotide/k-nucleotide
shootout/k-nucleotide/knucleotide-input250000.txt
shootout/k-nucleotide/knucleotide-input2500000.txt
shootout/k-nucleotide/knucleotide-input25000000.txt
shootout/k-nucleotide/k-nucleotide.faststdin
shootout/k-nucleotide/k-nucleotide.slowstdin
shootout/k-nucleotide/k-nucleotide.stdin
shootout/n-body/n-body
shootout/pidigits/pidigits
shootout/reverse-complement/fasta-c
shootout/reverse-complement/revcomp-c
shootout/reverse-complement/revcomp-input250000.txt
shootout/reverse-complement/revcomp-input2500000.txt
shootout/reverse-complement/revcomp-input25000000.txt
shootout/reverse-complement/reverse-complement.faststdin
shootout/reverse-complement/reverse-complement.slowstdin
shootout/reverse-complement/reverse-complement.stdin
shootout/reverse-complement/reverse-complement
shootout/reverse-complement/reverse-complement.faststdout
shootout/reverse-complement/reverse-complement.slowstdout
......@@ -79,7 +94,11 @@ shootout/spectral-norm/spectral-norm
spectral/ansi/ansi
spectral/atom/atom
spectral/awards/awards
spectral/awards/*.stdout
spectral/awards/*.slowstdout
spectral/banner/banner
spectral/banner/*stdout
spectral/banner/*stdin
spectral/boyer/boyer
spectral/boyer2/boyer2
spectral/calendar/calendar
......@@ -90,7 +109,9 @@ spectral/constraints/constraints
spectral/cryptarithm1/cryptarithm1
spectral/cryptarithm2/cryptarithm2
spectral/cse/cse
spectral/dom-lt/dom-lt
spectral/eliza/eliza
spectral/exact-reals/exact-reals
spectral/expert/expert
spectral/fft2/fft2
spectral/fibheaps/fibheaps
......@@ -113,10 +134,13 @@ spectral/hartel/wang/wang
spectral/hartel/wave4main/wave4main
spectral/integer/integer
spectral/knights/knights
spectral/lambda/lambda
spectral/last-piece/last-piece
spectral/lcss/lcss
spectral/life/life
spectral/mandel/mandel
spectral/mandel2/mandel2
spectral/mate/mate
spectral/minimax/minimax
spectral/multiplier/multiplier
spectral/para/para
......@@ -143,6 +167,7 @@ gc/mutstore1/mutstore1
gc/mutstore2/mutstore2
gc/power/power
gc/spellcheck/spellcheck
gc/treejoin/treejoin
parallel/blackscholes/blackscholes
parallel/coins/coins
......
variables:
  DOCKER_REV: 2b69e99de97bd5bf1fbdbf45852231c3dcb602b6
validate:
  image: "registry.gitlab.haskell.org/ghc/ci-images/x86_64-linux-deb9:$DOCKER_REV"
  tags:
    - x86_64-linux
  before_script:
    - git clean -xdf
    - sudo apt install -y time
    - ghc --version
    - cabal --version
  script:
    - make clean
    - cabal update
    - make boot mode=fast
    - "make mode=fast NoFibRuns=1 2>&1 | tee log"
    - "nofib-analyse/nofib-analyse log"
    - |
      # The following checks that `make distclean` removes every file that
      # `git clean -xd` would remove (as listed by `git clean -nxd`),
      # apart from the `log` file written above
      make distclean
      files=$(git clean -nxd | cut -d" " -f3 | sed '/^log$/d')
      if [ -n "$files" ]
      then
        printf "The following files weren't cleaned:\n%s\n" "$files"
        exit 1
      fi
# Syntax: https://docs.gitlab.com/ee/user/project/code_owners.html
* @sgraf812 @bgamari
......@@ -3,33 +3,69 @@
This is the root directory of the "NoFib Haskell benchmark suite". It
should be part of a GHC source tree; that is, the 'nofib' directory
should be at the same level in the tree as 'compiler' and 'libraries'.
This makes sure that NoFib picks up the stage 2 compiler from the
surrounding GHC source tree.
## Package Dependencies
You can also clone this repository in isolation, in which case it will
pick `$(which ghc)` or whatever the `HC` environment variable is set to.
Please make sure you have the following packages installed for your
system GHC:
* html
* regex-compat (will install: mtl, regex-base, regex-posix)
Additional information can also be found on
[NoFib's wiki page](https://ghc.haskell.org/trac/ghc/wiki/Building/RunningNoFib).
There's also an `easy.sh` helper script which, as the name implies, is an
automated and easy way to run `nofib`.
See the `easy.sh` section at the end of this README for its usage.
## Using
<details>
<summary>Git symlink support for Windows machines</summary>
NoFib uses a few symlinks here and there to share code between benchmarks.
Git for Windows has symlinks support for some time now, but
[it may not be enabled by default](https://stackoverflow.com/a/42137273/388010).
You will notice strange `make boot` failures if it's not enabled for you.
Make sure you follow the instructions in the link to enable symlink support,
possibly as simple as through `git config core.symlinks true` or cloning with
`git clone -c core.symlinks=true <URL>`.
</details>
Install [`cabal-install-2.4`](https://www.haskell.org/cabal/download.html) or later.
Then, to run the tests, execute:
```
$ make clean # or git clean -fxd, it's faster
$ # Generates input files for the benchmarks and builds compilation
$ # dependencies for make (ghc -M)
$ make boot
$ # Builds the benchmarks and runs them $NoFibRuns (default: 5) times
$ make 2>&1 | tee nofib-log
```
This will put the results in the file `nofib-log`. You can pass extra
options to a nofib run using the `EXTRA_HC_OPTS` variable like this:
```
$ make clean
$ make boot
$ make EXTRA_HC_OPTS="-fllvm" 2>&1 | tee nofib-llvm-log
```
**Note:** to get all the results, you have to `clean` and `boot` between
separate `nofib` runs.
To compare the results of multiple runs, save the output in a logfile
and use the program in `./nofib-analyse/nofib-analyse`, for example:
```
...
$ make 2>&1 | tee nofib-log-6.4.2
...
$ make 2>&1 | tee nofib-log-6.6
$ ./nofib-analyse/nofib-analyse nofib-log-6.4.2 nofib-log-6.6 | less
```
to generate a comparison of the runs captured in `nofib-log-6.4.2`
and `nofib-log-6.6`. When making comparisons, be careful to ensure
......@@ -39,6 +75,55 @@ GHC version, GCC version, C libraries, static vs. dynamic GMP library,
build options, run options, and probably lots more. To be on the safe
side, make both runs on the same unloaded machine.
## Modes
Each benchmark is runnable in three different time `mode`s:
- `fast`: 0.1-0.2s
- `norm`: 1-2s
- `slow`: 5-10s
You can control which mode to run by setting an additional `mode` variable for
`make`. The default is `mode=norm`. Example for `mode=fast`:
```
$ make clean
$ make boot mode=fast
$ make mode=fast
```
Note that the `mode`s set in `make boot` and `make` need to agree. Otherwise you
will get output errors, because `make boot` will generate input files for a
different `mode`. A more DRY way to control the `mode` is to export it once:
```
$ make clean
$ export mode=fast
$ make boot
$ make
```
As CPU architectures advance, the above running times may drift, and
occasionally all benchmarks will need adjusting.
Be aware that `nofib-analyse` ignores results that fall below 0.2s.
This is the default threshold of its `-i` option, which of course clashes with
`mode=fast` (0.1-0.2s per benchmark). In that case, set `-i` as appropriate,
or deactivate it entirely with `-i 0`.
## Boot vs. benchmarked GHC
The `nofib-analyse` utility is compiled with the `BOOT_HC` compiler,
which may be different from the GHC under benchmark.
You can control which GHC you benchmark with the `HC` variable:
```
$ make clean
$ make boot HC=ghc-head
$ make HC=ghc-head 2>&1 | tee nofib-log-ghc-head
```
## Configuration
There are some options you might want to tweak; search for nofib in
......@@ -50,9 +135,99 @@ To get instruction counts, memory reads/writes, and "cache misses",
you'll need to get hold of Cachegrind, which is part of
[Valgrind](http://valgrind.org).
You can then pass `-cachegrind` as `EXTRA_RUNTEST_OPTS`. Counting
instructions slows down execution by a factor of ~30. But it's
a deterministic metric, so you can combine it with `NoFibRuns=1`:
```
$ (make EXTRA_RUNTEST_OPTS="-cachegrind" NoFibRuns=1) 2>&1 | tee nofib-log
```
Optionally combine this with `mode=fast`, see [Modes](#modes).
## Extra Packages
Some benchmarks aren't run by default and require extra packages to be
installed for the GHC compiler being tested. These packages include:
* stm - for smp benchmarks
## Adding benchmarks
If you add a benchmark, try to set the problem sizes for
fast/normal/slow reasonably. [Modes](#modes) lists the recommended running
time for each mode.
### Benchmark Categories
So you have a benchmark to submit but don't know in which subfolder to put it? Here's some
advice on the intended semantics of each category.
#### Single threaded benchmarks
These are run when you just type `make`. Their semantics is explained in
[the Nofib paper](https://link.springer.com/chapter/10.1007%2F978-1-4471-3215-8_17)
(You can find a .ps online, thanks to @bgamari. Alternatively grep for
'Spectral' in docs/paper/paper.verb).
- `imaginary`: Mostly toy benchmarks, solving puzzles like n-queens.
- `spectral`: Algorithmic kernels, like FFT. If you want to add a benchmark of a
library, this is most certainly the place to put it.
- `real`: Actual applications, with a command-line interface and all. Because of
the large dependency footprint of today's applications, these have become
rather aged.
- `shootout`: Benchmarks from
[the benchmarks game](https://benchmarksgame-team.pages.debian.net/benchmarksgame/),
formerly known as "language shootout".
Most of the benchmarks are quite old and aren't really written the way one would
write high-performance Haskell code today (e.g., use of `String`, lists,
redefining own list combinators that don't take part in list fusion, rare use of
strictness annotations or unboxed data), so new benchmarks for the `real` and
`spectral` categories in particular are always welcome!
#### Other categories
Other than the default single-threaded categories above, there are the
following (SG: I'm guessing here, have never run them):
- `gc`: Run by `make -C gc` (though you'll probably have to edit the Makefile to
your specific config). Select benchmarks from `spectral` and `real`, plus a
few more (Careful, these have not been touched by #15999/!5, see the next
subsection). Test-drives different GC configs, apparently.
- `smp`: Microbenchmarks for the `-threaded` runtime, measuring scheduler
performance on concurrent and STM-heavy code.
### Stability wrt. GC parameterisations
Additionally, pay attention that your benchmarks are stable wrt. different
GC parameterisations, so that small changes in allocation don't lead to big,
inexplicable jumps in performance. See #15999 for details. Also make sure
that you run the benchmark with the default GC settings, as enlarging Gen 0 or
Gen 1 heaps just amplifies the problem.
As a rule of thumb on how to ensure this: Make sure that your benchmark doesn't
just build up one big data structure and consume it in a final step, but rather
that the working set repeatedly grows and shrinks (so that it stays approximately
constant on average) over the whole run of the benchmark. You can ensure this by
iterating your main logic n times (how often depends on your program, but in the
ballpark of 100-1000).
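A minimal sketch of that advice, assuming a hypothetical `kernel` that stands in for your benchmark's main logic (the names, the `Data.Map` working set and the checksum are illustrative and not part of nofib): instead of one run over a huge input, the driver repeats the kernel on a modest input, so each iteration's data should become garbage before the next iteration builds its own.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for a benchmark's main logic: builds a Map with
-- `size` entries (the working set) and reduces it to a single checksum,
-- after which the Map is garbage.
kernel :: Int -> Int -> Int
kernel seed size =
  let m = Map.fromList [ (k, (k * seed) `mod` 1009) | k <- [1 .. size] ]
  in Map.foldl' (+) 0 m

-- Runs the kernel `iterations` times on a small input instead of once on
-- a big one, so the live data grows and shrinks within each iteration.
-- Passing the iteration number as the seed makes every call depend on
-- the loop variable, which keeps GHC from computing one result and
-- sharing it across all iterations.
runBenchmark :: Int -> Int -> Int
runBenchmark iterations size =
  foldl' (\acc i -> acc + kernel i size) 0 [1 .. iterations]
```

A benchmark's `main` might then print `runBenchmark 500 10000`, taking the iteration count from the 100-1000 range above and tuning `size` per mode.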
You can test stability by plotting productivity curves for your `fast` settings
with the `prod.py` script attached to #15999.
If in doubt, ask Sebastian Graf for help.
## easy.sh
```
./easy.sh - easy nofib
Usage: ./easy.sh [ -m mode ] /path/to/baseline/ghc /path/to/new/ghc
GHC paths can point to the root of the GHC repository,
if it's built with Hadrian.
Available options:
-m MODE nofib mode: fast norm slow
This script caches the results using the sha256 of ghc executable.
Remove these files, if you want to rerun the benchmark.
```
......@@ -13,6 +13,15 @@ whereas it didn't before. So allocations go up a bit.
Imaginary suite
---------------------------------------
queens
~~~~~~
The comprehension
gen n = [ (q:b) | b <- gen (n-1), q <- [1..nq], safe q 1 b]
has, for each iteration of 'b', a new list [1..nq]. This list can be floated
out (and hence shared), or fused. Which of the two happens is quite
delicate.
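To make the two outcomes concrete, here is a self-contained sketch. nsoln keeps the comprehension as written above; nsolnShared does the float-out by hand, binding [1..nq] once. The safe function is a standard n-queens check written for this sketch, not copied from the benchmark source.

```haskell
-- Standard n-queens safety check: a queen in column x is safe w.r.t. the
-- queens already placed in qs (nearest row first, d rows away) if it
-- shares no column or diagonal with any of them.
safe :: Int -> Int -> [Int] -> Bool
safe _ _ []       = True
safe x d (q : qs) = x /= q && x /= q + d && x /= q - d && safe x (d + 1) qs

-- The comprehension as written: [1 .. nq] sits inside the loop over b,
-- so GHC may either fuse it away or float it out and share it.
nsoln :: Int -> Int
nsoln nq = length (gen nq)
  where
    gen :: Int -> [[Int]]
    gen 0 = [[]]
    gen n = [ q : b | b <- gen (n - 1), q <- [1 .. nq], safe q 1 b ]

-- The same program with the float-out done by hand: cols is built once
-- and shared by every iteration of b, instead of being rebuilt each time.
nsolnShared :: Int -> Int
nsolnShared nq = length (gen nq)
  where
    cols = [1 .. nq]
    gen :: Int -> [[Int]]
    gen 0 = [[]]
    gen n = [ q : b | b <- gen (n - 1), q <- cols, safe q 1 b ]
```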
integrate
~~~~~~~~~
integrate1D is strict in its second argument 'u', but it also passes 'u' to
......@@ -21,7 +30,7 @@ slightly.
gen_regexps
~~~~~~~~~~~
I found that there were some very bad loss-of-arity cases in PrelShow.
In particular, we had:
showl "" = showChar '"'
......@@ -46,7 +55,7 @@ I found that there were some very bad loss-of-arity cases in PrelShow.
So I've changed PrelShow.showLitChar to use explicit \s. Even then, showl
doesn't work, because GHC can't see that showl xs can be pushed inside the \s.
So I've put an explicit \s there too.
showl "" s = showChar '"' s
showl ('"':xs) s = showString "\\\"" (showl xs s)
......@@ -54,6 +63,14 @@ I found that there were some very bad loss-of-arity cases in PrelShow.
Net result: imaginary/gen_regexps more than halves in allocation!
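The two shapes of showl can be compared side by side in the sketch below. It is for illustration only: the point-free variant is a reconstruction of the earlier form, neither definition is copied from PrelShow, and showLitChar is the one exported by Data.Char.

```haskell
import Data.Char (showLitChar)

-- Point-free: each equation takes one argument and returns a ShowS
-- closure, so the definition has arity 1 unless GHC eta-expands it.
showlPointFree :: String -> ShowS
showlPointFree ""         = showChar '"'
showlPointFree ('"' : xs) = showString "\\\"" . showlPointFree xs
showlPointFree (x : xs)   = showLitChar x . showlPointFree xs

-- Eta-expanded with an explicit \s (the parameter s), as in the note:
-- every equation takes both arguments, so the definition has arity 2
-- and the recursive call is saturated.
showl :: String -> ShowS
showl ""         s = showChar '"' s
showl ('"' : xs) s = showString "\\\"" (showl xs s)
showl (x : xs)   s = showLitChar x (showl xs s)
```

Both produce the escaped body of a string literal followed by the closing quote, so for simple strings '"' : showl str "" matches show str.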
queens
~~~~~~
If we do
a) some inlining before float-out
b) fold/build fusion before float-out
then queens gets 40% more allocation. Presumably the fusion
prevents sharing.
x2n1
~~~~
......@@ -73,7 +90,7 @@ It's important to inline p_ident.
There's a very delicate CSE in p_expr
p_expr = seQ q_op [p_term1, p_op, p_term2] ## p_term3
(where all the pterm1,2,3 are really just p_term).
This expands into
p_expr s = case p_term1 s of
......@@ -103,7 +120,7 @@ like this:
xs7_s1i8 :: GHC.Prim.Int# -> [GHC.Base.Char]
[Str: DmdType]
xs7_s1i8 = go_r1og ys_aGO
} in
\ (m_XWf :: GHC.Prim.Int#) ->
case GHC.Prim.<=# m_XWf 1 of wild1_aSI {
GHC.Base.False ->
......@@ -114,20 +131,36 @@ like this:
Notice the 'let' which stops the lambda moving out.
eliza
~~~~~
In June 2002, GHC 5.04 emitted four successive
NOTE: Simplifier still going after 4 iterations; bailing out.
messages. I suspect that the simplifier is looping somehow.
fibheaps
~~~~~~~~
If you don't inline getChildren, allocation rises by 25%
hartel/event
~~~~~~~~~~~~
There are functions called f_nand and f_d, which generate tons of
code if you inline them too vigorously. And this can happen because
of a massive result discount.
Moreover if f_d gets inlined too much, you get lots of local lvl_xx
things which make some closures have lots of free variables, which pushes
up allocation.
expert
~~~~~~
In spectral/expert/Search.ask there's a statically visible CSE. Catching this
depends almost entirely on chance, which is a pity.
reptile
~~~~~~~
Performance dominated by (++) and Show.itos'
fish
~~~~
The performance of fish depends crucially on inlining scale_vec2.
It turns out to be right on the edge of GHC's normal threshold size, so
......@@ -203,29 +236,51 @@ We would do better to inpline showsPrec9 but it looks too big. Before
it was inlined regardless by the instance-decl stuff. So perf drops slightly.
integer
~~~~~~~
A good benchmark for beating on big-integer arithmetic.
There is a delicate interaction of fusion and full laziness in the comprehension:
integerbench :: (Integer -> Integer -> a)
-> Integer -> Integer -> Integer
-> Integer -> Integer -> Integer
-> IO ()
integerbench op astart astep alim bstart bstep blim = do
seqlist ([ a `op` b
| a <- [ astart,astart+astep..alim ]
, b <- [ bstart,astart+bstep..blim ]])
return ()
and the analogous one for Int.
Since the inner loop (for b) doesn't depend on a, we could float the
b-list out; but it may fuse first. In GHC 8 (and most previous
versions) this fusion did happen at type Integer, but (accidentally) not for
Int, because an intervening eval got in the way. So the b-enumeration was floated
out, which led to less allocation of Int values.
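A simplified, hypothetical stand-in for that comprehension (plain products instead of the op parameter, unit steps, and returning the list rather than calling seqlist) is sketched below in an Integer and an Int version, plus an Int version with the float-out done by hand. Which of the first two GHC fuses and which it floats depends on the compiler version, as described above.

```haskell
-- Hypothetical, simplified stand-ins for integerbench: all products
-- a*b, where the inner (b) enumeration does not depend on a.
productsInteger :: Integer -> Integer -> [Integer]
productsInteger alim blim = [ a * b | a <- [1 .. alim], b <- [1 .. blim] ]

-- Same source shape at type Int. GHC may fuse [1 .. blim] into the
-- comprehension (re-enumerating it for every a) or float it out and
-- share it; per the note above, the outcome differed between the types.
productsInt :: Int -> Int -> [Int]
productsInt alim blim = [ a * b | a <- [1 .. alim], b <- [1 .. blim] ]

-- The floated form written by hand: one shared list of b values.
productsIntFloated :: Int -> Int -> [Int]
productsIntFloated alim blim =
  let bs = [1 .. blim]
  in [ a * b | a <- [1 .. alim], b <- bs ]
```

To see the effect, force each list from a small main (e.g. print (sum (productsInt 1000 1000))), compile with -O -rtsopts, and compare the "bytes allocated" figure reported by +RTS -s.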
Knights
~~~~~~~
In knights/KnightHeuristic, we don't find that possibleMoves is strict
(with important knock-on effects) unless we apply rules before floating
out the literal list [A,B,C...].
Similarly, in f_se (F_Cmp ...) in listcompr (but a smaller effect)
lambda
~~~~~~
If we don't inline $wmove, we get an allocation increase of 17%.