# NoFib: Haskell Benchmark Suite

This is the root directory of the NoFib Haskell benchmark suite. It
should be part of a GHC source tree, that is, the `nofib` directory
should be at the same level in the tree as `compiler` and `libraries`.
This makes sure that NoFib picks up the stage 2 compiler from the
surrounding GHC source tree.

You can also clone this repository in isolation, in which case it will
pick up `$(which ghc)`, or whatever the `HC` environment variable is set to.

Additional information can also be found on
[NoFib's wiki page](https://ghc.haskell.org/trac/ghc/wiki/Building/RunningNoFib).

There's also an `easy.sh` helper script which, as the name implies, is an
automated and easy way to run `nofib`.
See the section at the end of this README for its usage.

## Using

<details>
  <summary>Git symlink support for Windows machines</summary>

  NoFib uses a few symlinks here and there to share code between benchmarks.
  Git for Windows has had symlink support for some time now, but
  [it may not be enabled by default](https://stackoverflow.com/a/42137273/388010).
  You will notice strange `make boot` failures if it's not enabled for you.

  Make sure you follow the instructions in the link to enable symlink support,
  possibly as simple as through `git config core.symlinks true` or cloning with
  `git clone -c core.symlinks=true <URL>`.
</details>

Install [`cabal-install-2.4`](https://www.haskell.org/cabal/download.html) or later.

Then, to run the tests, execute:

```
$ make clean # or git clean -fxd, it's faster
$ # Generates input files for the benchmarks and builds compilation
$ # dependencies for make (ghc -M)
$ make boot
$ # Builds the benchmarks and runs them $NoFibRuns (default: 5) times
$ make
```

This will put the results in the file `nofib-log`. You can pass extra
options to a nofib run using the `EXTRA_HC_OPTS` variable like this:

```
$ make clean
$ make boot
$ make EXTRA_HC_OPTS="-fllvm"
```

**Note:** to get all the results, you have to `clean` and `boot` between
separate `nofib` runs.

To compare the results of multiple runs, save the output in a logfile
and use the program in `./nofib-analyse/nofib-analyse`, for example:

```
...
$ make 2>&1 | tee nofib-log-6.4.2
...
$ make 2>&1 | tee nofib-log-6.6
$ nofib-analyse nofib-log-6.4.2 nofib-log-6.6 | less
```

to generate a comparison of the runs captured in `nofib-log-6.4.2`
and `nofib-log-6.6`. When making comparisons, be careful to ensure
that the only things that changed between the builds are the things
that you _wanted_ to change. There are lots of variables: machine,
GHC version, GCC version, C libraries, static vs. dynamic GMP library,
build options, run options, and probably lots more. To be on the safe
side, make both runs on the same unloaded machine.

## Modes

Each benchmark is runnable in three different time `mode`s:

- `fast`: 0.1-0.2s
- `norm`: 1-2s
- `slow`: 5-10s

You can control which mode to run by setting an additional `mode` variable for
`make`. The default is `mode=norm`. Example for `mode=fast`:

```
$ make clean
$ make boot mode=fast
$ make mode=fast
```

Note that the `mode`s set in `make boot` and `make` need to agree. Otherwise you
will get output errors, because `make boot` will generate input files for a
different `mode`. A more DRY way to control the `mode` would be

```
$ make clean
$ export mode=fast
$ make boot
$ make
```

As CPU architectures advance, the above running times may drift, and
occasionally all benchmarks will need adjustment.

Be aware that `nofib-analyse` will ignore any result that falls below 0.2s.
This is the default of its `-i` option, which is of course incompatible with
`mode=fast`. In that case, set `-i` as appropriate, or even deactivate it
with `-i 0`.

## Boot vs. benchmarked GHC

The `nofib-analyse` utility is compiled with the `BOOT_HC` compiler,
which may be different than the GHC being benchmarked.

You can control which GHC you benchmark with the `HC` variable:

```
$ make clean
$ make boot HC=ghc-head
$ make HC=ghc-head 2>&1 | tee nofib-log-ghc-head
```

## Configuration

There are some options you might want to tweak; search for nofib in
`../mk/config.mk`, and override settings in `../mk/build.mk` as usual.
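For instance, a hypothetical `../mk/build.mk` override might look like the
following (a sketch only: the variable names `NoFibRuns` and `EXTRA_HC_OPTS`
come from elsewhere in this README; that `build.mk` picks them up in your
setup is an assumption):

```makefile
# Sketch: possible nofib overrides in ../mk/build.mk (hypothetical placement).
# Run each benchmark 9 times instead of the default 5:
NoFibRuns = 9
# Pass extra flags to the benchmarked compiler:
EXTRA_HC_OPTS += -fllvm
```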

## Extra Metrics: Valgrind

To get instruction counts, memory reads/writes, and "cache misses",
you'll need to get hold of Cachegrind, which is part of
[Valgrind](http://valgrind.org).

You can then pass `-cachegrind` as `EXTRA_RUNTEST_OPTS`. Counting
instructions slows down execution by a factor of ~30. But it's
a deterministic metric, so you can combine it with `NoFibRuns=1`:

```
$ (make EXTRA_RUNTEST_OPTS="-cachegrind" NoFibRuns=1) 2>&1 | tee nofib-log
```

Optionally combine this with `mode=fast`, see [Modes](#modes).

## Extra Packages

Some benchmarks aren't run by default and require that extra packages be
installed for the GHC compiler being tested. These packages include:

 * `old-time`: for `gc` benchmarks
 * `stm`: for `smp` benchmarks
 * `parallel`: for parallel benchmarks
 * `random`: for various benchmarks

These can be installed with

```
cabal v1-install --allow-newer -w $HC random parallel old-time
```

## Adding benchmarks

If you add a benchmark, try to set the problem sizes for
fast/norm/slow reasonably. [Modes](#modes) lists the recommended time brackets
for each mode.

### Benchmark Categories

So you have a benchmark to submit but don't know in which subfolder to put it? Here's some
advice on the intended semantics of each category.

#### Single threaded benchmarks

These are run when you just type `make`. Their semantics are explained in
[the Nofib paper](https://link.springer.com/chapter/10.1007%2F978-1-4471-3215-8_17)
(you can find a .ps online, thanks to @bgamari; alternatively, grep for
'Spectral' in docs/paper/paper.verb).

- `imaginary`: Mostly toy benchmarks, solving puzzles like n-queens.
- `spectral`: Algorithmic kernels, like FFT. If you want to add a benchmark of a
  library, this is most certainly the place to put it.
- `real`: Actual applications, with a command-line interface and all. Because of
  the large dependency footprint of today's applications, these have become
  rather aged.
- `shootout`: Benchmarks from
  [the benchmarks game](https://benchmarksgame-team.pages.debian.net/benchmarksgame/),
  formerly known as "language shootout".

Most of the benchmarks are quite old and aren't really written in the way one
would write high-performance Haskell code today (e.g., use of `String`, lists,
hand-rolled list combinators that don't take part in list fusion, rare use of
strictness annotations or unboxed data), so new benchmarks, in particular for
the `real` and `spectral` brackets, are always welcome!

#### Other categories

Other than the default single-threaded categories above, there are the
following (SG: I'm guessing here, have never run them):

- `gc`: Run by `make -C gc` (though you'll probably have to edit the Makefile
  for your specific config). A selection of benchmarks from `spectral` and
  `real`, plus a few more (careful, these have not been touched by #15999/!5,
  see the next subsection). Testdrives different GC configs, apparently.
- `smp`: Microbenchmarks for the `-threaded` runtime, measuring scheduler
  performance on concurrent and STM-heavy code.

### Stability wrt. GC parameterisations

Additionally, pay attention that your benchmarks are stable wrt. different
GC parameterisations, so that small changes in allocation don't lead to big,
inexplicable jumps in performance. See #15999 for details. Also make sure
that you run the benchmark with the default GC settings, as enlarging the Gen 0
or Gen 1 heaps just amplifies the problem.

As a rule of thumb on how to ensure this: make sure that your benchmark doesn't
just build up one big data structure and consume it in a final step, but rather
that the working set grows and shrinks (e.g. is approximately constant) over the
whole run of the benchmark. You can ensure this by iterating your main logic `$n`
times (how often depends on your program, but in the ballpark of 100-1000).
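As a sketch of that shape (a hypothetical benchmark of our own; the kernel and
all names here are illustrative, not taken from the suite):

```haskell
module Main (main) where

import Control.Monad (forM_)
import Data.List (foldl')
import System.Environment (getArgs)

-- Hypothetical kernel: a strict sum of squares. Forcing the result on
-- every iteration lets the working set grow and shrink each round,
-- instead of accumulating one big structure that is consumed at the end.
kernel :: Int -> Int
kernel m = foldl' (+) 0 [ i * i | i <- [1 .. m] ]

main :: IO ()
main = do
  [n] <- map read <$> getArgs     -- iteration count, ballpark 100-1000
  forM_ [1 .. n] $ \i ->
    print (kernel (10000 + i))    -- depend on i so iterations aren't shared
```

The iteration count then doubles as the knob for the fast/norm/slow problem
sizes.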
You can test stability by plotting productivity curves for your `fast` settings
with the `prod.py` script attached to #15999.

If in doubt, ask Sebastian Graf for help.

## Important notes

Note that some of these tests (e.g. `spectral/fish`) tend to be very sensitive
to branch predictor effectiveness. This means that changes in the compiler
can easily be masked by "random" fluctuations in the code layout produced by
particular compiler runs. Recent GHC versions provide the `-fproc-alignment`
flag to pad procedures, ensuring slightly better stability across runs. If you
are seeing an unexpected change in performance, try adding `-fproc-alignment=64`
to the compiler flags of both your baseline and test trees.

## easy.sh

```
./easy.sh - easy nofib

Usage: ./easy.sh [ -m mode ] /path/to/baseline/ghc /path/to/new/ghc

GHC paths can point to the root of the GHC repository,
if it's built with Hadrian.

Available options:
  -m MODE  nofib mode: fast norm slow

This script caches the results using the sha256 of the ghc executable.
Remove these files if you want to rerun the benchmark.
```