# NoFib: Haskell Benchmark Suite

This is the root directory of the "NoFib Haskell benchmark suite".
There are currently two means of running the `nofib` benchmarks:

 * [the `shake`-based build system](shake/README.mkd)
 * [the legacy `make`-based build system](README.make.mkd)

Users are generally encouraged to use the former when possible. See the linked
READMEs for usage instructions.

Additional information can also be found on
[NoFib's wiki page](https://gitlab.haskell.org/ghc/ghc/-/wikis/building/running-nofib).


## Adding benchmarks

If you add a benchmark, try to set the problem sizes for
fast/normal/slow reasonably. [Benchmark runtimes](#benchmark-runtimes) below
lists the recommended runtime brackets for each mode.

### Benchmark runtimes

Benchmarks should ideally support running in three different modes:

- `fast`: 0.1-0.2s
- `norm`: 1-2s
- `slow`: 5-10s

You can look at existing benchmarks to see how this is usually achieved.
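
For instance, one common pattern is to take the problem size (or an iteration
count) as a command-line argument, so that the three modes only differ in the
argument the build system passes. A minimal sketch, with a made-up `work`
kernel standing in for the real benchmark logic:

```haskell
module Main (main) where

import System.Environment (getArgs)

-- Hypothetical kernel; replace with the code you actually want to measure.
work :: Int -> Int
work n = sum [ i * i | i <- [1 .. n] ]

main :: IO ()
main = do
  -- The build system passes a larger argument for norm/slow than for fast.
  [arg] <- getArgs
  print (work (read arg :: Int))
```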

### Benchmark Categories

So you have a benchmark to submit but don't know in which subfolder to put it? Here's some
advice on the intended semantics of each category.

#### Single-threaded benchmarks

These are run when you just type `make`. Their semantics are explained in
[the NoFib paper](https://link.springer.com/chapter/10.1007%2F978-1-4471-3215-8_17)
(a .ps version can be found online, thanks to @bgamari; alternatively, grep for
'Spectral' in docs/paper/paper.verb).

- `imaginary`: Mostly toy benchmarks, solving puzzles like n-queens.
- `spectral`: Algorithmic kernels, like FFT. If you want to add a benchmark of a
  library, this is most certainly the place to put it.
- `real`: Actual applications, with a command-line interface and all. Because of
  the large dependency footprint of today's applications, these have become
  rather aged.
- `shootout`: Benchmarks from
  [the benchmarks game](https://benchmarksgame-team.pages.debian.net/benchmarksgame/),
  formerly known as "language shootout".

Most of the benchmarks are quite old and aren't really written the way one would
write high-performance Haskell code today (e.g., use of `String`, lists,
hand-rolled list combinators that don't take part in list fusion, rare use of
strictness annotations or unboxed data), so new benchmarks, in particular for the
`real` and `spectral` categories, are always welcome!
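
To illustrate the kind of style the older benchmarks tend to miss, here is a
small, made-up sketch (not taken from the suite) that uses a strict fold over a
strict, unpacked accumulator instead of a lazy one:

```haskell
module Mean (mean) where

import Data.List (foldl')

-- Strict, unpacked accumulator: element count and running sum.
data Acc = Acc {-# UNPACK #-} !Int {-# UNPACK #-} !Double

mean :: [Double] -> Double
mean xs = case foldl' step (Acc 0 0) xs of
  Acc 0 _ -> 0
  Acc n s -> s / fromIntegral n
  where
    step (Acc n s) x = Acc (n + 1) (s + x)
```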

#### Other categories

Besides the default single-threaded categories above, there are the
following (SG: I'm guessing here, I have never run them):

- `gc`: Run by `make -C gc` (though you'll probably have to edit the Makefile to
  match your specific config). A selection of benchmarks from `spectral` and
  `real`, plus a few more (careful, these have not been touched by #15999/!5, see
  the next subsection). Apparently test-drives different GC configurations.
- `smp`: Microbenchmarks for the `-threaded` runtime, measuring scheduler
  performance on concurrent and STM-heavy code.

### Stability wrt. GC parameterisations

Additionally, take care that your benchmarks are stable wrt. different
GC parameterisations, so that small changes in allocation don't lead to big,
inexplicable jumps in performance. See #15999 for details. Also make sure
that you run the benchmark with the default GC settings, as enlarging Gen 0 or
Gen 1 heaps just amplifies the problem.

As a rule of thumb on how to ensure this: make sure that your benchmark doesn't
just build up one big data structure and consume it in a final step, but rather
that the working set grows and shrinks (i.e., stays approximately constant on
average) over the whole run of the benchmark. You can ensure this by iterating
your main logic `$n` times (how often depends on your program, but in the
ballpark of 100-1000).
You can test stability by plotting productivity curves for your `fast` settings
with the `prod.py` script attached to #15999.
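
A minimal sketch of this shape, assuming a hypothetical `work` kernel: the
kernel runs once per iteration, and the iteration index is fed into it so that
GHC cannot share one result across all iterations.

```haskell
module Main (main) where

import System.Environment (getArgs)

-- Hypothetical kernel that allocates and consumes an intermediate
-- structure on every call, so the working set grows and shrinks.
work :: Int -> Int
work salt = sum [ (i + salt) * i | i <- [1 .. 100000] ]

main :: IO ()
main = do
  -- Number of iterations; scale this for fast/norm/slow.
  [arg] <- getArgs
  let iters = read arg :: Int
  print (sum (map work [1 .. iters]))
```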

If in doubt, ask Sebastian Graf for help.

## Important notes

Note that some of these tests (e.g. `spectral/fish`) tend to be very sensitive
to branch predictor effectiveness. This means that changes in the compiler
can easily be masked by "random" fluctuations in the code layout produced by
particular compiler runs. Recent GHC versions provide the `-fproc-alignment`
flag to pad procedures, ensuring slightly better stability across runs. If you
are seeing an unexpected change in performance, try adding `-fproc-alignment=64`
to the compiler flags of both your baseline and test tree.