20190924-eventful-ghc.mkd 14.8 KB
Newer Older
Alp Mestanogullari's avatar
Alp Mestanogullari committed
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303
title: "Eventful GHC"
author: Alp Mestanogullari
date: "2019-09-24"

What can we do when it takes GHC a lot of time to compile a given module?
Where is it spending its time? Where can we start to get a 10,000 feet view of
what GHC is doing? This blog post covers one possible answer, using the

(This post has also been published on the
[Well-Typed blog](https://well-typed.com/blog/2019/09/eventful-ghc/).)

# Eventlog ?

GHC has a mechanism that allows us to record all sorts of "events" to an
eventlog (a simple binary file), during a program's execution, attaching a
timestamp to it to later allow tools to reconstruct as much of a program's
execution as the events allow. This includes RTS events (garbage collection,
HEC/thread activity) but also user defined events, where "user" designates the
author of a Haskell library or program;
in `base` provides functions that anyone can use to emit events, in addition to
the ones that the RTS itself will emit.

Those functions are implemented in terms of primitive operations that are
backed by GHC's runtime system. Its design and implementation is
discussed, along with other topics, in
[Parallel Performance Tuning for Haskell](https://www.microsoft.com/en-us/research/publication/parallel-performance-tuning-for-haskell/)
by Don Jones Jr., Simon Marlow and Satnam Singh.

After the program's execution, one can then use libraries like
[ghc-events](http://hackage.haskell.org/package/ghc-events) or
tools like [ghc-events-analyze](http://hackage.haskell.org/package/ghc-events-analyze),
or [threadscope](http://hackage.haskell.org/package/threadscope) to consume the
eventlog in order to gain some insights into where time was spent.

While profiling lets us gather more detailed information about where time is
spent and where allocations are made, it requires rebuilding our program's code
_and_ using a dedicated RTS flavour. The code generated by GHC for a profiled
program is quite different, adding lots of instrumentation to support profiling
the execution of Haskell code. The extra code generated to support profiling
can get in the way of some optimisations and can therefore drastically affect
the performance of a program. On the other hand, generating the eventlog for a
program only requires re-linking it (against a flavour of the RTS that supports
tracing) with `-eventlog` and running it with `+RTS -l`. The eventlog mechanism
also has much lower impact on runtime performance, since emitting events is
merely about putting a few values in a buffer, that the RTS will then regularly
flush to the eventlog file or whatever the destination is. The aforementionned
paper has some precise numbers on the overhead of the eventlog mechanism, but
it's of course quite low.

This can therefore be an interesting solution when you want to get a big picture
without having to rebuild your whole program and some of its dependencies. All
you have to do is set up some events that cover the fragments of the program's
execution that you're particularly interested in.

Events are in general emitted using one of the following 4 functions from
- `traceEvent :: String -> a -> a`
- `traceEventIO :: String -> IO ()`
- `traceMarker :: String -> a -> a`
- `traceMarkerIO :: String -> IO ()`

The `traceEvent[IO]` functions should be used for all sorts of events,
particularly the ones that are likely to happen a lot during the execution of
your program, while `traceMarker[IO]` are generally used to mark certain points
or phases in the execution and see that visually in the profile. This is
particularly helpful with tools like _eventlog2html_ (see the last section of
this post) that allow you to visualize heap profiles, drawing those
"marker events" on top so that users can get a sense of when some
particularly allocations or deallocations take place, with user-supplied labels
instead of trying to guess from timestamps.

For more about eventlogs in general, see
[GHC's users guide](http://downloads.haskell.org/~ghc/ghc-8.6.5/docs/html/users_guide/runtime_control.html#tracing)
or [the event-log page](https://gitlab.haskell.org/ghc/ghc/wikis/event-log) on the GHC

# GHC events

Starting with [this commit](https://gitlab.haskell.org/ghc/ghc/commit/ebfa35284741fca47719f531f0996261441f75b0),
by Ben Gamari about 3 months ago, GHC started emitting eventlog entries for all
calls to `withTiming`, which is a function we use to measure how long various
parts of GHC take. We then started adding a few more `withTiming` calls:

- [688a1b89](https://gitlab.haskell.org/ghc/ghc/commit/688a1b89584327d6ba0d3ec9558a3cd8a111c655)
  added tracing around calls to various external tools (C compiler, linker,
  assembler, etc);
- [0c5cd771](https://gitlab.haskell.org/ghc/ghc/commit/0c5cd771a8792ca4a4a553d3d4636e32191ef936)
  added tracing around all the individual passes that make up the codegen
  pipeline (we previously only tracked codegen as a whole, single event);
- [e3cbe319](https://gitlab.haskell.org/ghc/ghc/commit/e3cbe319dbe2f6a64ea3108dcd8d42e7876e5b34)
  added tracing around the package database initialization.

And other similar patches later on. As a result, we can trace most of the
execution of GHC when compiling a trivial hello world module:

``` haskell
-- hello.hs
main :: IO ()
main = putStrLn "hello, world!"

First, you need to get a somewhat recent checkout of GHC's `master` branch -
or newer. Then you need to build that source tree and make sure that the stage 2
GHC is linked with `-eventlog`:

``` sh
$ ./boot && ./configure

# Build with Hadrian:
$ hadrian/build.sh -j "stage1.ghc-bin.ghc.link.opts += -eventlog"
# stage 2 ghc at: _build/stage1/bin/ghc

# Build with Make:
$ make -j GhcStage2HcOpts+=-eventlog
# stage 2 ghc at: inplace/bin/ghc-stage2

# If you have a GHC build around already, both of those commands should not
# cause a lot of work to be done: just linking the GHC executable against a
# slightly different RTS flavour! No recompilation needed.

You can then build any module, library or program with the resulting stage 2
executable as you would normally do. Except that if you pass `+RTS -l` with one
of `-v2` or `-ddump-timings`, GHC will produce an eventlog at `ghc.eventlog`
with all the standard RTS events, but also events for each pass in GHC's
pipeline. Let's see this in action by compiling `hello.hs' from earlier.

``` sh
# use inplace/bin/ghc-stage2 if you built GHC with Make
$ _build/stage1/bin/ghc -ddump-timings hello.hs -o hello +RTS -l

We now have an eventlog. But what can we do with it? You've got several options:

- The [ghc-events](http://hackage.haskell.org/package/ghc-events) library and
  program, used by all the other options, provide primitives for decoding the
  eventlog and extracting data from it. The accompanying program provides
  various commands for querying eventlog files.
- The [threadscope](http://hackage.haskell.org/package/threadscope) program lets
  you visualize eventlogs with a GTK+ frontend.
- [ghc-events-analyze](http://hackage.haskell.org/package/ghc-events-analyze)
  produces per-label totals, SVG visualizations and per-label timing data given
  an eventlog file as input.
- ... and maybe other options that I'm not remembering or aware of at all.

I personally use `ghc-events-analyze` a lot, since it can produce pretty SVG
pictures that I can then embed in GitLab comments or... blog posts. :-)

To get the timings and SVG picture, you need to install `ghc-events-analyze`,
and then ask it to process `ghc.eventlog` with the right "markers":

``` sh
$ ghc-events-analyze --timed --timed-txt --totals \
                     --start "GHC:started:" --stop "GHC:finished:" \

# --timed:     produce SVG visualization of the eventlog, in ghc.timed.svg
# --timed-txt: produce per-label groupings of timings that report when the events
               were emitted, in ghc.timed.txt
# --totals:    produce per-label totals, reporting how much time was spent in
               a given label, in total
# --start:     all events that we're interested in are wrapped in
# --stop       GHC:started:... / GHC:finished:... events, so we just tell
               ghc-events-analyze that it should be looking for those markers
	       and report about those events.

Here are the totals that I get for our hello world program compilation.

GC                                389461503ns   0.389s

USER EVENTS (user events are corrected for GC)
 systool:linker                  2386891920ns   2.387s
 systool:cc                       801347228ns   0.801s
 systool:as                       145128851ns   0.145s
 Renamer/typechecker [Main]        45709853ns   0.046s
 initializing package database     20877412ns   0.021s
 CodeGen [Main]                    20754058ns   0.021s
 CoreTidy [Main]                    8262122ns   0.008s
 NCG                                7566252ns   0.008s
 Chasing dependencies               2441212ns   0.002s
 Cmm pipeline                       2040174ns   0.002s
 Desugar [Main]                     1657607ns   0.002s
 STG -> Cmm                         1103737ns   0.001s
 Simplifier [Main]                  1045768ns   0.001s
 Cmm -> Raw Cmm                      319442ns   0.000s
 Parser [Main]                       286350ns   0.000s
 CorePrep [Main]                      88795ns   0.000s
TOTAL                            3445520781ns   3.446s

1                                    460138ns   0.000s
IOManager on cap 0:2                 670044ns   0.001s
TimerManager:3                       143699ns   0.000s
4                                 153050783ns   0.153s
weak finalizer thread:5               43518ns   0.000s
weak finalizer thread:6               10677ns   0.000s
weak finalizer thread:7               33126ns   0.000s
weak finalizer thread:8               23787ns   0.000s
weak finalizer thread:9                1534ns   0.000s
weak finalizer thread:10               8142ns   0.000s
weak finalizer thread:11               1352ns   0.000s
weak finalizer thread:12              10080ns   0.000s
weak finalizer thread:13              10603ns   0.000s
weak finalizer thread:14               8767ns   0.000s
weak finalizer thread:15              10849ns   0.000s
16                                    69745ns   0.000s
17                                   307420ns   0.000s
18                                    83017ns   0.000s
19                                    76866ns   0.000s
weak finalizer thread:20              10083ns   0.000s
21                                    96582ns   0.000s
22                                   103373ns   0.000s
23                                    62562ns   0.000s
24                                    97655ns   0.000s
weak finalizer thread:25              11676ns   0.000s
26                                   238116ns   0.000s
27                                   245821ns   0.000s
weak finalizer thread:28               8268ns   0.000s
29                                    80235ns   0.000s
30                                    86759ns   0.000s
31                                    21571ns   0.000s
weak finalizer thread:32                396ns   0.000s
TOTAL                             156087244ns   0.156s

And the SVG image:

![](images/eventful-ghc/ghc-events-viz.svg){ width=800px }

([click here for full size picture](images/eventful-ghc/ghc-events-viz.svg))

We can see two time windows where we don't have an "active" label:

- towards the beginning, where GHC likely parses settings, sets everything up
  and reads the input file;
- after code generation and before calling the assembler, where it turns out we
  spend most of the time getting some information about the C compiler being

We otherwise have a pretty good picture of what GHC is doing the rest of the
time. It might be nice in the future to add a few more events, e.g to track when
we're running Template Haskell code.

If you want GHC to only emit so-called "user events" (= non-RTS events) to the
eventlog, pass `+RTS -l-au` instead of `+RTS -l`. More options can be found
in [the relevant section of the GHC user manual](https://downloads.haskell.org/~ghc/ghc-8.6.5/docs/html/users_guide/runtime_control.html#rts-eventlog).

# Wrapping up

Of course, if you're looking into a performance problem, once you narrow down
the part of the execution that is taking too long, you will quite likely start
using slightly fancier profiling methods, nothing new there. You can however
emit your own events when working on GHC, so as to collect some simple timing
data about those codepaths that you are working on, without affecting GHC's
performance much. You simply have to use `compiler/main/ErrUtils.hs:withTiming`:

``` haskell
withTiming <action to get dynflags> (text "<label>") <forcing function> $
  <action to time>

where the forcing function is used to force the evaluation of the action's
result just as much as the user wants; the time it takes to perform this
evaluation is included in the timings reported by `withTiming`. Your `<label>`
would then show up in the eventlog reports, if the corresponding code path is
entered. See [Note [withTiming]](https://gitlab.haskell.org/ghc/ghc/blob/68ddb43c44065d0d3a8a6893f7f8e87f15ee9c1e/compiler/main/ErrUtils.hs#L802)
for more explanations about the `withTiming` function and how to use it.

All the eventlog markers that `withTiming` emits can then be integrated in heap
profiles using [eventlog2html](https://mpickering.github.io/eventlog2html/),
written by Matthew Pickering; see
for an example. If you put your mouse on one of the vertical lines, you will see
the label of the phase that begins or ends at that moment.

Before ending this post, I would like to mention that we have some
infrastructure to collect timings and allocations for various compilation
phases in GHC, over the
[head.hackage](https://gitlab.haskell.org/ghc/head.hackage) package set.
This can be used locally or triggered manually from the CI interface of
the GHC repository, and is set up to run nightly against GHC's `master`
branch. The resulting data can be loaded in PostgreSQL and queried at will.
We plan on collecting more data about GHC's behaviour when compiling all
those packages and on relying more on the eventlog than GHC output parsing to do
so, with two major goals in mind:

- have an easy way to track GHC's performance over time and in particular
  provide some initial data about where a performance regression might be
  coming from;
- expose enough data to be able to identify "real world modules" where GHC
  doesn't perform as well as we would like it to, which could in turn lead to
  an investigation and possibly a patch that improves GHC's performance on the
  said module.

Happy profiling!