Commit eef6dc43 authored by Alp Mestanogullari's avatar Alp Mestanogullari 🦑

add Eventful GHC post

parent 77fbe1fa
Pipeline #10513 passed with stages
in 18 minutes and 37 seconds
---
title: "Eventful GHC"
author: Alp Mestanogullari
date: "2019-09-24"
---
What can we do when it takes GHC a lot of time to compile a given module?
Where is it spending its time? Where can we start to get a 10,000 feet view of
what GHC is doing? This blog post covers one possible answer, using the
[eventlog](https://downloads.haskell.org/~ghc/8.6.5/docs/html/users_guide/runtime_control.html#rts-eventlog)
mechanism.
(This post has also been published on the
[Well-Typed blog](https://well-typed.com/blog/2019/09/eventful-ghc/).)
# Eventlog ?
GHC has a mechanism that allows us to record all sorts of "events" to an
eventlog (a simple binary file), during a program's execution, attaching a
timestamp to it to later allow tools to reconstruct as much of a program's
execution as the events allow. This includes RTS events (garbage collection,
HEC/thread activity) but also user defined events, where "user" designates the
author of a Haskell library or program;
[`Debug.Trace`](https://hackage.haskell.org/package/base-4.12.0.0/docs/Debug-Trace.html)
in `base` provides functions that anyone can use to emit events, in addition to
the ones that the RTS itself will emit.
Those functions are implemented in terms of primitive operations that are
backed by GHC's runtime system. Its design and implementation is
discussed, along with other topics, in
[Parallel Performance Tuning for Haskell](https://www.microsoft.com/en-us/research/publication/parallel-performance-tuning-for-haskell/)
by Don Jones Jr., Simon Marlow and Satnam Singh.
After the program's execution, one can then use libraries like
[ghc-events](http://hackage.haskell.org/package/ghc-events) or
tools like [ghc-events-analyze](http://hackage.haskell.org/package/ghc-events-analyze),
or [threadscope](http://hackage.haskell.org/package/threadscope) to consume the
eventlog in order to gain some insights into where time was spent.
While profiling lets us gather more detailed information about where time is
spent and where allocations are made, it requires rebuilding our program's code
_and_ using a dedicated RTS flavour. The code generated by GHC for a profiled
program is quite different, adding lots of instrumentation to support profiling
the execution of Haskell code. The extra code generated to support profiling
can get in the way of some optimisations and can therefore drastically affect
the performance of a program. On the other hand, generating the eventlog for a
program only requires re-linking it (against a flavour of the RTS that supports
tracing) with `-eventlog` and running it with `+RTS -l`. The eventlog mechanism
also has much lower impact on runtime performance, since emitting events is
merely about putting a few values in a buffer, that the RTS will then regularly
flush to the eventlog file or whatever the destination is. The aforementionned
paper has some precise numbers on the overhead of the eventlog mechanism, but
it's of course quite low.
This can therefore be an interesting solution when you want to get a big picture
without having to rebuild your whole program and some of its dependencies. All
you have to do is set up some events that cover the fragments of the program's
execution that you're particularly interested in.
Events are in general emitted using one of the following 4 functions from
`Debug.Trace`:
- `traceEvent :: String -> a -> a`
- `traceEventIO :: String -> IO ()`
- `traceMarker :: String -> a -> a`
- `traceMarkerIO :: String -> IO ()`
The `traceEvent[IO]` functions should be used for all sorts of events,
particularly the ones that are likely to happen a lot during the execution of
your program, while `traceMarker[IO]` are generally used to mark certain points
or phases in the execution and see that visually in the profile. This is
particularly helpful with tools like _eventlog2html_ (see the last section of
this post) that allow you to visualize heap profiles, drawing those
"marker events" on top so that users can get a sense of when some
particularly allocations or deallocations take place, with user-supplied labels
instead of trying to guess from timestamps.
For more about eventlogs in general, see
[GHC's users guide](http://downloads.haskell.org/~ghc/ghc-8.6.5/docs/html/users_guide/runtime_control.html#tracing)
or [the event-log page](https://gitlab.haskell.org/ghc/ghc/wikis/event-log) on the GHC
wiki.
# GHC events
Starting with [this commit](https://gitlab.haskell.org/ghc/ghc/commit/ebfa35284741fca47719f531f0996261441f75b0),
by Ben Gamari about 3 months ago, GHC started emitting eventlog entries for all
calls to `withTiming`, which is a function we use to measure how long various
parts of GHC take. We then started adding a few more `withTiming` calls:
- [688a1b89](https://gitlab.haskell.org/ghc/ghc/commit/688a1b89584327d6ba0d3ec9558a3cd8a111c655)
added tracing around calls to various external tools (C compiler, linker,
assembler, etc);
- [0c5cd771](https://gitlab.haskell.org/ghc/ghc/commit/0c5cd771a8792ca4a4a553d3d4636e32191ef936)
added tracing around all the individual passes that make up the codegen
pipeline (we previously only tracked codegen as a whole, single event);
- [e3cbe319](https://gitlab.haskell.org/ghc/ghc/commit/e3cbe319dbe2f6a64ea3108dcd8d42e7876e5b34)
added tracing around the package database initialization.
And other similar patches later on. As a result, we can trace most of the
execution of GHC when compiling a trivial hello world module:
``` haskell
-- hello.hs
main :: IO ()
main = putStrLn "hello, world!"
```
First, you need to get a somewhat recent checkout of GHC's `master` branch -
[e3cbe319](https://gitlab.haskell.org/ghc/ghc/commit/e3cbe319dbe2f6a64ea3108dcd8d42e7876e5b34)
or newer. Then you need to build that source tree and make sure that the stage 2
GHC is linked with `-eventlog`:
``` sh
$ ./boot && ./configure
# Build with Hadrian:
$ hadrian/build.sh -j "stage1.ghc-bin.ghc.link.opts += -eventlog"
# stage 2 ghc at: _build/stage1/bin/ghc
# Build with Make:
$ make -j GhcStage2HcOpts+=-eventlog
# stage 2 ghc at: inplace/bin/ghc-stage2
# If you have a GHC build around already, both of those commands should not
# cause a lot of work to be done: just linking the GHC executable against a
# slightly different RTS flavour! No recompilation needed.
```
You can then build any module, library or program with the resulting stage 2
executable as you would normally do. Except that if you pass `+RTS -l` with one
of `-v2` or `-ddump-timings`, GHC will produce an eventlog at `ghc.eventlog`
with all the standard RTS events, but also events for each pass in GHC's
pipeline. Let's see this in action by compiling `hello.hs' from earlier.
``` sh
# use inplace/bin/ghc-stage2 if you built GHC with Make
$ _build/stage1/bin/ghc -ddump-timings hello.hs -o hello +RTS -l
```
We now have an eventlog. But what can we do with it? You've got several options:
- The [ghc-events](http://hackage.haskell.org/package/ghc-events) library and
program, used by all the other options, provide primitives for decoding the
eventlog and extracting data from it. The accompanying program provides
various commands for querying eventlog files.
- The [threadscope](http://hackage.haskell.org/package/threadscope) program lets
you visualize eventlogs with a GTK+ frontend.
- [ghc-events-analyze](http://hackage.haskell.org/package/ghc-events-analyze)
produces per-label totals, SVG visualizations and per-label timing data given
an eventlog file as input.
- ... and maybe other options that I'm not remembering or aware of at all.
I personally use `ghc-events-analyze` a lot, since it can produce pretty SVG
pictures that I can then embed in GitLab comments or... blog posts. :-)
To get the timings and SVG picture, you need to install `ghc-events-analyze`,
and then ask it to process `ghc.eventlog` with the right "markers":
``` sh
$ ghc-events-analyze --timed --timed-txt --totals \
--start "GHC:started:" --stop "GHC:finished:" \
ghc.eventlog
# --timed: produce SVG visualization of the eventlog, in ghc.timed.svg
# --timed-txt: produce per-label groupings of timings that report when the events
were emitted, in ghc.timed.txt
# --totals: produce per-label totals, reporting how much time was spent in
a given label, in total
# --start: all events that we're interested in are wrapped in
# --stop GHC:started:... / GHC:finished:... events, so we just tell
ghc-events-analyze that it should be looking for those markers
and report about those events.
```
Here are the totals that I get for our hello world program compilation.
```
GC 389461503ns 0.389s
USER EVENTS (user events are corrected for GC)
systool:linker 2386891920ns 2.387s
systool:cc 801347228ns 0.801s
systool:as 145128851ns 0.145s
Renamer/typechecker [Main] 45709853ns 0.046s
initializing package database 20877412ns 0.021s
CodeGen [Main] 20754058ns 0.021s
CoreTidy [Main] 8262122ns 0.008s
NCG 7566252ns 0.008s
Chasing dependencies 2441212ns 0.002s
Cmm pipeline 2040174ns 0.002s
Desugar [Main] 1657607ns 0.002s
STG -> Cmm 1103737ns 0.001s
Simplifier [Main] 1045768ns 0.001s
Cmm -> Raw Cmm 319442ns 0.000s
Parser [Main] 286350ns 0.000s
CorePrep [Main] 88795ns 0.000s
TOTAL 3445520781ns 3.446s
THREAD EVENTS
1 460138ns 0.000s
IOManager on cap 0:2 670044ns 0.001s
TimerManager:3 143699ns 0.000s
4 153050783ns 0.153s
weak finalizer thread:5 43518ns 0.000s
weak finalizer thread:6 10677ns 0.000s
weak finalizer thread:7 33126ns 0.000s
weak finalizer thread:8 23787ns 0.000s
weak finalizer thread:9 1534ns 0.000s
weak finalizer thread:10 8142ns 0.000s
weak finalizer thread:11 1352ns 0.000s
weak finalizer thread:12 10080ns 0.000s
weak finalizer thread:13 10603ns 0.000s
weak finalizer thread:14 8767ns 0.000s
weak finalizer thread:15 10849ns 0.000s
16 69745ns 0.000s
17 307420ns 0.000s
18 83017ns 0.000s
19 76866ns 0.000s
weak finalizer thread:20 10083ns 0.000s
21 96582ns 0.000s
22 103373ns 0.000s
23 62562ns 0.000s
24 97655ns 0.000s
weak finalizer thread:25 11676ns 0.000s
26 238116ns 0.000s
27 245821ns 0.000s
weak finalizer thread:28 8268ns 0.000s
29 80235ns 0.000s
30 86759ns 0.000s
31 21571ns 0.000s
weak finalizer thread:32 396ns 0.000s
TOTAL 156087244ns 0.156s
```
And the SVG image:
![](images/eventful-ghc/ghc-events-viz.svg){ width=800px }
([click here for full size picture](images/eventful-ghc/ghc-events-viz.svg))
We can see two time windows where we don't have an "active" label:
- towards the beginning, where GHC likely parses settings, sets everything up
and reads the input file;
- after code generation and before calling the assembler, where it turns out we
spend most of the time getting some information about the C compiler being
used.
We otherwise have a pretty good picture of what GHC is doing the rest of the
time. It might be nice in the future to add a few more events, e.g to track when
we're running Template Haskell code.
If you want GHC to only emit so-called "user events" (= non-RTS events) to the
eventlog, pass `+RTS -l-au` instead of `+RTS -l`. More options can be found
in [the relevant section of the GHC user manual](https://downloads.haskell.org/~ghc/ghc-8.6.5/docs/html/users_guide/runtime_control.html#rts-eventlog).
# Wrapping up
Of course, if you're looking into a performance problem, once you narrow down
the part of the execution that is taking too long, you will quite likely start
using slightly fancier profiling methods, nothing new there. You can however
emit your own events when working on GHC, so as to collect some simple timing
data about those codepaths that you are working on, without affecting GHC's
performance much. You simply have to use `compiler/main/ErrUtils.hs:withTiming`:
``` haskell
withTiming <action to get dynflags> (text "<label>") <forcing function> $
<action to time>
```
where the forcing function is used to force the evaluation of the action's
result just as much as the user wants; the time it takes to perform this
evaluation is included in the timings reported by `withTiming`. Your `<label>`
would then show up in the eventlog reports, if the corresponding code path is
entered. See [Note [withTiming]](https://gitlab.haskell.org/ghc/ghc/blob/68ddb43c44065d0d3a8a6893f7f8e87f15ee9c1e/compiler/main/ErrUtils.hs#L802)
for more explanations about the `withTiming` function and how to use it.
All the eventlog markers that `withTiming` emits can then be integrated in heap
profiles using [eventlog2html](https://mpickering.github.io/eventlog2html/),
written by Matthew Pickering; see
[here](https://mpickering.github.io/eventlog2html/examples/ghc.eventlog.html)
for an example. If you put your mouse on one of the vertical lines, you will see
the label of the phase that begins or ends at that moment.
Before ending this post, I would like to mention that we have some
infrastructure to collect timings and allocations for various compilation
phases in GHC, over the
[head.hackage](https://gitlab.haskell.org/ghc/head.hackage) package set.
This can be used locally or triggered manually from the CI interface of
the GHC repository, and is set up to run nightly against GHC's `master`
branch. The resulting data can be loaded in PostgreSQL and queried at will.
We plan on collecting more data about GHC's behaviour when compiling all
those packages and on relying more on the eventlog than GHC output parsing to do
so, with two major goals in mind:
- have an easy way to track GHC's performance over time and in particular
provide some initial data about where a performance regression might be
coming from;
- expose enough data to be able to identify "real world modules" where GHC
doesn't perform as well as we would like it to, which could in turn lead to
an investigation and possibly a patch that improves GHC's performance on the
said module.
Happy profiling!
This source diff could not be displayed because it is too large. You can view the blob instead.
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment