On-demand linking when evaluating Template Haskell splices in packages with many dependencies is painfully slow
When GHC expands Template Haskell splices, it must load the code those splices depend on. It has a few different strategies for doing this, depending on what is being loaded and how GHC is built, but the default strategy, used since #3658 (which landed all the way back in GHC 7.8), can be painfully slow. In particular, on the codebase I’ve recently been looking at, more than a third of the compilation time for some modules is spent running the system linker, and individual invocations of it can take multiple seconds to return. (And this is using gold; plain ld is even worse.)
Why does this happen? The poor performance arises from the way GHC loads code for other modules in the same package when evaluating a splice. Loading code from other packages is easy: GHC just uses dlopen to load the appropriate dynamic library (i.e. the .so file on Linux). But when a splice depends on modules from the current package, this isn’t possible, since all we have are .dyn_o files. It isn’t possible to dlopen those directly, so GHC invokes the system linker to create a .so on demand that contains the set of .dyn_o files it needs.
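To make the two cases concrete, here is a toy shell sketch that sorts a splice’s dependencies into those that can be dlopen’d directly and those that must first be linked. It only prints a plan; the plan_loads function and the pkg:/mod: prefixes are hypothetical, and GHC’s real logic lives inside the compiler, not in anything like this.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the two loading paths. Dependencies use invented
# prefixes: "pkg:" for an external package (which ships a shared library)
# and "mod:" for a module of the current package (for which only a .dyn_o
# file exists). Nothing is actually loaded; the function prints the plan.
plan_loads() {
  local -a objs=()
  local dep
  for dep in "$@"; do
    case "$dep" in
      pkg:*) echo "dlopen: ${dep#pkg:}" ;;    # load the package's prebuilt .so
      mod:*) objs+=("${dep#mod:}.dyn_o") ;;   # collect objects to link later
    esac
  done
  # .dyn_o files cannot be dlopen'd, so they must first be linked into a
  # temporary shared library.
  if (( ${#objs[@]} > 0 )); then
    echo "link: ${objs[*]}"
  fi
}

plan_loads pkg:text-2.0 mod:Foo mod:Bar
# prints:
#   dlopen: text-2.0
#   link: Foo.dyn_o Bar.dyn_o
```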
Unfortunately, this strategy degrades significantly as the set of external package dependencies grows. When GHC invokes the system linker, it instructs the linker to link the resulting library against every package the module depends on, roughly like this:
ld -shared -o /tmp/ghc_01.so \
Foo.dyn_o Bar.dyn_o \
-lHSbase-4.16.0.0 -lHStext-2.0 -lHScontainers-0.6.5.1 ...
and this can become glacially slow if the set of dependencies becomes sufficiently large. The particular codebase I’ve been looking at depends on a little over 400 Haskell packages, which is what leads to such miserable linking times.
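The following shell sketch assembles a command of the same rough shape as the one above, to show how the argument list scales: one -lHS flag per package dependency, no matter how few objects are being linked. The build_link_cmd function and its "--" separator are hypothetical, and GHC’s actual invocation differs in detail, so treat this purely as an illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: build an ld command shaped like the rough example
# above, printing one argument per line.
# Usage: build_link_cmd OUT OBJ... -- PKG...
build_link_cmd() {
  local out="$1"
  shift
  local -a objs=() libs=()
  # Arguments before "--" are object files to link.
  while (( $# > 0 )) && [[ "$1" != "--" ]]; do
    objs+=("$1")
    shift
  done
  # Drop the "--" separator if present; the remaining arguments are packages.
  if (( $# > 0 )); then shift; fi
  local pkg
  for pkg in "$@"; do
    libs+=("-lHS$pkg")   # one -l flag per package dependency
  done
  printf '%s\n' ld -shared -o "$out" "${objs[@]}" "${libs[@]}"
}

# Two home-package objects, but 400 hypothetical package dependencies.
pkgs=()
for ((i = 1; i <= 400; i++)); do pkgs+=("pkg$i"); done
build_link_cmd /tmp/ghc_01.so Foo.dyn_o Bar.dyn_o -- "${pkgs[@]}" | grep -c '^-lHS'
# prints 400
```

With a little over 400 packages, as in the codebase described here, every on-demand link hands the linker that many libraries to resolve against, even when only a couple of .dyn_o files are involved.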
This strategy of building a .so file on demand is appealing in its simplicity, as it allows GHC to dispense with most of its own machinery for linking and loading and just defer to system tools. Unfortunately, on a project with thousands of modules, many of which use Template Haskell, it results in unpalatably slow compilation times. Some modules even go through this process multiple times (since later splices in the same module can require further modules to be loaded), which only compounds the problem. Reducing the number of times the linker needs to be invoked would be a good first step, but it’s hard to see how to make this process significantly faster without coming up with a more efficient strategy for loading the necessary code in the first place.
I am by no means an expert on linking, so I am not sure whether there is some way for GHC to continue using the system linker without triggering such poor performance. However, relying on the system linker necessarily puts many of the variables outside of GHC’s control, so it might be worth pursuing a way to avoid invoking the linker in such scenarios altogether. To that end, #21067 may be related, though I haven’t investigated the status of that work in detail.
Regardless, the status quo is painful, so we ought to find something better.