Properly link Haskell shared libs on all systems

Since the early days of building Haskell shared libs on Linux we have been using a scheme that is really a bit of a hack. We should do it properly.

In my blog post on this from 2009 (http://www.well-typed.com/blog/30/) I said:

If we use ldd again to look at the libfoo.so that we've made we will notice that it is missing a dependency on the rts library. This is problem that we've yet to sort out, so for the moment we can just add the dependency ourselves:

$ ghc --make -dynamic -shared -fPIC Foo.hs -o libfoo.so \
    -lHSrts-ghc6.11 -optl-Wl,-rpath,/opt/ghc/lib/ghc-6.11/

The reason it's not linked in yet is because we need to be able to switch which version of the rts we're using without having to relink every library. For example we want to be able to switch between the debug, threaded and normal rts versions. It's quite possible to do this and it just needs a bit more rearranging in the build system to sort it out. Once it's done you'll even be able to switch rts at runtime, eg:

$ LD_PRELOAD=/opt/ghc/lib/ghc-6.11/libHSrts_debug-ghc6.11.so ./Hello

So in general, if a shared lib requires symbols from another shared lib then it should depend on it. In ELF terminology that means a NEEDED entry saying that this lib needs that other lib. Without it, the shared lib has dangling dependencies, and it cannot reliably be linked against or loaded on its own.

But we don't do this for the RTS: we do not link Haskell shared libs against the RTS, so they have lots of dangling references to RTS symbols. These libraries cannot be loaded on their own, e.g. with dlopen(). This is bad, and has other knock-on consequences.

Why don't we link to the RTS? It's because historically (with static linking) GHC has had the ability to select the flavour of the RTS when final executables are linked, not when intermediate libraries are created. This works because the RTS flavours share a common ABI. It is a useful feature, as it lets us select the SMP, debug or other RTS at final link time. So when we designed the first shared lib scheme on ELF we had to support this.

Our initial scheme was like this: don't link Haskell library DSOs against the RTS; only link the final exe against the RTS. Each RTS flavour has a separate SONAME, e.g. libHSrts_thr-ghc7.8.4.so or libHSrts_debug-ghc7.8.4.so. This works because the runtime linker looks at the final exe first and loads the RTS it names, so when the other libs are loaded their RTS symbols all resolve.

Why can't we link all the libraries against the RTS? Currently each RTS flavour has a different SONAME, which is the key that the dynamic linker uses to identify each library. So if we did link all the Haskell libs against "the" RTS we would have to pick which one at the point at which we create the library, and that'd stop us from being able to choose later.

So, can we use a better scheme? We want one that doesn't leave dangling undefined references in intermediate Haskell libs, and is also compatible with the ability to select the flavour of the RTS at final exe link time (or even override it at load time).

Yes we can!

The first thing to note is that to be interchangeable, all the RTS flavours (that share a compatible ABI) need to have the same SONAME. So for example, all the (non-profiling) RTS DSO files have to have the internal SONAME libHSrts-ghc7.8.4.so. Once they all have the same SONAME, it's OK for all the Haskell libs to specify a NEEDED dependency on that RTS SONAME.

But if they have the same SONAME, what do the files get called, where do they live and how are they found? The trick is to make use of the search path. Put each RTS flavour in a different directory, but otherwise with the same filename, e.g. lib/rts-1.0/thr/libHSrts-ghc7.8.4.so, lib/rts-1.0/debug/libHSrts-ghc7.8.4.so etc.

Each library DSO and exe has its list of NEEDED entries, and it has an RPATH entry used to find those libraries if they're not loaded yet. The key is the "if they're not loaded yet" bit. Remember that the dynamic linker uses the SONAME as the key to decide whether a lib is already loaded. So the libraries could all have an RPATH entry saying to look for the RTS in the directory containing the default RTS flavour. But then the top-level exe (or a shared lib for external consumption) can also link to the RTS directly (i.e. with its own NEEDED entry) and can specify an RPATH pointing at any of the RTS flavours. When the linker loads the top-level exe, it loads the selected RTS using the exe's RPATH; then, when it sees other Haskell libs that have a NEEDED entry on the RTS, it ignores those entries because a library with the RTS's SONAME is already loaded.

So concretely, instead of:

lib/ghc-{ver}/rts-1.0/libHSrts-ghc{ver}.so

lib/ghc-{ver}/rts-1.0/libHSrts_thr-ghc{ver}.so

lib/ghc-{ver}/rts-1.0/libHSrts_debug-ghc{ver}.so

lib/ghc-{ver}/rts-1.0/libHSrts_l-ghc{ver}.so

lib/ghc-{ver}/rts-1.0/libHSrts_thr_l-ghc{ver}.so

lib/ghc-{ver}/rts-1.0/libHSrts_thr_debug-ghc{ver}.so

each with a different SONAME

we'd have

lib/ghc-{ver}/rts-1.0/libHSrts-ghc{ver}.so

lib/ghc-{ver}/rts-1.0/thr/libHSrts-ghc{ver}.so

lib/ghc-{ver}/rts-1.0/debug/libHSrts-ghc{ver}.so

lib/ghc-{ver}/rts-1.0/l/libHSrts-ghc{ver}.so

lib/ghc-{ver}/rts-1.0/thr_l/libHSrts-ghc{ver}.so

lib/ghc-{ver}/rts-1.0/thr_debug/libHSrts-ghc{ver}.so

each with the same SONAME

When linking libs we would always use -lHSrts-ghc${ver} -rpath lib/ghc-${ver}/rts-1.0

When linking exes (or shared libs for external consumption) we would use both -lHSrts-ghc${ver} and -rpath lib/ghc-${ver}/rts-1.0/${rtsflavour}.
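The two linking rules can be written down as a small shell function. This is a hypothetical helper, not part of GHC or Cabal: it assumes the install layout listed above under a libdir of the caller's choice (/opt/ghc/lib below is only an example), takes the -l name from the file names in that layout (libHSrts-ghc{ver}.so), and emits gcc-style flags rather than GHC's own option syntax. It also adds -L, which the linker needs in order to find the file at link time.

```shell
#!/bin/sh
# Hypothetical helper (not part of GHC): print gcc-style flags for linking
# against the RTS under the proposed layout.
#   $1 = libdir, $2 = GHC version,
#   $3 = RTS flavour: "" for an intermediate library (default flavour),
#        or e.g. "thr" / "debug" for an exe.
rts_link_flags() {
    libdir=$1 ver=$2 flavour=$3
    dir="$libdir/ghc-$ver/rts-1.0"
    if [ -n "$flavour" ]; then
        dir="$dir/$flavour"
    fi
    # Same -l name for every flavour; only the directories differ.
    printf '%s\n' "-L$dir -lHSrts-ghc$ver -Wl,-rpath,$dir"
}

rts_link_flags /opt/ghc/lib 7.8.4 ""    # intermediate Haskell library
rts_link_flags /opt/ghc/lib 7.8.4 thr   # exe using the threaded RTS
```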

Trac metadata

Trac field	Value
Version	7.11
Type	Task
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	Package system
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system	Linux
Architecture

Edited Mar 10, 2019 by Tamar Christina
