The (new) GHC Build System
(this page will eventually replace Attic/Building/BuildSystem when the new build system is merged in)
This section contains everything you need to know in order to understand and modify the GHC build system. The build system is non-standard in various ways (to be explained shortly), and is decidedly non-trivial: do not attempt to modify it without having a grasp of the concepts that follow!
Each of the following subsections describes one of the idioms that
we use in the build system. There are a handful of such idioms, and
when you've understood them all you'll be able to understand most of
the code you'll find in the build system. We'll describe the idioms
first, and then get on to the specifics of how we build GHC.
Historical note: this is the third major revision of the GHC build
system. The first incarnation was based on "jmake", a derivative of
X11's "imake", which is based on using the C preprocessor to add macros
and #include to plain make. The second incarnation
used GNU make's extensions for including makefiles (but lost the
ability to use macros, since at the time GNU make didn't have support
for general macros). In this third revision, we use even more of GNU
make's extensions, and we make a fundamental change to the design, as
described in the next section.
Idiom: non-recursive make
Build systems for large projects often use the technique commonly
known as "recursive make", where there is a separate Makefile in
each directory that is capable of building that part of the system.
Makefiles may share some common infrastructure and configuration
by using GNU make's
include directive; this is exactly what the
previous GHC build system did. However, this design has a number of
flaws, as described in Peter Miller's
Recursive Make Considered Harmful.
The GHC build system adopts the non-recursive make idiom. That is, we
never invoke make from inside a
Makefile, and the whole build system
is effectively a single giant Makefile.
This gives us the following advantages:
- Specifying dependencies between different parts of the tree is easy. In this way, we can accurately specify many dependencies that we could not in the old recursive-make system. This makes it much more likely that when you say "make" after modifying parts of the tree or pulling new patches, the build system will bring everything up-to-date in the correct order, and leave you with a working system.
- More parallelism: dependencies are more fine-grained, and there is no need to build separate parts of the system in sequence, so the overall effect is that we have more parallelism in the build.
Doesn't this sacrifice modularity? No - we can still split the build
system into separate files, using GNU make's include directive.
Specific notes related to this idiom:
- Individual directories usually have a ghc.mk file which contains the build instructions for that directory.
- Other parts of the build system are in
- The top-level ghc.mk file includes all the other *.mk files in the tree. The top-level Makefile invokes make on ghc.mk (this is the only recursive invocation of make; see the "phase ordering" idiom below).
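As a rough illustration of the idea (not the actual contents of the top-level ghc.mk), a single make process can pull in every per-directory fragment with include. The use of $(wildcard ...) and the one-level directory pattern are assumptions made to keep the sketch short.

```makefile
# Hypothetical sketch of a top-level makefile in the non-recursive style.
# Declare the default goal first so that included fragments cannot steal it.
default : all

# Pull every per-directory fragment into this one make process.
# An include with an empty file list is silently ignored by GNU make.
include $(wildcard */ghc.mk)

.PHONY: default all
all :
	@echo "included: $(filter %/ghc.mk,$(MAKEFILE_LIST))"
```

Because every fragment is read by the same make process, a rule in one directory's fragment can name a target defined in another directory's fragment as a prerequisite.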
Idiom: stub makefiles
It's all very well having a single giant
Makefile that knows how to
build everything in the right order, but sometimes you want to build
just part of the system. When working on GHC itself, we might want to
build just the compiler, for example. In the recursive make system we
would cd ghc and then run make. In the non-recursive system we can
still achieve this by specifying the target with something like
`make ghc/stage1/build/ghc`, but that's not so convenient.
Our second idiom therefore supports the
cd ghc; make idiom, just as
with recursive make. To achieve this we put a tiny stub Makefile in each
directory whose job it is to invoke the main Makefile specifying the
appropriate target(s) for that directory. These stub Makefiles
follow a simple pattern:
dir = libraries/base
TOP = ../..
include $(TOP)/mk/sub-makefile.mk
mk/sub-makefile.mk knows how to recursively invoke the giant top-level make.
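To make the forwarding concrete, here is a minimal sketch of what such a shared include could look like. It is an assumption for illustration, not the real mk/sub-makefile.mk; it relies only on the dir and TOP variables set by the stub and on make's standard -C option.

```makefile
# Hypothetical sketch only; the real mk/sub-makefile.mk may differ.
# The including stub Makefile is assumed to have set:
#   dir = path of this directory relative to the top of the tree
#   TOP = relative path from this directory back to the top of the tree
.PHONY: default all clean

default : all

# Forward each standard target to the top-level make as <target>_<dir>,
# e.g. "make all" in libraries/base becomes "make all_libraries/base".
all clean :
	$(MAKE) -C $(TOP) $@_$(dir)
```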
Idiom: standard targets (all, clean, etc.)
We want an
all target that builds everything, but we also want a way to build individual components (say, everything in
rts/). This is achieved by having a separate "all" target for each directory, named
all_directory. For example in
rts/ghc.mk we might have this:
all : all_rts
.PHONY: all_rts
all_rts : ...dependencies...
When the top level make includes all these
ghc.mk files, it will see that target
all depends on
all_rts, all_ghc, ...etc...; so
make all will make all of these. But the individual targets are still available. In particular, you can say
- make all_rts (anywhere) to build everything in the RTS directory
- make all (anywhere) to build everything
- make, with no explicit target, makes the default target in the current directory's stub Makefile, which in turn makes the target all_dir, where dir is the current directory.
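The technique relies on the fact that make merges the prerequisites of several recipe-less rules for the same target. A small self-contained sketch, with the two per-directory fragments written inline and echo standing in for real build steps:

```makefile
# Sketch: two "directories" each contribute a prerequisite to the global
# "all" target. In the real tree these lines would live in separate
# ghc.mk files; the echo recipes are placeholders.
.PHONY: all all_rts all_ghc

# would be in rts/ghc.mk
all : all_rts
all_rts :
	@echo building rts

# would be in ghc/ghc.mk
all : all_ghc
all_ghc :
	@echo building ghc
```

Running make all on this file runs both recipes, while make all_rts runs only the first.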
Other standard targets such as clean,
install, and so on use the same technique. There are pre-canned macros to define your "all" and "clean" targets; take a look in
Idiom: stages
What do we use to compile GHC? GHC itself, of course. In a complete build we actually build GHC twice: once using the GHC version that is installed, and then again using the GHC we just built. To be clear about which GHC we are talking about, we number them:
- Stage 0 is the GHC you have installed. The "GHC you have installed" is also called "the bootstrap compiler".
- Stage 1 is the first GHC we build, using stage 0. Stage 1 is then used to build the packages.
- Stage 2 is the second GHC we build, using stage 1. This is the one we normally install when you say make install.
- Stage 3 is optional, but is sometimes built to test stage 2.
Stage 1 does not support interactive execution (GHCi) or Template Haskell. The reason is that when running byte code we must dynamically link the packages, and only in stage 2 and later can we guarantee that the packages we dynamically link are compatible with those that GHC was built against (because they are the very same packages).
Idiom: distdir
Often we want to build a component multiple times in different ways. For example:
- certain libraries (e.g. Cabal) are required by GHC, so we build them once with the bootstrapping compiler, and again with stage 1 once that is built.
- GHC itself is built multiple times (stage 1, stage 2, maybe stage 3)
- some tools (e.g. ghc-pkg) are also built once with the bootstrapping compiler, and then again using stage 1 later.
In order to support multiple builds in a directory, we place all generated files in a subdirectory, called the "distdir". The distdir can be anything at all; for example in
compiler/ we name our distdirs after the stage (stage1,
stage2 etc.). When there is only a single build in a directory, by convention we usually call the distdir simply "dist".
There is a related concept called ways, which includes profiling and dynamic-linking. Multiple ways are currently part of the same "build" and use the same distdir, but in the future we might unify these concepts and give each way its own distdir.
Idiom: interaction with Cabal
Many of the components of the GHC build system are also Cabal
packages, with package metadata defined in a
foo.cabal file. For the
GHC build system we need to extract that metadata and use it to build
the package. This is done by the program ghc-cabal (located
in the GHC source tree). This program reads
foo.cabal and produces
package-data.mk containing the package metadata in the form of
makefile bindings that we can use directly.
We adhere to the following rule:
ghc-cabal generates only
makefile variable bindings, such as
HS_SRCS = Foo.hs Bar.hs
ghc-cabal never generates makefile rules, macros, macro invocations etc.
All the makefile code is therefore contained in fixed, editable .mk files.
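The division of labour can be sketched as follows. The first binding stands in for a generated file and reuses the HS_SRCS example above; the HS_OBJS variable and the show target are hypothetical and only illustrate how hand-written makefile code can consume generated bindings.

```makefile
# Stand-in for a generated package-data.mk: variable bindings only.
# (Illustrative values, not real ghc-cabal output.)
HS_SRCS = Foo.hs Bar.hs

# Hand-written, fixed makefile code derives everything else from the bindings.
HS_OBJS = $(HS_SRCS:.hs=.o)

show :
	@echo "sources: $(HS_SRCS)"
	@echo "objects: $(HS_OBJS)"
```

Keeping rules out of the generated file means the build logic can be read and edited in one place, without regenerating anything.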
Idiom: variable names
Now that our build system is one giant
Makefile, all our variables
share the same namespace. Where previously we might have had a
variable that contained a list of the Haskell source files called
HS_SRCS, now we have one of these for each directory (and indeed each build, or distdir) in the source tree,
so we have to give them all different names.
The idiom that we use for distinguishing variable names is to prepend
the directory name and the distdir to the variable. So for example the list of
Haskell sources in the directory
utils/hsc2hs would be in the variable
utils/hsc2hs_dist_HS_SRCS (make doesn't mind slashes in variable
names). The pattern is: directory_distdir_variable.
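A small sketch of the naming scheme in action. The source file names and the lookup helper are made up for illustration; only the directory_distdir_variable pattern comes from the text above.

```makefile
# Illustrative values only; the file names are made up.
utils/hsc2hs_dist_HS_SRCS = Main.hs
compiler_stage1_HS_SRCS = A.hs B.hs

# Hypothetical helper: fetch a variable for a directory/distdir pair.
# $1 = directory, $2 = distdir, $3 = variable
lookup = $($1_$2_$3)

show :
	@echo "$(call lookup,utils/hsc2hs,dist,HS_SRCS)"
	@echo "$(call lookup,compiler,stage1,HS_SRCS)"
```

The two HS_SRCS lists coexist in one namespace because the prefix makes each name unique.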
Idiom: macros
The build system makes extensive use of GNU make macros. A macro is defined in
GNU make using define:
define build-package
# args: $1 = directory, $2 = distdir
... makefile code to build a package ...
endef
(for example, see
rules/build-package), and is invoked like this:
$(eval $(call build-package,libraries/base,dist))
(this invocation would be in
eval works like this: its argument is expanded as normal,
and then the result is interpreted by make as makefile code. This
means the body of the
define gets expanded twice. Typically
this means we need to use
$$ instead of
$ everywhere in the body of the define.
The build-package macro may need to define local variables.
There is no support for local variables in macros, but we can define
variables which are guaranteed to not clash with other variables by
preceding their names with a string that is unique to this macro call.
A convenient unique string to use is directory_distdir; this is unique as long as we only call each macro with a given directory/build pair once. Most macros in
the GHC build system take the directory and build as the first two
arguments for exactly this reason. For example, here's an excerpt
from the build-prog macro:
define build-prog
# $1 = dir
# $2 = distdir
# $3 = GHC stage to use (0 == bootstrapping compiler)
$1_$2_INPLACE = $$(INPLACE_BIN)/$$($1_$2_PROG)
...
If build-prog is called with utils/hsc2hs and dist for the
first two arguments, after expansion make would see this:
utils/hsc2hs_dist_INPLACE = $(INPLACE_BIN)/$(utils/hsc2hs_dist_PROG)
The idiom of
$$($1_$2_VAR) is very common throughout the build
system - get used to reading it! Note that the only time we use a
single $ in the body of a define is to refer to the parameters $1,
$2, and so on.
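The double expansion is easier to see in a self-contained example. The macro name show-prog, the directory utils/demo and the values are hypothetical; the mechanics of define, call and eval are standard GNU make.

```makefile
# demo.mk: run with "make -f demo.mk"
# Hypothetical macro; $1 = directory, $2 = distdir.
define show-prog
$1_$2_PROG = hello
$1_$2_INPLACE = inplace/bin/$$($1_$2_PROG)
endef

# call substitutes $1 and $2 and turns $$ into $;
# eval then parses the result as ordinary makefile text.
$(eval $(call show-prog,utils/demo,dist))

all :
	@echo $(utils/demo_dist_INPLACE)
```

After the call, make sees utils/demo_dist_INPLACE = inplace/bin/$(utils/demo_dist_PROG), so the recipe prints inplace/bin/hello. Had the body used a single $ there, the reference would have been expanded during the call, before the variable's final name existed.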
Idiom: phase ordering
NB. you need to understand this section if either (a) you are modifying parts of the build system that include automatically-generated
Makefile code, or (b) you need to understand why we have a top-level
Makefile that recursively invokes make.
The main hitch with non-recursive make arises when parts of the build system are automatically-generated. The automatically-generated parts of our build system fall into two main categories:
- Dependencies: we use ghc -M to generate make-dependencies for Haskell source files, and similarly gcc -M to do the same for C files. The dependencies are normally generated into a file .depend, which is included as normal.
- Makefile bindings generated from .cabal package descriptions. See "Idiom: interaction with Cabal".
Now, we also want to be able to use
make to build these files, since
they have complex dependencies themselves. For example, in order to build
package-data.mk we need to first build
ghc-cabal etc.; similarly, a
.depend file needs to be re-generated if any of the source files have changed.
GNU make has a clever strategy for handling this kind of scenario. It first reads all the included Makefiles, and then tries to build each one if it is out-of-date, using the rules in the Makefiles themselves. When it has brought all the included Makefiles up-to-date, it restarts itself to read the newly-generated Makefiles.
This works fine, unless there are dependencies between the
Makefiles. For example in the GHC build, the
.depend file for a
package cannot be generated until
package-data.mk has been generated
and make has been restarted to read in its contents, because it is the
package-data.mk file that tells us which modules are in the package.
But make always makes all the included
Makefiles before restarting - it
doesn't know how to restart itself earlier when there is a dependency
between Makefiles.
Consider the following Makefile:
all :

include inc1.mk

inc1.mk : Makefile
	echo "X = C" >$@

include inc2.mk

inc2.mk : inc1.mk
	echo "Y = $(X)" >$@
Now try it:
$ make -f fail.mk
fail.mk:3: inc1.mk: No such file or directory
fail.mk:8: inc2.mk: No such file or directory
echo "X = C" >inc1.mk
echo "Y = " >inc2.mk
make: Nothing to be done for `all'.
make built both inc1.mk and inc2.mk without restarting itself
between the two (even though we added a dependency on inc1.mk from inc2.mk),
so X was still undefined when inc2.mk was generated.
The solution we adopt in the GHC build system is as follows. We have two Makefiles, the first a wrapper around the second.
# top-level Makefile
% :
	$(MAKE) -f inc.mk PHASE=0 just-makefiles
	$(MAKE) -f inc.mk $@
# inc.mk

include inc1.mk

ifeq "$(PHASE)" "0"

inc1.mk : inc.mk
	echo "X = C" >$@

else

include inc2.mk

inc2.mk : inc1.mk
	echo "Y = $(X)" >$@

endif

just-makefiles:
	@: # do nothing

clean :
	rm -f inc1.mk inc2.mk
Each time make is invoked, we recursively invoke make in several phases:
- Phase 0: invoke inc.mk with PHASE=0. This brings inc1.mk up-to-date (and only inc1.mk).
- Final phase: invoke inc.mk again (with PHASE unset). Now we can be sure that inc1.mk is up-to-date and proceed to generate inc2.mk. If this changes inc2.mk, then make automatically re-invokes itself, repeating the final phase.
We could instead have abandoned make's automatic re-invocation mechanism altogether, and used three explicit phases (0, 1, and final), but in practice it's very convenient to use the automatic re-invocation when there are no problematic dependencies.
Note that the
inc1.mk rule is only enabled in phase 0, so that if we accidentally call
inc.mk without first performing phase 0, we will either get a failure (if
inc1.mk doesn't exist), or otherwise make will not update
inc1.mk if it is out-of-date.
In the case of the GHC build system we need 4 such phases; see the
comments in the top-level
ghc.mk for details.
This approach is not at all pretty, and re-invoking make every time is slow, but we don't know of a better workaround for this problem.
Idiom: no double-colon rules
Make has a special type of rule of the form
target :: prerequisites,
with the behaviour that all double-colon rules for a given target are
executed if the target needs to be rebuilt. This style was popular
for things like "all" and "clean" targets in the past, but it's not
really necessary - see the "all" idiom above - and this means there's one fewer makeism you need to know about.
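For comparison, here is a sketch of the two styles side by side. The directory names and echo recipes are placeholders; the clean_rts style of name follows the all_directory convention described earlier.

```makefile
# Double-colon style (shown commented out): every rule for "clean"
# carries its own recipe, and all of them run.
# clean ::
# 	rm -f rts/*.o
# clean ::
# 	rm -f compiler/*.o

# Single-colon equivalent: one top-level target with one named
# prerequisite per directory.
.PHONY: clean clean_rts clean_compiler

clean : clean_rts clean_compiler

clean_rts :
	@echo cleaning rts

clean_compiler :
	@echo cleaning compiler
```

The single-colon form also gives each directory a target that can be invoked on its own, which the double-colon form does not.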
Idiom: the vanilla way
Libraries can be built in several different "ways", for example "profiling" and "dynamic" are two ways. Each way has a short tag associated with it; "p" and "dyn" are the tags for profiling and dynamic respectively. In previous GHC build systems, the "normal" way didn't have a name, it was just always built. Now we explicitly call it the "vanilla" way and use the tag "v" to refer to it.
This means that the
GhcLibWays variable, which lists the ways in
which the libraries are built, must include "v" if you want the
vanilla way to be built (this is included in the default setup, of course).
Idiom: whitespace
Make has a rather ad-hoc approach to whitespace. Most of the time it ignores it, e.g.
FOO = bar
sets FOO to "bar", not " bar". However, sometimes whitespace is significant,
and calling macros is one example. For example, we used to have a call
$(call all-target, $$($1_$2_INPLACE))
and this passed
" $$($1_$2_INPLACE)" as the argument to
all-target. This in turn generated
.PHONY: all_ inplace/bin/ghc-asm
which caused an infinite loop, as make continually thought that
ghc-asm was out-of-date, rebuilt it,
reinvoked make, and then thought it was out of date again.
The moral of the story is: avoid whitespace unless you're sure it'll be OK!
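The effect is easy to reproduce in isolation. In this sketch, phony-name is a hypothetical stand-in for a macro that builds a target name from its argument; the brackets in the output make the stray space visible.

```makefile
# Hypothetical stand-in for a macro that builds a target name from $1.
phony-name = all_$1

show :
	@echo "[$(call phony-name,rts)]"
	@echo "[$(call phony-name, rts)]"
```

The first line prints [all_rts]; the second prints [all_ rts], because the space after the comma is kept as part of the argument.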