After the source program has been typechecked it is desugared into GHC's intermediate language Core. The Core representation of a program is then optimized by a series of correctness preserving Core-to-Core passes. This page describes the overall structure of the Core-to-Core optimization pipeline. Detailed descriptions of optimizations are available in the published papers. An overview of the whole compiler pipeline is available here.
Optimizations during desugaring
At the end of desugaring we run the simpleOptPgm function that performs some simple optimizations: eliminating dead bindings, and inlining non-recursive bindings that are used only once or where the RHS is trivial. The rest of Core optimisations is performed by the Core-to-Core pipeline.
The structure of the Core-to-Core pipeline is determined in the getCoreToDo function in the compiler/simplCore/SimplCore.hs module. Below is an ordered list of performed optimisations. These are enabled by default with -O1 and -O2 unless the description says a specific flag is required. The simplifier, which the pipeline description below often refers to, is described in detail in the next section.
Static Argument Transformation: tries to remove redundant arguments to recursive calls, turning them into free variables in those calls. Only enabled with -fstatic-argument-transformation. If run this pass is preceded with a "gentle" run of the simplifier.
Full laziness, 1st pass: floats let-bindings outside of lambdas. This pass includes annotating bindings with level information and then running the float-out pass. In this first pass of the full laziness we don't float partial applications and bindings that contain free variables - this will be done by the second pass later in the pipeline. See "Further Reading" section below for pointers where to find the description of the full laziness algorithm.
Simplifier, main run: run the main passes of the simplifier (phases 2, 1 and 0). Phase 0 is run with at least 3 iterations.
Float in, 1st pass: the opposite of full laziness, this pass floats let-bindings as close to their use sites as possible. It will not undo the full laziness by sinking bindings inside a lambda, unless the lambda is one-shot. At this stage we have not yet run the demand analysis, so we only have demand information for things that we imported.
Call arity: attempts to eta-expand local functions based on how they are used. If run, this pass is followed by a 0 phase of the simplifier. See Notes in compiler/simplCore/CallArity.hs and the relevant paper.
Demand analysis, 1st pass (a.k.a. strictness analysis): runs the demand analyser followed by worker-wrapper transformation (JFP paper) and 0 phase of the simplifier. This pass tries to determine if some expressions are certain to be used and whether they will be used once or many times (cardinality analysis). We currently don't have means of saying that a binding is certain to be used many times. We can only determine that it is certain to be one-shot (ie. used only once) or probable to be one shot. Demand analysis pass only annotates Core with strictness information. This information is later used by worker/wrapper pass to perform transformations. CPR analysis is also done during demand analysis.
Full laziness, 2nd pass: another full-laziness pass. This time partial applications and functions with free variables are floated out.
Common Sub-expression-elimination: eliminates expressions that are identical.
Float in, 2nd pass
Check rules, 1st pass: this pass is not for optimisation but for troubleshooting the rules. It is only enabled with -frule-check flag that accepts a string pattern. This pass looks for rules beginning with that string pattern that could have fired but didn't and prints them to stdout.
Liberate case: unrolls recursive functions once in their own RHS, to avoid repeated case analysis of free variables. It's a bit like the call-pattern specialisation but for free variables rather than arguments. Followed by a phase 0 simplifier run. Only enabled with -fliberate-case flag.
Call-pattern specialisation: Only enabled with -fspec-constr flag. 'Inlining for recursive functions': Specialises recursive functions for call patterns involving arguments in WHNF, so that the simplifier can weed out emerging redexes in the body. Details are in the paper Call-pattern Specialisation for Haskell Programs
Check rules, 2nd pass
Simplifier, final: final 0 phase of the simplifier.
Demand analysis, 2nd pass (a.k.a. late demand analysis): this pass consists of demand analysis followed by the worker-wrapper transformation and phase 0 of the simplifier. The reason for this pass is that some opportunities for discovering strictness were not visible earlier; and optimisations like call-pattern specialisation can create functions with unused arguments which are eliminated by late demand analysis. Only run with -flate-dmd-anal.
Demand analysis, final pass A final pass of the demand analyser, not followed by worker-wrapper. This one is always run if there was any previous run of the demand analyser: As mentioned in the cardinality paper this is meant to detect single entry thunks. Another round of WW would invalidate that information.
Check rules, 3rd pass
The plugin mechanism allows to modify the above pipeline dynamically.
Simplifier is the workhorse of the Core-to-Core optimisation pipeline. It performs all the local transformations: (TODO this list is most likely not comprehensive)
applying the rewrite rules
case of case
case of known constructor
eta expansion and eta reduction
combining adjacent casts
pushing a cast out of the way of an application e.g. (f |> k) a ==> f (a |> k1) |> k2 for suitable k1, k2.
Each run of the simplifier is assigned with a phase number: 2, 1 or 0. Phase numbers are used for control of interaction between the rules and INLINE/NOINLINE pragmas - see sections 126.96.36.199 and 9.32.3 of the user guide. There are many 0 phases because we use the simplifier to propagate the effects of other passes and once we've gone through the main runs of the simplifier we no longer have to worry about rules and inlinings - these have been already dealt with. There is also a special initial phase, which is used at the beginning of the pipeline by the "gentle" simplifier runs.
Each single run of the simplifier consists of many iterations. Each iteration tries to apply all the optimisations. The simplifier run ends when a fixpoint is reached or when a predesignated number of iterations has been performed (the default is 4 and it can be controlled via the -fmax-simplifier-iterations flag.
Simplifier settings can be further tweaked by modifying the fields of SimplifierMode record. It is possible to turn on and off the following optimisations:
The so-called "gentle" run disables the case-of-case transformation but does inlining, applies rules and does and eta-expansion. The reason it disables case-of-case is that it makes full laziness work better. FIXME that's what the comment says, and yet we run the full-laziness after the main run of the simplifier.
Even when we disable inlining bindings which only occur once in the body can still be inlined.
You can use -dverbose-core2core flag to dump the Core after almost all passes in the pipeline.
Compilation by Transformation in Non-Strict Functional Languages (a.k.a. the Santos' thesis): basic source of information about core-to-core transformations. Describes local transformations, full laziness, static argument transformation and many others. Note: Santos' description of inlining is superseeded by "Secrets of the GHC inliner" (see below).
A transformation-based optimiser for Haskell, SL Peyton Jones and A Santos, Science of Computer Programming 32(1-3), pp3-47, September 1998. Gives a very simplified overview of Core and then goes into discussing: inlining, veta reduction, optimizing conditionals, unboxed data types and strictness analysis, and let floating (including full laziness). Again, this paper repeats the Santos' thesis, except for the unboxed data types description, where it repeats "Unboxed values as first class citizens in a non-strict functional language".
Secrets of the GHC inliner, Simon Peyton Jones and Simon Marlow, Journal of Functional Programming 12(4), July 2002, pp393-434. As the name of the paper rightly implies, this paper gives full details of the inliner.
Constructed Product Result Analysis for Haskell, Clem Baker-Finch, Kevin Glynn, and Simon Peyton Jones, Journal of Functional Programming 14(2), 211–245, March 2004. Describes optimisation that allows to return tuple components in registers (for functions that return tuples).
Call arity, by Joachim Breitner. Describes the call arity optimisation. Also see his thesis for even more details.
Does handling of unboxed values described in "Unboxed values as first class citizens in a non-strict functional language" fit in the scope of this page?