When compiling a single module, we can assume that all of our dependencies have already been compiled, and query the environment as necessary when we need to do things like look up interfaces to find out what the types in our dependencies are. When we compile more than module at once, as in --make, things get a bit more complicated:
We have to analyze the dependency structure of the program in question, and come up with a plan for how to compile the various modules, and
We have an opportunity to cache and reuse information from interface files which we may load from the environment. This is why, for example, ghc --make outperforms parallel one-shot compilation on one core.
This discussion is going to omit concerns related to dynamic code loading in GHC (as would be the case in GHCi).
The overall driver
The meat of this logic is in compiler/main/GhcMake.hs, with primary entry point the function load (in the case of --make, this function is called with LoadAllTargets, instructing all target modules to be compiled, which is stored in hsc_targets).
Dependency analysis is carried out by the depanal function; the resulting ModuleGraph is stored into hsc_mod_graph. Essentially, this pass looks at all of the imports of the target modules (hsc_targets), and recursively pulls in all of their dependencies (stopping at package boundaries.) The resulting module graph consists of a list of ModSummary (defined in compiler/main/HscTypes.hs), which record various information about modules prior to compilation (recompilation checking, even), such as their module identity (the current package name plus the module name), whether or not the file is a boot file, where the source file lives. Dependency analysis inside GHC is often referred to as downsweep.
ToDo: say something about how hs-boot files are
The dependency analysis is cached (in hsc_mod_graph), so later calls to depanal can reuse this information. (This is not germane for --make, which only calls depanal once.) discardProg deletes this information entirely, while invalidateModSummaryCache simply "touches" the timestamp associated with the file so that we resummarize it.
The result of dependency analysis is topologically sorted in load by topSortModuleGraph.
Compilation, also known as upsweep, walks the module graph in topological order and compiles everything. Depending on whether or not we are doing parallel compilation, this implemented by upsweep or by parUpsweep. In this section, we'll talk about the sequential upsweep.
The key data structure which we are filling in as we perform compilation is the home package table or HPT (hsc_HPT, defined in compiler/main/HscTypes.hs). As its name suggests, it contains informations from the *home package*, i.e. the package we are currently compiling. Its entries, HomeModInfo, contain the sum total knowledge of a module after compilation: both its pre-linking interface ModIface as well as the post-linking details ModDetails.
We *clear* out the home package table in the session (for --make, this was empty anyway), but we pass in the old HPT.
When we finish compiling a module loop, we need to retypecheck all of the interfaces in the loop. The details for why we need to do this are in Tying the Knot.
Finally, when the module is completely done being compiled, it is registered in the home package table
ToDo: Talk about what happens when we fail while in the middle of compiling a module cycle