Recompilation avoidance and Backpack
Today, recompilation avoidance is centered around two major mechanisms:
- First, we keep track of entities we *use* (tcg_dus), which is done by reading off all external names from the renamed source code of a Haskell source file.
- Second, we keep track of what we *import* (tcg_imports), which is tracked when we rename imports.
These two pieces of information get assembled into a module-indexed series of usages in mk_mod_usage_info. The general idea is that when an entity is used, we must record the hash of the entity; when a module is imported, we must record its export hash.
There is an implicit assumption here, which is that a (direct) import is the only way we depend on the exports of a module, and an occurrence of a name in the renamed syntax is the only way we depend on an actual entity.
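The two kinds of recorded information can be illustrated with a toy model. This is only a sketch and not GHC's actual data structures: the types Usage and Iface and the functions usageUpToDate and needsRecompile are hypothetical names invented for this example, and fingerprints are plain strings.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins for GHC's real types.
type ModuleName  = String
type OccName     = String
type Fingerprint = String

-- What we recorded about one module the last time we compiled.
data Usage = Usage
  { usgExportHash :: Maybe Fingerprint            -- Just h if the module was imported
  , usgEntities   :: Map.Map OccName Fingerprint  -- hash of each entity we used
  } deriving (Eq, Show)

-- What the module's interface says now.
data Iface = Iface
  { ifaceExportHash    :: Fingerprint
  , ifaceEntityHashes  :: Map.Map OccName Fingerprint
  } deriving (Eq, Show)

-- A usage is still valid when the recorded export hash (if any) is unchanged
-- and every entity we used still has the hash we recorded.
usageUpToDate :: Usage -> Iface -> Bool
usageUpToDate u i = exportsOk && entitiesOk
  where
    exportsOk  = maybe True (== ifaceExportHash i) (usgExportHash u)
    entitiesOk = and
      [ Map.lookup n (ifaceEntityHashes i) == Just h
      | (n, h) <- Map.toList (usgEntities u) ]

-- Recompile if any recorded usage is stale or its interface is missing.
needsRecompile :: Map.Map ModuleName Usage -> Map.Map ModuleName Iface -> Bool
needsRecompile usages ifaces = not (all ok (Map.toList usages))
  where
    ok (m, u) = maybe False (usageUpToDate u) (Map.lookup m ifaces)
```

Note that in this model a change to an entity we never used (and that does not alter the export hash) does not trigger recompilation, which is exactly the property that the implicit assumption is protecting.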
Backpack breaks these assumptions:
- When we perform signature merging, we depend on the exports and entities of each of the signatures we merge in. Furthermore, it is important to distinguish each of these by identity module (not semantic module, which collapses the distinction.)
- When we instantiate a module, we depend on the exports and entities of the implementing module.
When I initially implemented Backpack, I slowly added extra information to fix recompilation problems as I noticed them. I thus accreted the following recompilation avoidance mechanisms:
- When signature merging occurs, we specially record the module hash for each used merge requirement in a special new field, UsageMergedRequirement, and recompile if the module hash changed at all. We also add each merged signature to ImportAvails (but not as an "import") to ensure we pick up family instances.
- When we instantiate a module, we treat it as if we had a direct import of it (not yet merged, in https://phabricator.haskell.org/D3381). Since instantiations are always referencing non-local modules, we'll always record a module hash in such cases.
This is quite a hodgepodge, and I have no confidence that it is correct. For example, if an implementing module reexports an entity from another module, and that original entity changes, I doubt we recompile at this point. We "accidentally" handle the case when it's not a reexport because we record the module hash for the entire instantiating module.
It seems that it would be better if we can recast this in terms of imports and usages. Here is a try at the design:
- In both instantiation and merging, we must record the export hash of the modules we instantiated/merged in. It is a little troublesome to think of these as imports, however, because they're not (and if you try to implement this, you find yourself making a fake ImportedModVal for an import that doesn't exist); I think the correct thing here is to introduce a new notion of dependency for things that don't correspond to source level imports (another possibility is to add another constructor to ImportedModVal but the effect of this on existing code would have to be determined.)
- The usages when we instantiate a signature are the (instantiated) usages of the original signature (in particular, this picks up the usages from instance lookup), plus a usage for each entity that we match against (because we must rematch if the type changes.)
- Usages for signature merging are a little trickier. We want a usage for every entity that we end up merging in (so, we must record usages post thinning), BUT we must make sure the usage points at the identity module of the signature that originally provided it, not the semantic module (which will invariably point to the current module under compilation.)
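To make the proposal above concrete, here is a minimal sketch of what a separate notion of dependency and identity-module-keyed usages might look like. Everything here is an assumption made for illustration: DepReason, ExportDep, SigIface, mergeDeps and mergeUsages do not exist in GHC, identity modules are plain strings, and each merged signature is assumed to have a distinct identity module.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins for GHC's real types.
type IdentityModule = String
type OccName        = String
type Fingerprint    = String

-- Why we depend on a module's exports. Only SourceImport corresponds to an
-- import that appears in the source file.
data DepReason = SourceImport | MergedSignature | Instantiation
  deriving (Eq, Show)

data ExportDep = ExportDep
  { depModule     :: IdentityModule
  , depReason     :: DepReason
  , depExportHash :: Fingerprint
  } deriving (Eq, Show)

-- One signature being merged into the module under compilation.
data SigIface = SigIface
  { sigIdentity   :: IdentityModule
  , sigExportHash :: Fingerprint
  , sigEntities   :: Map.Map OccName Fingerprint
  } deriving (Eq, Show)

-- Every merged signature contributes an export-hash dependency.
mergeDeps :: [SigIface] -> [ExportDep]
mergeDeps sigs =
  [ ExportDep (sigIdentity s) MergedSignature (sigExportHash s) | s <- sigs ]

-- Entity usages are computed after thinning (the 'kept' names) and are keyed
-- by the identity module of the signature that provided each entity.
mergeUsages :: [OccName] -> [SigIface]
            -> Map.Map IdentityModule (Map.Map OccName Fingerprint)
mergeUsages kept sigs = Map.fromList
  [ (sigIdentity s, used)
  | s <- sigs
  , let used = Map.filterWithKey (\n _ -> n `elem` kept) (sigEntities s)
  , not (Map.null used) ]
```

If two signatures both provide the same entity, this sketch records one usage per providing signature, so a change in either one is noticed. Keying by semantic module instead would collapse both entries into the module under compilation.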
One more thing: when we instantiate a module on-the-fly, we need to account for how we instantiated it. To put it differently, the recompilation information we compute when we instantiate on-the-fly should be (morally) the same as what we would get if we actually compiled the modules in question. This is a bit troublesome since we don't have detailed information relating how a signature was instantiated and what we used (the on-the-fly instantiation process shortcuts this). The simplest thing is probably to just record the module hashes of each module that was used to instantiate an imported module (recursively); we might even be able to do this by just twiddling the mi_mod_hash hash when we instantiate (the alternative is to switch to recording InstalledModule/InstalledUnitId only in hashes, and augmenting usage information to also carry along instantiations.)
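The "record the module hashes recursively" idea might be sketched as follows. This is a toy model under stated assumptions: the Module type, instantiatingModules and twiddledHash are hypothetical, hashes are looked up by a plain module name, and string concatenation stands in for a real fingerprint combination.

```haskell
import Data.List (nub)

type Fingerprint = String

-- Hypothetical model of an instantiated module: a name plus a mapping from
-- hole names to the modules that fill them (which may be instantiated too).
data Module = Module
  { modName  :: String
  , modInsts :: [(String, Module)]
  } deriving (Eq, Show)

-- All modules used to instantiate 'm', recursively, excluding 'm' itself.
instantiatingModules :: Module -> [Module]
instantiatingModules m = nub (concatMap go (modInsts m))
  where
    go (_, impl) = impl : instantiatingModules impl

-- Combine the module's own hash with the hashes of everything that was used
-- to instantiate it. Returns Nothing if any hash is unavailable.
-- Concatenation is a placeholder, not a real fingerprint function.
twiddledHash :: (String -> Maybe Fingerprint) -> Module -> Maybe Fingerprint
twiddledHash lookupHash m = do
  base <- lookupHash (modName m)
  deps <- mapM (lookupHash . modName) (instantiatingModules m)
  pure (concat (base : map ('+' :) deps))
```

With this scheme, a change to the hash of any module in the instantiation tree changes the combined hash of the instantiated module, so importers see it as changed without needing per-entity information about the instantiation.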
Another problem is that we record usages for Module (instantiated things), but hashes are actually on an InstalledModule basis.