The "re-typecheck" mechanism in GHC.Make looks wrong
Ticket #20200 (closed) has exposed some issues to do with the knot-tying logic in GHC.Make, in --make mode. #20200 (closed) has lots of issues, many of them now fixed; this particular issue starts here, but it's complicated enough that I'm opening a fresh ticket for it.
The wiki page Tying the knot is helpful. Also closely related are
I think the trouble we have been encountering in #20200 (closed) is subtle fallout from recent refactoring of the
--make code, because we didn't understand the issues well enough. Here is my analysis.
Running example
Suppose we have ("R" for "recursive"):
R.hs-boot:  module R where
              data T
              g :: T -> T

A.hs:       module A( f, T, g ) where
              import {-# SOURCE #-} R
              data S = MkS T
              f :: T -> S = ...g...

R.hs:       module R where
              data T = T1 | T2 S
              g = ...f...
Why we need to rehydrate A's ModIface before compiling R.hs
After compiling A.hs we'll have a TypeEnv in which the Id for f has a type that uses the
AbstractTyCon T; and a TyCon for S that also mentions that same AbstractTyCon.
(Abstract because it came from R.hs-boot; we know nothing about it.)
When compiling R.hs, we build a TyCon for T. But that TyCon mentions S, and S currently
has an AbstractTyCon for T inside it. What we want is a fully cyclic structure, in which
S refers to T and T refers to S.
Solution: rehydration. Before compiling R.hs, rehydrate all the ModIfaces below it that depend on R.hs-boot. To rehydrate a ModIface, call typecheckIface to convert it to a ModDetails. It's just a de-serialisation step, no type inference, just lookups.
[Note: GHC calls rehydration "re-typechecking the interface", but we use "typecheck" so much that it seems better to have a new word. Rehydration suggests reconstituting the full ModDetails from the dried-out ModIface.]
Now S will be bound to a thunk that, when forced, will "see" the final binding for T; see Tying the knot. But note that this must be done before compiling R.hs.
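To make the knot-tying concrete, here is a toy model of the running example. It is only a sketch: TyCon, DataCon, IfaceDecl and rehydrate below are hypothetical, cut-down stand-ins, not GHC's real types or its typecheckIface. The point is that rehydration is just name lookup in an environment, and because the lookups are lazy the environment can be the final one that is still being built.

```haskell
import qualified Data.Map as Map

-- Toy stand-ins for GHC's types; every name below is hypothetical.
data TyCon
  = AbstractTyCon String          -- all we know from an hs-boot file
  | AlgTyCon String [DataCon]     -- a fully known data type

data DataCon = DataCon String [TyCon]   -- constructor name, field types

-- The "dried-out" interface form: field types are only names.
data IfaceDecl = IfaceData String [(String, [String])]

type TypeEnv = Map.Map String TyCon

-- Rehydrate a declaration by looking its field types up in env.
-- The lookups sit in thunks, so env may be the environment under construction.
rehydrate :: TypeEnv -> IfaceDecl -> TyCon
rehydrate env (IfaceData name cons) =
  AlgTyCon name [ DataCon c (map look fields) | (c, fields) <- cons ]
  where
    look f = Map.findWithDefault (AbstractTyCon f) f env

ifaceS, ifaceT :: IfaceDecl
ifaceS = IfaceData "S" [("MkS", ["T"])]
ifaceT = IfaceData "T" [("T1", []), ("T2", ["S"])]

-- After compiling A.hs against R.hs-boot: S can only see the abstract T.
envBoot :: TypeEnv
envBoot = Map.fromList
  [ ("T", AbstractTyCon "T")
  , ("S", rehydrate envBoot ifaceS) ]

-- Rehydrating S in an environment that holds the real T ties the knot.
envFinal :: TypeEnv
envFinal = Map.fromList
  [ ("S", rehydrate envFinal ifaceS)
  , ("T", rehydrate envFinal ifaceT) ]

isAbstract :: TyCon -> Bool
isAbstract (AbstractTyCon _) = True
isAbstract _                 = False

-- The TyCon stored in the single field of S's constructor MkS.
fieldOfS :: TypeEnv -> TyCon
fieldOfS env = case Map.lookup "S" env of
  Just (AlgTyCon _ [DataCon _ [t]]) -> t
  _                                 -> error "unexpected shape for S"
```

In this model isAbstract (fieldOfS envBoot) is True, while in envFinal the field of MkS is the full T, whose T2 constructor in turn points back at S.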
Why we need to rehydrate A's ModIface after compiling R.hs
When compiling R.hs, the knot-tying stuff above will ensure that f's
unfolding mentions the LocalId for g. But when we finish R, we carefully ensure that all those
LocalIds are turned into completed GlobalIds, replete with unfoldings etc. Alas, that will not
apply to the occurrences of g in f's unfolding. And if we leave matters like that, they will
stay that way, and all subsequent modules that import A will see a crippled unfolding for f.
Solution: rehydrate the ModIfaces of both R and A together, right after completing R.hs.
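The stale-unfolding problem can be sketched in the same toy style. Again, Id, Expr, IfaceExpr and rehydrateExpr are hypothetical simplifications made up for illustration; GHC's real Ids and unfoldings carry far more information. The sketch only shows that an unfolding hydrated while g was a LocalId keeps that LocalId, and that rehydrating against the finished environment replaces it.

```haskell
import qualified Data.Map as Map

-- Toy model; all names are hypothetical.
data Id
  = LocalId String                 -- Id still being compiled: no unfolding
  | GlobalId String (Maybe Expr)   -- finished Id, perhaps with an unfolding

data Expr = Var Id | App Expr Expr | Lit Int

-- Interface form of an unfolding: variables are only names.
data IfaceExpr = IfVar String | IfApp IfaceExpr IfaceExpr | IfLit Int

type IdEnv = Map.Map String Id

rehydrateExpr :: IdEnv -> IfaceExpr -> Expr
rehydrateExpr env (IfVar n)   = Var (Map.findWithDefault (LocalId n) n env)
rehydrateExpr env (IfApp a b) = App (rehydrateExpr env a) (rehydrateExpr env b)
rehydrateExpr _   (IfLit i)   = Lit i

-- Does the expression mention a LocalId directly?
-- (We deliberately do not look inside the unfoldings of GlobalIds.)
mentionsLocal :: Expr -> Bool
mentionsLocal (Var (LocalId _))    = True
mentionsLocal (Var (GlobalId _ _)) = False
mentionsLocal (App a b)            = mentionsLocal a || mentionsLocal b
mentionsLocal (Lit _)              = False

ifaceF, ifaceG :: IfaceExpr
ifaceF = IfApp (IfVar "g") (IfLit 1)   -- stands for f = ...g...
ifaceG = IfApp (IfVar "f") (IfLit 2)   -- stands for g = ...f...

-- While R.hs is being compiled, g is a LocalId.
envDuringR :: IdEnv
envDuringR = Map.fromList
  [ ("g", LocalId "g")
  , ("f", GlobalId "f" (Just (rehydrateExpr envDuringR ifaceF))) ]

-- After R.hs is finished, R and A are rehydrated together.
envAfterR :: IdEnv
envAfterR = Map.fromList
  [ ("g", GlobalId "g" (Just (rehydrateExpr envAfterR ifaceG)))
  , ("f", GlobalId "f" (Just (rehydrateExpr envAfterR ifaceF))) ]

-- True if the named Id has an unfolding that still mentions a LocalId.
staleUnfolding :: IdEnv -> String -> Bool
staleUnfolding env n = case Map.lookup n env of
  Just (GlobalId _ (Just e)) -> mentionsLocal e
  _                          -> False
```

Here staleUnfolding envDuringR "f" is True and staleUnfolding envAfterR "f" is False, which is the difference a later importer of A would observe.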
Do we really need to eagerly re-typecheck so many modules?
At the moment I think the rehydration step eagerly enumerates the modules and re-typechecks them:
    typecheckLoop hsc_env hmis = do
      ...
      mds <- initIfaceLoad new_hsc_env $
             mapM (typecheckIface . hm_iface) hmis
(Admittedly, typecheckIface just populates a type environment with thunks, but it's still work.)
An interesting alternative is instead to simply de-populate the HPT. In one-shot mode, if we try to
look up an identifier from M and M isn't loaded, we look for M.hi and load it. But instead of
"look for M.hi and load it" we could "look for M's ModIface, and rehydrate it". In effect, those
typecheckIface calls become lazy. And, especially with -O0, many of the modules in a big loop
might never be visited at all, which would save work.
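A minimal sketch of the lazy alternative, assuming a made-up LazyHpt structure (this is not how GHC's HPT is represented, and the rehydrate function here is only a placeholder for typecheckIface). Interfaces are kept around in dried-out form, and a module is rehydrated the first time something is looked up in it; the result is cached.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')
import qualified Data.Map as Map

type ModuleName = String

-- Hypothetical, cut-down stand-ins: both just carry a list of names.
newtype ModIface   = ModIface [String]
newtype ModDetails = ModDetails [String]

-- Placeholder for typecheckIface.
rehydrate :: ModIface -> ModDetails
rehydrate (ModIface names) = ModDetails names

data LazyHpt = LazyHpt
  { ifaces   :: Map.Map ModuleName ModIface            -- always available
  , hydrated :: IORef (Map.Map ModuleName ModDetails)  -- filled on demand
  }

newLazyHpt :: [(ModuleName, ModIface)] -> IO LazyHpt
newLazyHpt xs = LazyHpt (Map.fromList xs) <$> newIORef Map.empty

-- Look a module up, rehydrating its interface only on first use.
lookupDetails :: LazyHpt -> ModuleName -> IO (Maybe ModDetails)
lookupDetails hpt m = do
  cache <- readIORef (hydrated hpt)
  case Map.lookup m cache of
    Just d  -> pure (Just d)
    Nothing -> case Map.lookup m (ifaces hpt) of
      Nothing -> pure Nothing
      Just i  -> do
        let d = rehydrate i
        modifyIORef' (hydrated hpt) (Map.insert m d)
        pure (Just d)

-- How many modules have actually been rehydrated so far.
hydratedCount :: LazyHpt -> IO Int
hydratedCount hpt = Map.size <$> readIORef (hydrated hpt)
```

De-populating the HPT then corresponds to resetting the hydrated cache to empty; modules in the loop that are never looked up again are never rehydrated.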
Which modules to rehydrate
We only need to rehydrate modules that are:
- Below R.hs
- Above R.hs-boot
There might be many unrelated modules (in the home package) that don't need to be rehydrated. Do we take advantage of this?
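The two conditions can be written as a small reachability computation over the import graph. This is a sketch under assumptions: the DepGraph representation, the node names, and the unrelated module "U" are all invented for illustration, and I am not claiming GHC.Make computes the set this way.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

type Node     = String
type DepGraph = Map.Map Node [Node]   -- node -> nodes it directly imports

-- Every node reachable from start by following imports (start included).
reachable :: DepGraph -> Node -> Set.Set Node
reachable g start = go Set.empty [start]
  where
    go seen []     = seen
    go seen (n:ns)
      | n `Set.member` seen = go seen ns
      | otherwise           = go (Set.insert n seen)
                                 (Map.findWithDefault [] n g ++ ns)

-- Modules strictly below `full` (it transitively imports them) and
-- strictly above `boot` (they transitively import it).
needRehydration :: DepGraph -> Node -> Node -> Set.Set Node
needRehydration g full boot =
  Set.filter aboveBoot (Set.delete full (reachable g full))
  where
    aboveBoot n = n /= boot && boot `Set.member` reachable g n

-- The running example plus a hypothetical unrelated module U.
example :: DepGraph
example = Map.fromList
  [ ("R.hs-boot", [])
  , ("A",         ["R.hs-boot"])
  , ("U",         [])
  , ("R",         ["A", "U"])
  ]
```

With this graph, needRehydration example "R" "R.hs-boot" contains only "A": U is below R.hs but does not depend on R.hs-boot, so it is left alone. R itself is excluded here because it has just been compiled; whether it is rehydrated alongside A is a separate decision.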
Modules "above" the loop
This dark corner is the subject of #14092.
Suppose we add to our example
X.hs:       module X where
              import A
              data XT = MkX T
              fx = ...g...
If in --make we compile R.hs-boot, then A.hs, then X.hs, we'll get a ModDetails for X that has an AbstractTyCon for T in the argument type of MkX. So:
- Either we should delay compiling X until after R has been compiled.
- Or we should rehydrate X after compiling R -- because it transitively depends on R.hs-boot.
Question: in one-shot mode today, when we compile X, how does X know to look for R.hi-boot? Answer (we think): findInstalledHomeModule looks for R.hi, then R.hi-boot. But what if it finds an out-of-date R.hi?
Conclusion
It's pretty uncomfortable to rehydrate all those modules twice; but
- It's what happens in one-shot mode
- I don't see an easier solution
I can just about imagine not doing the post-compilation rehydrate stage. Instead, accept that f may mention a somewhat bogus g, and when inlining look up g in the symbol table. But this does the job once per inlining, rather than once and for all.
I think the current code does the rehydrate step once per SCC, and that seems entirely wrong.