
Content-hash-based incremental builds

hdgarrood requested to merge hdgarrood/ghc:content-hash-incremental-build into master

Resolves #16495 (closed), closes #19439 (closed).

This work was sponsored by Lumi.

I'm editing this description because this MR has changed over time and the original description is no longer particularly helpful. The original description is preserved below.

This MR adds a field to both the interface file format and the ModSummary data type containing the MD5 hash of the relevant .hs source file, to be used for recompilation checking. It also removes the ms_hs_date field from ModSummary, so that modification times of source files no longer influence recompilation checking. Modification times of output files do still influence recompilation checking, but I'm taking the view that that's OK, because in practice these modification times are much less likely to lead the recompilation checker to incorrect conclusions. See the new Note [When source is considered modified] for more information about how recompilation checking works with this patch.
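As a rough illustration of the idea (not the code in this MR), a content-hash check can be sketched with the MD5-based Fingerprint type from base. The names SourceStatus, checkSource, sourceStatus and hashOfContents are hypothetical, and the sketch assumes the hash recorded at the previous build is handed in as a Maybe Fingerprint rather than read from a real interface file.

```haskell
import GHC.Fingerprint (Fingerprint, fingerprintString, getFileHash)

-- | Hypothetical result type: has the source changed since the last build?
data SourceStatus = SourceUnchanged | SourceModified
  deriving (Eq, Show)

-- | Compare the hash recorded at the previous build (if any) with the
-- hash of the source as it is now. No recorded hash means we cannot
-- prove the source is unchanged, so we treat it as modified.
checkSource :: Maybe Fingerprint -> Fingerprint -> SourceStatus
checkSource Nothing _ = SourceModified
checkSource (Just recorded) current
  | recorded == current = SourceUnchanged
  | otherwise           = SourceModified

-- | Hash the file on disk (MD5, via base) and compare it with the
-- recorded hash. Modification times are never consulted.
sourceStatus :: Maybe Fingerprint -> FilePath -> IO SourceStatus
sourceStatus recorded path = checkSource recorded <$> getFileHash path

-- | Convenience for experimenting in GHCi. Assumption: this is only for
-- trying out 'checkSource'; it is not byte-compatible with 'getFileHash'.
hashOfContents :: String -> Fingerprint
hashOfContents = fingerprintString
```

Under this scheme, running touch on a source file changes nothing that the check looks at, so the file is still reported as unchanged.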


Original MR description

Right now, this MR checks for "stability" (i.e. whether we can reuse existing build products) based only on content hashes, because anything else felt too ambitious for me at this stage. It's very WIP, for a number of reasons:

  • I still need to test how this affects performance on no-op builds (when everything is up to date)
    • If it turns out that this approach is acceptably fast and we all agree that having this check use only a content hash is an acceptable approach, I should revert the change which adds a new mi_src_mtime field to ModIfaceBackend. If that field is staying, I should make sure GHC respects SOURCE_DATE_EPOCH when populating this field, for reproducible builds.
    • If we do want to go ahead with a hash-only approach, it may be worth considering removing the ms_hs_date field from ModSummary as well.
  • The comment describing stableObject doesn't match what the code is actually doing right now
  • I am probably doing too much work trying to read the iface in Driver/Make when all I want is the prev_hash. @bgamari mentioned a possibility of rearranging the data in the iface file; I would appreciate some guidance here if possible.
  • It is probably worth moving checkStability into IO so that we don't bother reading the iface for a module if we already know that one of its imports is unstable.
  • I need to revert the change where the hiVersion is overridden with a constant value (I couldn't work out how else to get GHC to avoid attempting to read iface files of the old format, which was causing panics)
  • There's probably a fair bit of general tidying up required
    • I've committed a new file which is essentially a stream of consciousness to stop myself from getting lost; of course this should be removed at some point
  • This has only received very basic testing so far
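
The point above about moving checkStability into IO could look roughly like the following. This is a minimal sketch under stated assumptions, not GHC's actual checkStability: stabilityPass and ModName are hypothetical names, modules are assumed to arrive dependencies-first, and the expensive per-module check (for example, reading the iface to compare hashes) is passed in as a plain IO action.

```haskell
import Control.Monad (foldM)
import Data.Maybe (fromMaybe)

type ModName = String

-- | Walk modules in dependency order (dependencies first). The expensive
-- per-module check is only run when every import is already known to be
-- stable; otherwise the module is marked unstable without doing any IO.
--
-- Assumption for this sketch: every import appears earlier in the list.
-- An import that is missing from the list is treated as unstable.
stabilityPass
  :: (ModName -> IO Bool)       -- ^ expensive check, e.g. read iface and compare hash
  -> [(ModName, [ModName])]     -- ^ modules paired with their direct imports
  -> IO [(ModName, Bool)]       -- ^ stability verdict per module
stabilityPass ownCheck = foldM step []
  where
    step known (m, imports)
      | all (importStable known) imports = do
          ok <- ownCheck m
          pure ((m, ok) : known)
      | otherwise = pure ((m, False) : known)

    importStable known i = fromMaybe False (lookup i known)
```

With modules A, B (importing A) and C (importing B), an unstable A means the expensive check is never run for B or C.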

In my (very basic) testing, ghc --make does at least appear to do more or less what I want it to:

harry@chaffinch: /home/harry/code/ghc-make-test
$ ghc-stage2 --make Main
[2 of 2] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
harry@chaffinch: /home/harry/code/ghc-make-test
$ touch Main.hs
harry@chaffinch: /home/harry/code/ghc-make-test
$ ghc-stage2 --make Main
harry@chaffinch: /home/harry/code/ghc-make-test
$ 

If possible, I'd appreciate a little feedback on what I have so far. Other than what I've written above, does the approach make sense? Is there anything else I'm missing?

Edited by hdgarrood
