Hadrian: Building Stage2 GHC is a Huge Cache Miss

added hadrian label

@mpickering Thanks for pointing out this issue to me. @snowleopard, your input here would be appreciated.

I think (drumroll...) we should be doing less with stage 2! I'd rather optimistically just run stage 1 + tests than optimistically freeze stage 1.

Testing is embarrassingly parallel, and stage 1 can be done super incrementally. But stage 2 will always be fatally slow. And it's not even a good way to debug stuff because unit tests end up becoming system tests since bootstrapping allows breakages to propagate widely.

The best case is a "two step CI". PRs pass stage 1 + tests and land in a branch (I'll call it develop). Then some marge-bot-like thing takes a few commits of develop and tries to land them in master. If something goes wrong it can bisect. Crucially, stage 1 + tests should guarantee correctness most of the time, so PRs should usually base themselves off develop, not master, so as to increase throughput (via pipelining).
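To make the rollup-and-bisect step concrete, here is a minimal sketch. Everything in it is hypothetical: `land_rollup` and `full_check` are illustrative names, not part of marge-bot or any existing CI script. It assumes master is good before the batch, and that once a prefix of the batch fails the full (slow) build + test, every longer prefix fails too.

```python
from typing import Callable, List, Optional, Tuple


def land_rollup(
    batch: List[str],
    full_check: Callable[[List[str]], bool],
) -> Tuple[List[str], Optional[str]]:
    """Try to land a batch of develop commits into master.

    `full_check(prefix)` stands in for the slow full build + test of master
    with `prefix` applied (hypothetical callback). Returns the commits that
    can land and the first offending commit, if any.
    """
    # Happy path: one slow check covers the whole batch.
    if not batch or full_check(batch):
        return batch, None

    # Invariant: batch[:lo] is known good, batch[:hi] is known bad.
    lo, hi = 0, len(batch)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if full_check(batch[:mid]):
            lo = mid
        else:
            hi = mid

    # batch[hi - 1] is the first commit whose addition breaks the check.
    return batch[:lo], batch[hi - 1]
```

In the happy path this costs one slow run for the whole batch; a failure costs roughly log2(n) more runs to find the culprit, which is the trade the rollup is betting on.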

@DavidEichmann Thanks! Let me also tag @ndmitchell.

Freezing intermediate, e.g. nightly, versions of Stage1 GHC is an interesting solution. I guess there are two potential problems:

An outdated Stage1 GHC might fail to build a fresh commit or fail to pass the tests, which would require manual intervention. One way to make such interventions possible is to rely on some kind of keyword in the commit message, e.g. [unfreeze], which would instruct the CI script to omit the --freeze1 trick. Hopefully, this will be relatively rare.
A less likely problem: the resulting GHC somehow passes the tests even though it shouldn't. I'm not entirely sure this is actually possible, but in that case we would presumably catch the issue during the nightly full build.
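As a rough illustration of the keyword idea, the CI script could decide whether to pass --freeze1 along these lines. This is only a sketch: `stage1_flags` is a hypothetical helper, and the exact keyword and how the commit message reaches the script are assumptions.

```python
from typing import List


def stage1_flags(commit_message: str, keyword: str = "[unfreeze]") -> List[str]:
    """Return the extra Hadrian flags for a CI build of this commit.

    Hypothetical helper: by default the cached Stage1 is reused via
    --freeze1; a commit message containing the keyword opts out and
    forces Stage1 to be rebuilt.
    """
    if keyword in commit_message:
        return []
    return ["--freeze1"]
```

For example, `stage1_flags("Fix typo in docs")` yields `["--freeze1"]`, while a message containing `[unfreeze]` yields an empty list, so the build falls back to rebuilding Stage1.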

@Ericson2314's suggestion is good too: we could be running at least a part of the testsuite on Stage1 compiler. I'm not sure how well this will work in practice though.

In theory we support running the stage 1 tests, with build test --test-compiler=stage1. But I haven't yet made stage 1 testing part of ./validate so no guarantee that this works yet.

changed the description

@snowleopard That's a good point. Using the previous day's stage1 is perhaps a bit risky and unstable. We could instead use the latest release's stage1. Then we can be more confident that stage1 will work as expected. We can even use a git tag to select the exact release commit we want; then if for whatever reason we need to bump it, any maintainer can just update the git tag.

@Ericson2314, I hadn't considered testing with stage1. That will certainly be much faster, though I'm worried that it is not sufficient, as testing with stage1 will skip many tests. We could perhaps run more tests with stage1 and fewer tests with stage2, but we'd still have to build stage2, so I doubt we'd save time here. Am I missing something?

I'd rather optimistically just run stage 1 + tests than optimistically freeze stage 1.

I'm not sure about this. In the stage1+tests case you must assume that stage0 correctly builds stage1. In the freeze1 case you must assume stage1 correctly builds stage2. If we use stage1 from a recent ghc release commit, do you think the first stage1+tests is still safer than freeze1? I admittedly have not experimented much with --freeze1.

marked the checklist item TODO Once cached builds are working somewhat reliably, confirm and quantify the performance impact of this issue. as completed

marked the checklist item TODO Once cached builds are working somewhat reliably, confirm and quantify the performance impact of this issue. as incomplete

As testing with stage1 will skip many tests.

This would be GHCi-related tests and others that require same ABI?

We could perhaps run more tests with stage1 and fewer tests with stage2, but we'd still have to build stage2, so I doubt we'd save time here. Am I missing something?

On further thought, right: build + test of stage 2 with a frozen stage 1 is pretty similar to build + test of stage 1 from a released GHC. They are comparable in the "amount" of correctness they measure, just different sorts of correctness with fuzzy trade-offs. Both are low latency, which is what counts. Maybe just do both in parallel! That won't impact latency.
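The latency claim is just that parallel jobs cost the maximum of their durations rather than the sum. A tiny sketch, where the function name and the durations are placeholders and not measurements:

```python
from typing import Sequence


def ci_latency(job_minutes: Sequence[float], parallel: bool = True) -> float:
    """Wall-clock time until all CI jobs have reported.

    Run in parallel (with enough runners), the PR waits for the slowest
    job; run back to back, it waits for the sum of all jobs.
    """
    return max(job_minutes) if parallel else sum(job_minutes)


# Placeholder durations in minutes, not real measurements:
# [stage 1 from release + tests, stage 2 from frozen stage 1 + tests]
jobs = [30.0, 40.0]
```

With these made-up numbers, running both checks costs no more wall-clock time than running the slower one alone, at the price of extra runner time.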

The two branches thing, however, I think is key; it's the more important part of what I said. PRs landing in develop have only 1 stage "deep" built and tested, be it stage 1, stage 2, or both, from releases/frozen old compilers. So things should get into develop very quickly because of incrementality for building and parallelism for testing. Then, in larger rollups, develop gets put into master. The huge rollups make up for the fact that this is a fundamentally slow build + test cycle.

Yes, this opens the possibility that develop gets too far ahead in a broken state that only the rollup merge into master will notice, but I'm willing to gamble that develop won't break very often. Relatedly, I'm very interested in things like https://github.com/ghc-proposals/ghc-proposals/issues/162 that aim to make stage 1 able to do everything stage 2 can do, even if it can't do it in all the ways stage 2 can. The benefit applies both when the frozen stage 1 is too different and when the release is too different. (iserv already allows for that, except that's stage 2 iserv, which is too slow to build.)

Fundamentally we have two goals: tiny latency in happy path, and race-free correctness of master (as famously laid out in https://graydon2.dreamwidth.org/1597.html). Having two branches is sort of a baby-cutting trick like generations for GC. ("Most objects die young." "Most bugs are caught in one stage.")

@phadej reminds me we already have more extensive nightly builds to catch bigger breakages. I suppose I would still like to see a nightly-passing-only branch (that would be master, and today's master would become develop) so the "not rocket science rule" is preserved. But at least in practical terms we can first make CI faster and then fix the nightly situation: until we land significantly more PRs per day, the probability of accumulating lots of breakage is nil.

marked this issue as related to #16926
