Modularity for the Core optimization passes and linter
This part of #17957, split off to its own issue to more thoroughly elaborate the design rather than just the general principles.
Code we have to organize
Right now, here is what there is:
- A few of Core -> Core optimizations passes
- Logic to lint Core
- "end pass" logic to do misc dumping and linting used by both the core -> core passes and other pipeline stages where the output is Core but the input might be something else.
-
CoreToDo
which is "mostly" an enum for each Core -> Core pass, but also the other uses of the end pass logic shoe-horned in. -
GHC.Core.Opt.Pipeline
, which contains a mix of driver and non-driver code-
getCoreToDo :: Logger -> DynFlags -> [CoreToDo]
DynFlags
-
runCorePasses :: [CoreToDo] -> ModGuts -> CoreM ModGuts
ModGuts
-
Issues with the code before we started
DynFlags
and HscEnv
used all over the place
This is a classic issue very much not distinctive to this part of the compiler, discussed at length in the paper.
Domain-layer linting logic is mixed with application-layer / presentation-layer "end pass logic"
The former is an essential part of GHC, defining what well-formed Core looks like. The latter is just a mere convention downstream of the definition of Core well-formedness
GHC.Core.Opt.Pipeline
is also a mix of application and above layer and domain logic because because it contains small bits of domain logic alone with driver logic.
CoreToDo
begets partiality
The variants that don't correspond to CoreToCore
passes cause panics if they are encountered in runCorePasses
. We are relying on getCoreToDo
never outputting them, and crossing fingers.
This is done because a CoreToDo
is required to use the end pass logic, and so the other non Core -> Core pass uses of the end pass logic need their own variants.
CoreToCore
Individual passes use The is a general software design principle that "things should not know their own names".
The purpose of CoreToCore
is to enumerate all of the Core -> Core passes, but this should be downstream of each pass so the pass by construction is only aware of itself and not how its used.
However, since the end-pass logic requires passing a CoreToCore
, each pass must import the CoreToCore
datatype in order to pass its own variant to the end pass logic.
This creates and anti-modular confusion where CoreToCore
, which represents the downstream code using all the Core -> Core passes, is actually defined upstream of each pass.
New design addressing issues
Each Core -> Core pass should stand alone
Every Core -> Core pass should be considered its own component, and thus receive the general treatment described in the paper.
To recap:
- Rather than taking
DynFlags
, it should take a configuration record with just the items this component cares about.- Module
GHC.Foo
has a companion moduleGHC.Driver.Config.Foo
which translateDynFlags
to the per-component configuration record purely.
- Module
- Rather than taking
HscEnv
, it should just take the "services" that it cares about. - The paper recommended that each service be passed separately but where there are concerns of about state getting out of sync, we are happy to compromise and make a smaller version of
HscEnv
bundling the services together. - We could make a
GHC.Driver.Env.Foo
module for theHscEnv
to mini-HscEnv
translation.
Note that we shouldn't confused external configuration with internal state.
In !8598 (diffs, comment 441105), @simonpj asks why a new configuration record is being added. The answer is that the existing SimplTopEnv
is used in the definition of a monad internal to the simplifier. It would be a regression in modularity to force the caller of the simplifier to be aware of this monad when it was previously a mere implementation detail.
By creating a new record with just the parts of the monad env that are set by the outside world, we preserve the encapsulation boundary allowing the outside world to continue to not need to know/care about information it isn't responsible for providing.
SPJ: OK but let's articulate that logic explicitly with the data types. There are rather a lot of them now.
GHC.Core.Lint
in to GHC.Core.EndPass
The end-pass logic is split out from Domain-layer- and application-layer- logic is thus no longer mixed together.
Both are treated as components and get per-component configuration records accordingly.
CoreToDo
s. And knock on effects.
Neither the end-pass logic nor linting logic should take any CoreToDo
is expunged from their nascent configure records. This enables these additional problem-resolving changes:
-
The individual passes which now provide end pass configs also don't need to use
CoreToDo
either, and can instead directly pass the metadata which was dispatched from theCoreToDo
directly. -
Passes that are not Core -> Core passes can be purged from
CoreToDo
. Those passes too skip usingCoreToDo
to provide information toEndPass
. Those removed usages were the sole use-case of those improperCoreToDo
variants, so now we remove them. This removes the partiality inGHC.Driver.Pipeline
. -
CoreToDo
need no longer be imported by any individual pass, but just be the logic putting everything together.
CoreM
to the plugin use-case.
Confine CoreM
currently takes/provides a lot of information. This is overkill for the regular built-in Core-> Core passes that use it, and antithetical to our genral approach of modularizing components.
However, plugins are a different, special case. We do no know what plugins do ahead of time, and so to be generous we wish to provide them with as much information as possible. That means we should continue to give plugins the "kitchen sink" CoreM
.
That means we get rid of CoreM
everywhere else besides the plugins. It should be use by the plugins and the driver code which calls the plugins only.
Plan more completely
CoreToDo
variants should also contain the new configuration records of the pass they refer to. Then getCoreToDo
can more completely plan out what needs to be done by elaborating DynFlags
into both the list of passes to be run (in order) and the configuration for each pass. This is the complete Core -> Core plan.
GHC.Core.Opt.Pipeline
and rename
Split The non-driver bits of domain logic should be given their own homes in other module(s). The remain driver logic (including the principle functions we've mentioned in this issue) will remain, and so the module should renamed GHC.Driver.Core.Opt
.
This will uphold our convention (to eventually be enforced by lint), that DynFlags
and HscEnv
should only appear in GHC.Driver
modules.
Tasks
-
Write a note "Configuration records in GHC". Related discussion: !8598 (comment 443781) -
Write a note "The GHC.Driver.Config namespace". Related discussion: !8598 (comment 443963)
References
The parent issue: #17957
The paper: https://hsyl20.fr/home/posts/2022-05-03-modularizing-ghc-paper.html
The wiki page: https://gitlab.haskell.org/ghc/ghc/-/wikis/Make-GHC-codebase-more-modular