Skip to content

Modularity for the Core optimization passes and linter

This part of #17957, split off to its own issue to more thoroughly elaborate the design rather than just the general principles.


Code we have to organize

Right now, here is what there is:

  1. A few of Core -> Core optimizations passes
  2. Logic to lint Core
  3. "end pass" logic to do misc dumping and linting used by both the core -> core passes and other pipeline stages where the output is Core but the input might be something else.
  4. CoreToDo which is "mostly" an enum for each Core -> Core pass, but also the other uses of the end pass logic shoe-horned in.
  5. GHC.Core.Opt.Pipeline, which contains a mix of driver and non-driver code
    • getCoreToDo :: Logger -> DynFlags -> [CoreToDo]
      which plans which Core -> Core passes should be run based on DynFlags
    • runCorePasses :: [CoreToDo] -> ModGuts -> CoreM ModGuts
      which executes the plan, making a modified ModGuts

Issues with the code before we started

DynFlags and HscEnv used all over the place

This is a classic issue very much not distinctive to this part of the compiler, discussed at length in the paper.

Domain-layer linting logic is mixed with application-layer / presentation-layer "end pass logic"

The former is an essential part of GHC, defining what well-formed Core looks like. The latter is just a mere convention downstream of the definition of Core well-formedness

GHC.Core.Opt.Pipeline is also a mix of application and above layer and domain logic because because it contains small bits of domain logic alone with driver logic.

CoreToDo begets partiality

The variants that don't correspond to CoreToCore passes cause panics if they are encountered in runCorePasses. We are relying on getCoreToDo never outputting them, and crossing fingers.

This is done because a CoreToDo is required to use the end pass logic, and so the other non Core -> Core pass uses of the end pass logic need their own variants.

Individual passes use CoreToCore

The is a general software design principle that "things should not know their own names".

The purpose of CoreToCore is to enumerate all of the Core -> Core passes, but this should be downstream of each pass so the pass by construction is only aware of itself and not how its used.

However, since the end-pass logic requires passing a CoreToCore, each pass must import the CoreToCore datatype in order to pass its own variant to the end pass logic.

This creates and anti-modular confusion where CoreToCore, which represents the downstream code using all the Core -> Core passes, is actually defined upstream of each pass.

New design addressing issues

Each Core -> Core pass should stand alone

Every Core -> Core pass should be considered its own component, and thus receive the general treatment described in the paper.

To recap:

  • Rather than taking DynFlags, it should take a configuration record with just the items this component cares about.
    • Module GHC.Foo has a companion module GHC.Driver.Config.Foo which translate DynFlags to the per-component configuration record purely.
  • Rather than taking HscEnv, it should just take the "services" that it cares about.
  • The paper recommended that each service be passed separately but where there are concerns of about state getting out of sync, we are happy to compromise and make a smaller version of HscEnv bundling the services together.
  • We could make a GHC.Driver.Env.Foo module for the HscEnv to mini-HscEnv translation.

Note that we shouldn't confused external configuration with internal state.

In !8598 (diffs, comment 441105), @simonpj asks why a new configuration record is being added. The answer is that the existing SimplTopEnv is used in the definition of a monad internal to the simplifier. It would be a regression in modularity to force the caller of the simplifier to be aware of this monad when it was previously a mere implementation detail. By creating a new record with just the parts of the monad env that are set by the outside world, we preserve the encapsulation boundary allowing the outside world to continue to not need to know/care about information it isn't responsible for providing.

SPJ: OK but let's articulate that logic explicitly with the data types. There are rather a lot of them now.

The end-pass logic is split out from GHC.Core.Lint in to GHC.Core.EndPass

Domain-layer- and application-layer- logic is thus no longer mixed together.

Both are treated as components and get per-component configuration records accordingly.

Neither the end-pass logic nor linting logic should take any CoreToDos. And knock on effects.

CoreToDo is expunged from their nascent configure records. This enables these additional problem-resolving changes:

  1. The individual passes which now provide end pass configs also don't need to use CoreToDo either, and can instead directly pass the metadata which was dispatched from the CoreToDo directly.

  2. Passes that are not Core -> Core passes can be purged from CoreToDo. Those passes too skip using CoreToDo to provide information to EndPass. Those removed usages were the sole use-case of those improper CoreToDo variants, so now we remove them. This removes the partiality in GHC.Driver.Pipeline.

  3. CoreToDo need no longer be imported by any individual pass, but just be the logic putting everything together.

Confine CoreM to the plugin use-case.

CoreM currently takes/provides a lot of information. This is overkill for the regular built-in Core-> Core passes that use it, and antithetical to our genral approach of modularizing components.

However, plugins are a different, special case. We do no know what plugins do ahead of time, and so to be generous we wish to provide them with as much information as possible. That means we should continue to give plugins the "kitchen sink" CoreM.

That means we get rid of CoreM everywhere else besides the plugins. It should be use by the plugins and the driver code which calls the plugins only.

Plan more completely

CoreToDo variants should also contain the new configuration records of the pass they refer to. Then getCoreToDo can more completely plan out what needs to be done by elaborating DynFlags into both the list of passes to be run (in order) and the configuration for each pass. This is the complete Core -> Core plan.

Split GHC.Core.Opt.Pipeline and rename

The non-driver bits of domain logic should be given their own homes in other module(s). The remain driver logic (including the principle functions we've mentioned in this issue) will remain, and so the module should renamed GHC.Driver.Core.Opt.

This will uphold our convention (to eventually be enforced by lint), that DynFlags and HscEnv should only appear in GHC.Driver modules.

Tasks

References

The parent issue: #17957

The paper: https://hsyl20.fr/home/posts/2022-05-03-modularizing-ghc-paper.html

The wiki page: https://gitlab.haskell.org/ghc/ghc/-/wikis/Make-GHC-codebase-more-modular

Edited by Dominik Peteler
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information