
Draft: Separate C-- building in general from STG -> C-- in particular

John Ericson requested to merge wip/cmm-parser-no-stg into master

This avoids the partiality in the C-- parser that had to be temporarily introduced in !8160 (merged) because these two were not separated.

There is further cleanup I can do:

  • Lots of code that is no longer StgToCmm-specific should be moved to GHC.Cmm.Builder.* modules.
  • @AndreasK originally proposed Gen rather than Builder. That name does seem to be used more, so I should rename to it.
  • The new classes can be shuffled around.

However, I wanted to open this up now because of the new classes, to discuss that approach in general.

Starting in https://discourse.haskell.org/t/modularizing-ghc-paper/4471/6, @simonpj and we paper authors discussed how breaking up configuration records is just the beginning, not the end, of modularity, and that the https://en.wikipedia.org/wiki/Law_of_Demeter ideal (and normal form!) is quite a bit more fine-grained. The paper didn't discuss much else we could do, preferring to focus on the easier and less controversial bits, but now that we are discussing it on Discourse, it might be good to think about concrete strategies in code.

In this MR I did split out a CmmBuilderConfig from StgToCmmConfig, but I didn't stop there. If one looks at CmmBuilderConfig, one will notice that it contains a rather arbitrary subset of the options. Indeed, many of those options are not needed for parsing proper, but only for compiler/GHC/Cmm/Parser.stmtMacros, which one might describe as a "standard library" of magic procedural C-- macros rather than as part of the C-- syntax proper. The arbitrariness of the configuration matches the arbitrariness of "baking in" some macros rather than having a macro system.

Of course, supporting arbitrary user-defined C-- procedural macros would be a ton of work for little practical gain. However, these Contains* classes bring a very similar benefit: they modularize the implementation of the C-- builder code so that each function's type shows just which configuration options it needs. In this manner, we can separate ticky, profiling, and even platform-awareness from the "base" monad, which need not know about these things.
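A minimal sketch of the Contains* idea, assuming a deliberately tiny stand-in for the builder monad: a function from configuration to the statements it would emit. All names here (Code, TickyConfig, ProfileConfig, ContainsTicky, ContainsProfile, ParserConfig, tickyAlloc, profHeader, allocObject, and the fields of this StgToCmmConfig) are hypothetical and are not GHC's actual definitions. The point it illustrates is that each function's constraint lists exactly which component records it reads, while the component records themselves stay plain and monomorphic.

```haskell
-- Hypothetical sketch of Contains* classes; none of these names are
-- GHC's real definitions.

-- Stand-in for the builder monad: given some configuration `c`,
-- produce the statements to emit. Functions into a Semigroup are a
-- Semigroup, so `a <> b` runs both against the same config, in order.
type Code c = c -> [String]

-- Per-component configuration records stay plain and monomorphic.
data TickyConfig = TickyConfig
  { tickyEnabled      :: Bool
  , tickyAllocEnabled :: Bool
  }

newtype ProfileConfig = ProfileConfig { profEnabled :: Bool }

-- One class per component: "this configuration carries a TickyConfig".
class ContainsTicky c where
  tickyConfig :: c -> TickyConfig

class ContainsProfile c where
  profileConfig :: c -> ProfileConfig

-- The constraint documents exactly which options each function reads.
tickyAlloc :: ContainsTicky c => String -> Code c
tickyAlloc what c
  | tickyEnabled t && tickyAllocEnabled t = ["ticky: alloc " ++ what]
  | otherwise                             = []
  where
    t = tickyConfig c

profHeader :: ContainsProfile c => Code c
profHeader c = ["prof: header" | profEnabled (profileConfig c)]

-- Callers needing several components simply accumulate constraints.
allocObject :: (ContainsTicky c, ContainsProfile c) => String -> Code c
allocObject name = const ["alloc " ++ name] <> profHeader <> tickyAlloc name

-- Concrete configurations are assembled only at the edges.
data StgToCmmConfig = StgToCmmConfig
  { stgTicky   :: TickyConfig
  , stgProfile :: ProfileConfig
  }

instance ContainsTicky StgToCmmConfig where
  tickyConfig = stgTicky

instance ContainsProfile StgToCmmConfig where
  profileConfig = stgProfile

-- A parser-side configuration that knows nothing about ticky:
-- `profHeader` accepts it, while `tickyAlloc` would be a type error.
newtype ParserConfig = ParserConfig { parserProfile :: ProfileConfig }

instance ContainsProfile ParserConfig where
  profileConfig = parserProfile
```

Loaded in GHCi, `allocObject "Cons" (StgToCmmConfig (TickyConfig True True) (ProfileConfig False))` evaluates to `["alloc Cons","ticky: alloc Cons"]`, whereas `tickyAlloc "Cons" (ParserConfig (ProfileConfig True))` is rejected at compile time because ParserConfig has no ContainsTicky instance.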

Again, to be clear, none of this is needed for the original task of getting rid of the partiality. We could get rid of these classes, monomorphizing these functions and providing some sort of lift function instead, as is in fact done with GHC.Cmm.Builder.ExtCode. I know that in the past folks, including @simonpj, have been wary of adding type parameters and otherwise making type signatures more complicated.
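For comparison, a sketch of the monomorphic alternative, using the same kind of hypothetical stand-in for the builder monad: each component's code is typed against its own record, and an explicit lift (here the hypothetical liftCode) projects the needed record out of a larger configuration. This only illustrates the general lift pattern; it is not the actual API of GHC.Cmm.Builder.ExtCode, and all names are assumptions.

```haskell
-- Hypothetical sketch of the monomorphic-plus-lift alternative; not the
-- real GHC.Cmm.Builder.ExtCode API.

-- Tiny stand-in for the builder monad: config -> emitted statements.
type Code c = c -> [String]

data TickyConfig = TickyConfig
  { tickyEnabled      :: Bool
  , tickyAllocEnabled :: Bool
  }

newtype ProfileConfig = ProfileConfig { profEnabled :: Bool }

data StgToCmmConfig = StgToCmmConfig
  { stgTicky   :: TickyConfig
  , stgProfile :: ProfileConfig
  }

-- Each component's code is typed against exactly its own record.
tickyAlloc :: String -> Code TickyConfig
tickyAlloc what t
  | tickyEnabled t && tickyAllocEnabled t = ["ticky: alloc " ++ what]
  | otherwise                             = []

profHeader :: Code ProfileConfig
profHeader p = ["prof: header" | profEnabled p]

-- The explicit lift: run code written for a small config under a larger
-- one by projecting out the part it needs.
liftCode :: (big -> small) -> Code small -> Code big
liftCode proj code = code . proj

-- Call sites spell out the projection instead of relying on a class.
allocObject :: String -> Code StgToCmmConfig
allocObject name =
  const ["alloc " ++ name]
    <> liftCode stgProfile profHeader
    <> liftCode stgTicky (tickyAlloc name)
```

The trade-off shows up at call sites: the signatures stay simple and monomorphic, but every cross-component call needs a liftCode with the right projection.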

However, we have now also discussed the dual problem: making a gazillion nominal record types to match the various subsets of configuration options that are needed adds even more boilerplate than fancier types do, and is also quite annoying. I think the compromise here, Contains* tricks (or other such type-system-leveraging tricks within "components") but plain old monomorphic records per component, is a good one. (Putting all the ticky options together rather than tracking the usage of each one individually also felt prudent and was a relief to write :).)

So, at the risk of doing work that others will want undone, I decided the time was ripe for this conversation, and that an unfinished but type-checking MR with a worked-out example was the best way to facilitate it.

Part of #17957

Edited by John Ericson
