Skip to content

Decouple DynFlags in StgToCmm

What

Continuing to decouple DynFlags, see #20730 for project plan and status.

Add two modules:

  • StgToCmm.Config
  • Driver.Config.StgToCmm

Removes 3 types in StgToCmm since they are now made redundant:

  • CgInfoDownwards
  • PtrOpts
  • CallOpts

Refactors call sites to DynFlags in StgToCmm with calls to StgToCmmConfig or when the function does not have access to StgToCmmConfig the MR breaks functions down such that their types only require what is used in the body of the function.

Turns out DynFlags was pretty tightly coupled to this phase and so this MR touches quite a bit of code. Module in StgToCmmConfig is now the only reference to the module being compiled. That is, I decided this Module is the _single source of truth_^{TM}. Previously there was a Module passed around in function parameters and in DynFlags. Thus several APIs have changed to enforce that design, which IMHO is a good thing.

Other things

You'll notice that UpdFrameOffset and Module are lazy in StgToCmmConfig. This is purposeful, if Module is strict the Cmm/Parser.y throws a panic, see line 1522 (the call to runC). This can be changed from no_module to this_mod but that breaks the reg_alloc_unit_tests and I didn't want to delay the MR to try to understand that test so I decided against making Module strict.

UpdFrameOffset only stores an Int but if that is made strict then the RTS slows noticeably down and deadlocks at StgToThnk.cmm. I did not investigate why this occurs opting instead to leave it lazy but with a comment.

I keep Sequel and SelfLoopInfo in StgToCmm/Config.hs module. This is also purposeful. Ideally these types would be placed in StgToCmm/Types.hs but doing so creates a gnarly cyclic dependency between Builtins and Cmm/Info.hs so I've opted against that. We could move this into its own module, but then that module is quite peculiar and doesn't conform to the architecture conventions of the rest of the compiler, i.e., the new module would only export the type and no actual functions for working with the type, because these functions are in StgToCmm/Monad.hs since they are monadic. I think this is best left to a future refactor that untangles the module dependencies in this phase (good newcomer issue!).

Lastly, I'm not keen on the naming convention I've used because it is long stgToCmm<foo>, perhaps stc<foo> would be better? I'd like a second opinion on this.

Final and smelly things

Note to my future self, when I (or someone) begins phase 3 of the project plan stated in #20730 (comment 397931): There is a great deal of opportunity for performance minded refactoring. For example, there are several places where we have runtime checks via guards and then that information is not passed downward into subsequent code. A good example of this is isDllConApp and isDynLinkName where both functions check the platform for OSMinGW32:

-- | Does this constructor application refer to anything in a different
-- *Windows* DLL?
-- If so, we can't allocate it statically
isDllConApp
  :: Platform
  -> Bool          -- is Opt_ExternalDynamicRefs enabled?
  -> Module
  -> DataCon
  -> [StgArg]
  -> Bool
isDllConApp platform ext_dyn_refs this_mod con args
 | not ext_dyn_refs    = False
 | platformOS platform == OSMinGW32
    = isDynLinkName platform this_mod (dataConName con) || any is_dll_arg args
 | otherwise = False
  where
    -- NB: typePrimRep1 is legit because any free variables won't have
    -- unlifted type (there are no unlifted things at top level)
    is_dll_arg :: StgArg -> Bool
    is_dll_arg (StgVarArg v) =  isAddrRep (typePrimRep1 (idType v))
                             && isDynLinkName platform this_mod (idName v)
    is_dll_arg _             = False

Notice that isDynLinkName is called only when platformOS platform is OSMinGW32. Now isDynLinkName is:

isDynLinkName :: Platform -> Module -> Name -> Bool
isDynLinkName platform this_mod name
  | Just mod <- nameModule_maybe name
...
    = case platformOS platform of
...
        OSMinGW32 -> moduleUnit mod /= moduleUnit this_mod

        -- For the other platforms, still perform the hack
        _         -> mod /= this_mod

  | otherwise = False  -- no, it is not even an external name

Which checks again even though this is already guaranteed. Note that isDynLinkName is used in only one other place: GHC.Cmm.CLabel.labelDynamic so we could remove/refactor it here if we want. But the example still stands; these kinds of checks are, in general, forgotten, when they should be encoding information into the type system to minimize future runtime checks.

Another good example of possible perf refactors is the use of splitAt in StgToCmm.Layout.direct_call. This might be ok, but that depends on the size of the lists. In any case the function also calls lengthLessThan on the its input list which again implies another data structure is needed here.

Lastly, I noticed mkCmmIfThen' in StgToCmm.Monad inputs a Maybe Bool, which is passed through mkCbranch to Cmm.Node.CmmCondBranch.cml_likely. I'm unsure what this is actually used for but this is quite a code smell to me. Not only is it boolean blind but implies that we are actually dealing with a ternary logic! In any case, not worth investigating right now; I just wanted to note it down.

Things left to do

  • Settle on naming convention
  • refactor isDllConApp?
  • Squash
Edited by jeffrey young

Merge request reports