Decouple DynFlags in StgToCmm
What
Continuing to decouple DynFlags, see #20730 for project plan and status.
Add two modules:
- StgToCmm.Config
- Driver.Config.StgToCmm
Removes 3 types in StgToCmm
since they are now made redundant:
CgInfoDownwards
PtrOpts
CallOpts
Refactors call sites to DynFlags
in StgToCmm
with calls to StgToCmmConfig
or when the function does not have access to StgToCmmConfig
the MR breaks functions down such that their types only require what is used in the body of the function.
Turns out DynFlags
was pretty tightly coupled to this phase and so this MR touches quite a bit of code. Module
in StgToCmmConfig
is now the only reference to the module being compiled. That is, I decided this Module
is the _single source of truth_^{TM}. Previously there was a Module
passed around in function parameters and in DynFlags
. Thus several APIs have changed to enforce that design, which IMHO is a good thing.
Other things
You'll notice that UpdFrameOffset
and Module
are lazy in StgToCmmConfig
. This is purposeful, if Module
is strict the Cmm/Parser.y
throws a panic, see line 1522 (the call to runC
). This can be changed from no_module
to this_mod
but that breaks the reg_alloc_unit_tests
and I didn't want to delay the MR to try to understand that test so I decided against making Module
strict.
UpdFrameOffset
only stores an Int
but if that is made strict then the RTS slows noticeably down and deadlocks at StgToThnk.cmm
. I did not investigate why this occurs opting instead to leave it lazy but with a comment.
I keep Sequel
and SelfLoopInfo
in StgToCmm/Config.hs
module. This is also purposeful. Ideally these types would be placed in StgToCmm/Types.hs
but doing so creates a gnarly cyclic dependency between Builtins
and Cmm/Info.hs
so I've opted against that. We could move this into its own module, but then that module is quite peculiar and doesn't conform to the architecture conventions of the rest of the compiler, i.e., the new module would only export the type and no actual functions for working with the type, because these functions are in StgToCmm/Monad.hs
since they are monadic. I think this is best left to a future refactor that untangles the module dependencies in this phase (good newcomer issue!).
Lastly, I'm not keen on the naming convention I've used because it is long stgToCmm<foo>
, perhaps stc<foo>
would be better? I'd like a second opinion on this.
Final and smelly things
Note to my future self, when I (or someone) begins phase 3 of the project plan stated in #20730 (comment 397931): There is a great deal of opportunity for performance minded refactoring. For example, there are several places where we have runtime checks via guards and then that information is not passed downward into subsequent code. A good example of this is isDllConApp
and isDynLinkName
where both functions check the platform for OSMinGW32
:
-- | Does this constructor application refer to anything in a different
-- *Windows* DLL?
-- If so, we can't allocate it statically
isDllConApp
:: Platform
-> Bool -- is Opt_ExternalDynamicRefs enabled?
-> Module
-> DataCon
-> [StgArg]
-> Bool
isDllConApp platform ext_dyn_refs this_mod con args
| not ext_dyn_refs = False
| platformOS platform == OSMinGW32
= isDynLinkName platform this_mod (dataConName con) || any is_dll_arg args
| otherwise = False
where
-- NB: typePrimRep1 is legit because any free variables won't have
-- unlifted type (there are no unlifted things at top level)
is_dll_arg :: StgArg -> Bool
is_dll_arg (StgVarArg v) = isAddrRep (typePrimRep1 (idType v))
&& isDynLinkName platform this_mod (idName v)
is_dll_arg _ = False
Notice that isDynLinkName
is called only when platformOS platform
is OSMinGW32
. Now isDynLinkName
is:
isDynLinkName :: Platform -> Module -> Name -> Bool
isDynLinkName platform this_mod name
| Just mod <- nameModule_maybe name
...
= case platformOS platform of
...
OSMinGW32 -> moduleUnit mod /= moduleUnit this_mod
-- For the other platforms, still perform the hack
_ -> mod /= this_mod
| otherwise = False -- no, it is not even an external name
Which checks again even though this is already guaranteed. Note that isDynLinkName
is used in only one other place: GHC.Cmm.CLabel.labelDynamic
so we could remove/refactor it here if we want. But the example still stands; these kinds of checks are, in general, forgotten, when they should be encoding information into the type system to minimize future runtime checks.
Another good example of possible perf refactors is the use of splitAt
in StgToCmm.Layout.direct_call
. This might be ok, but that depends on the size of the lists. In any case the function also calls lengthLessThan
on the its input list which again implies another data structure is needed here.
Lastly, I noticed mkCmmIfThen'
in StgToCmm.Monad
inputs a Maybe Bool
, which is passed through mkCbranch
to Cmm.Node.CmmCondBranch.cml_likely
. I'm unsure what this is actually used for but this is quite a code smell to me. Not only is it boolean blind but implies that we are actually dealing with a ternary logic! In any case, not worth investigating right now; I just wanted to note it down.
Things left to do
-
Settle on naming convention -
refactor isDllConApp
? -
Squash