WIP: A step towards a working fix for #25636
Introduction
This MR is a work in progress to fix issue #25636.
Looking at the MR from a high level, it seems that it should have resolved the problem. Unfortunately, it has not resolved issue #25636, so I must have missed some smaller technical detail in the long process of extending the bytecode interpreter. I need some assistance, via code review, in locating what is missing from the MR to complete this patch.
Reproduction
Build the branch then run the following:
cd testsuite/tests/codeGen/should_run/T23146
../../../../../_build/stage1/bin/ghc --interactive T23146_liftedeq.hs
ghci > main
It used to throw a compiler panic. Then I got it to only throw a linker error:
GHC.ByteCode.Linker.lookupCE
During interactive linking, GHCi couldn't find the following symbol:
closure:$WUNil
This may be due to you not asking GHCi to load extra object files,
archives or DLLs needed by your current session. Restart GHCi, specifying
the missing library using the -L/path/to/object/dir and -lmissinglibname
flags, or simply by naming the relevant files on the GHCi command line.
Alternatively, this link failure might indicate a bug in GHCi.
If you suspect the latter, please report this as a GHC bug:
https://www.haskell.org/ghc/reportabug
Currently, calling main appears to loop indefinitely.
The Patch
We need to ensure that unlifted data types are strictly evaluated in the bytecode interpreter.
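For readers less familiar with the feature, here is a minimal sketch of an unlifted data type. It is not the contents of `T23146_liftedeq.hs`; the type `UList` and the function `usum` are hypothetical, and the constructor name `UNil` only echoes the symbol in the linker error above. A value of such a type can never be a thunk, which is why the interpreter has to construct it strictly.

```haskell
{-# LANGUAGE UnliftedDatatypes #-}
{-# LANGUAGE StandaloneKindSignatures #-}
module Main where

import GHC.Exts (UnliftedType)

-- Hypothetical example type: a list whose values are never thunks.
type UList :: UnliftedType
data UList = UNil | UCons Int UList

-- Summing the list; the argument is already evaluated when the function runs.
usum :: UList -> Int
usum UNil         = 0
usum (UCons x xs) = x + usum xs

main :: IO ()
main = print (usum (UCons 1 (UCons 2 UNil)))
```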
Due to "kind erasure" when converting from STG to ByteCode, we cannot simply query the ByteCode Object (BCO) for its levity and branch accordingly into strict or lazy evaluation.
Instead, we must tag the BCO in some way during the STG to ByteCode transformation so that it can later be correctly evaluated by the interpreter.
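The tagging idea can be sketched with a toy model. Everything below (`Levity`, `StgBind`, `ToyBCO`, `ToyUDC`, `translate`) is a hypothetical stand-in and not one of GHC's real types or functions; the only point is that the decision is made while levity is still visible, and is then carried by the constructor of the result rather than by a kind.

```haskell
module Main where

-- Hypothetical, simplified stand-ins; none of these are GHC's real types.
data Levity = Lifted | Unlifted
  deriving (Eq, Show)

-- A toy STG-level binding that still knows its levity.
data StgBind = StgBind
  { bindName   :: String
  , bindLevity :: Levity
  } deriving (Eq, Show)

-- Toy bytecode-level results; levity is no longer stored in either of them.
newtype ToyBCO = ToyBCO String deriving (Eq, Show)
newtype ToyUDC = ToyUDC String deriving (Eq, Show)

-- Decide the tag before the levity information is erased:
-- Left marks "evaluate strictly", Right marks "ordinary lazy BCO".
translate :: StgBind -> Either ToyUDC ToyBCO
translate b = case bindLevity b of
  Unlifted -> Left  (ToyUDC (bindName b))
  Lifted   -> Right (ToyBCO (bindName b))

main :: IO ()
main = mapM_ (print . translate)
  [ StgBind "main" Lifted
  , StgBind "UNil" Unlifted
  ]
```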
Here is the workflow:
- We create a new logical type for the interpreter, an Unlifted Data Constructor (UDC). A UDC can be either 'unlinked' or 'resolved' (similar to BCOs).
- All UDCs begin as an unlinked UDC, represented by the data-type `UnlinkedUDC` defined in `compiler/GHC/ByteCode/Types.hs`. This data-type stores the binding's `Name` so that it can be correctly referenced, and also stores the `ConInfoTable` so that the unlifted value can be dynamically constructed at runtime. The `ConInfoTable` data-type and type-class instances have been moved from `libraries/ghci/GHCi/Message.hs` to `libraries/ghci/GHCi/ResolvedBCO.hs` to avoid module import dependencies.
- When converting the STG to ByteCode, within the `schemeTopBind` function defined in `compiler/GHC/StgToByteCode.hs`, the compiler tests the kind of the object while the kind information is still in scope (before "kind erasure").
- The function `schemeTopBind` is called by `byteCodeGen`, also defined in `compiler/GHC/StgToByteCode.hs`. The results of `schemeTopBind` now contain both UDCs and BCOs. These are partitioned into two `FlatBag`s.
- The function `byteCodeGen` then calls `assembleBCOs`, defined in `compiler/GHC/ByteCode/Asm.hs`, with both `FlatBag`s, one of UDCs and one of BCOs (where before there were only BCOs).
- The function `assembleBCOs` dutifully adds the `FlatBag` of UDCs to the `CompiledByteCode` data-type defined in `compiler/GHC/ByteCode/Types.hs`.
- The `CompiledByteCode` data-type is eventually consumed by the `linkSomeBCOs` function defined in `compiler/GHC/Linker/Loader.hs`. Here the BCOs are converted from `UnlinkedBCO`s to `ResolvedBCO`s by recursively and lazily instantiating all pointers/references contained in the collection of BCOs. This same function now needs to also strictly instantiate the UDCs and have the BCOs correctly point to them.
- The reference resolution of the UDCs/BCOs of the `CompiledByteCode` happens in the `linkBCO` function, defined in `compiler/GHC/ByteCode/Linker.hs`. A sub-call to `resolvePtr`, defined in the same module, now takes name references to both BCOs and UDCs and conditionally creates `ResolvedBCORef` or `ResolvedBCORefUnlifted` values, respectively. The result of this process is a `ResolvedBCO`.
- The result of `linkSomeBCOs`, after calling `linkBCO` on all supplied BCOs, is a mapping of `Name`s to `HValue`s.
- The mapping is then used by `dynLinkBCOs`, defined in `compiler/GHC/Linker/Loader.hs`, to add the BCOs to the `closure_env` of the `LoaderState`. The function `dynLinkBCOs` is transitively used by the top-level bindings `loadDecls` and `loadModule` of the same module.
- The GHCi interpreter interface is extended with a new operation `CreateUDCs` defined in `libraries/ghci/GHCi/Message.hs`. This is used for instantiating the UDCs.
- Naturally, the GHCi interpreter evaluator defined in `libraries/ghci/GHCi/Run.hs` is also extended to take a `CreateUDCs` value and call the `createUDCs` function defined in `libraries/ghci/GHCi/CreateBCO.hs`.
- The `createUDCs` function transitively calls the new prim-op `newUDC#` in order to instantiate the UDC value.
- The `newUDC#` prim-op is defined in `compiler/GHC/Builtin/primops.txt.pp` by the `NewUDCOp` operator. This prim-op *should* be associated with the C function `stg_newUDHzh` defined in `rts/PrimOps.cmm`.
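To make the partition and link-time steps of the workflow easier to review, here is a toy model of them. It is only a sketch under stated assumptions: `Tagged`, `ToyPtr`, `splitTagged` and `toyResolvePtr` are hypothetical names, plain lists stand in for the two `FlatBag`s, and the constructors `ToyBCORef` and `ToyBCORefUnlifted` merely mirror the roles of `ResolvedBCORef` and `ResolvedBCORefUnlifted`. The real `resolvePtr` in the MR may be structured quite differently.

```haskell
module Main where

import Data.Either (partitionEithers)

-- Hypothetical stand-ins; not GHC's real types.
type Name = String

-- Toy result of bytecode generation: Left names a UDC, Right names a BCO.
type Tagged = Either Name Name

-- Toy pointer kinds, mirroring the two resolved reference flavours.
data ToyPtr
  = ToyBCORef Name          -- lazy reference to an ordinary BCO
  | ToyBCORefUnlifted Name  -- reference to a strictly constructed UDC
  | ToyUnresolved Name      -- name found in neither collection
  deriving (Eq, Show)

-- Split the tagged results into UDC names and BCO names
-- (lists here play the role of the two FlatBags).
splitTagged :: [Tagged] -> ([Name], [Name])
splitTagged = partitionEithers

-- Pick the pointer flavour by checking which collection holds the name.
toyResolvePtr :: ([Name], [Name]) -> Name -> ToyPtr
toyResolvePtr (udcs, bcos) n
  | n `elem` udcs = ToyBCORefUnlifted n
  | n `elem` bcos = ToyBCORef n
  | otherwise     = ToyUnresolved n

main :: IO ()
main = do
  let env = splitTagged [Right "main", Left "$WUNil"]
  mapM_ (print . toyResolvePtr env) ["main", "$WUNil", "missing"]
```

In this model, a name that is present in neither collection falls through to `ToyUnresolved`, which is the analogue of the `lookupCE` failure for `closure:$WUNil` shown in the Reproduction section; one thing worth checking in review is whether every UDC name is in scope at the point where references to it are resolved.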
Somewhere, something isn't quite lining up, and the bytecode interpreter is still not correctly evaluating unlifted data constructors.