This page documents some cleanups that I (Sylvain Henry) would like to perform on GHC's code base.

## Why?

- Make the code more beginner friendly
  - Avoid acronyms
  - Hierarchical modules help in understanding the compiler structure
  - Try to name things correctly:
    - e.g. the "type checker" doesn't only check types, hence maybe we should call it "type system" or split it (e.g. Deriver, [TypeChecker](type-checker), etc.)
    - Avoid meaningless codenames (e.g. backpack, hoopl)
- Make the compiler more modular
  - Allow easier reuse (with the GHC API)
  - Make the compiler easier to debug
  - Make adding new passes/optimisations easier
  - Allow easier and faster testing (testing per component instead of testing the whole pipeline)
  - Allow new, more interactive frontends (step-run each compiler pass and show IR, stats, etc.)
  - Allow profile-guided optimizations (pass count and order, etc.)

## Step 1: introduce basic module hierarchy

Implement the [proposal for hierarchical module structure in GHC](module-dependencies/hierarchical) ([\#13009](https://gitlab.haskell.org//ghc/ghc/issues/13009)).

This step consists only of renaming and moving modules.

Compared to the original proposal, I have:

- put IRs into GHC.IR and compilers into GHC.Compiler
- changed GHC.Types into GHC.Data and GHC.Entity, as the former is misleading (from a GHC API user's point of view)
- split GHC.Typecheck into GHC.IR.Haskell.{[TypeChecker](type-checker),Deriver}
- split GHC.Utils into GHC.Utils and GHC.Data (e.g., Bag is in Data, not Utils)
- etc.

Tree logic:

- IR: intermediate representations. Each one contains its syntax and the code that manipulates it
  - Haskell
    - Syntax
    - Parser, Lexer, Printer
    - Analyser
    - [TypeChecker](type-checker), Renamer, Deriver
  - Core
    - Syntax
    - Analyser
    - Transformer.{Simplifier,Specialiser,Vectoriser,WorkerWrapper,FloatIn,[FloatOut](float-out),CommonSubExpr, etc.}
  - Cmm
    - Syntax
    - Analyser
    - Parser, Lexer, Printer
    - Transformer.{CommonBlockElim,ConstantFolder,Dataflow,ShortCutter,Sinker}
  - Stg
    - Syntax
    - Analyser
    - Transformer.{CommonSubExpr,CostCentreCollecter,Unariser}
  - ByteCode.{Assembler,Linker...}
  - Interface.{Loader,Renamer,[TypeChecker](type-checker), Transformer.Tidier}
  - Llvm.{Syntax, Printer}
- Compiler: converters between representations
  - HaskellToCore
  - CoreToStg
  - StgToCmm
  - CmmToAsm
  - CmmToLlvm
  - CoreToByteCode
  - CoreToInterface
  - CmmToC
  - TemplateToHaskell
- Entity: entities shared by different phases of the compiler (Class, Id, Name, Unique, etc.)
- Builtin: built-in definitions
  - Primitive.{Types,Operations}: primitives
  - Names, Types, Uniques: other wired-in definitions
- Program: GHC-the-program (command-line parser, etc.) and its modes
  - Driver.{Phases,Pipeline}
  - Backpack
  - Make, MakeDepend
- Interactive: interactive features (debugger, closure inspection, interpreter, etc.)
- Data: data structures (Bag, Tree, etc.)
- Config: GHC configuration
  - HostPlatform: host platform info
  - Flags: dynamic configuration (DynFlags)
  - Build: generated at build time
- Packages: package management
- RTS: interaction with the runtime system (closure and table representation)
- Utils: utility code or code that doesn't easily belong to another directory (e.g., Outputable, SysTools, Elf, Finder, etc.)
- Plugin: modules to import to write compiler plugins

Actual renaming: see [CodeBaseCleanup/ModuleRenaming](code-base-cleanup/module-renaming)

Issues:

- name clashes: some modules in `base` (e.g. GHC.Desugar) and `ghc-prim` (e.g. GHC.Types) use the same GHC prefix
  - maybe we should put all GHC extensions to base under GHC.Exts.\* or GHC.Base.\*
  - use GHC.Builtin.Primitive.\* prefix in ghc-prim?

TODO in the future:

- Fix comments:
  - Several comments refer to Note "Remote Template Haskell" (supposedly in libraries/ghci/GHCi/TH.hs), but it doesn't exist. Was it replaced by Note "Remote GHCi"?
  - Undefined reference to "fill_in in PrelPack.hs" from GHC.Entity.Id
  - Undefined reference to CgConTbls.hs from GHC.Compiler.StgToCmm.Binding
  - Undefined reference to PprMach.hs from GHC.Compiler.CmmToAsm.PIC
  - Undefined reference to Renaming.hs from GHC.IR.Core.Transformer.Substitution
  - Undefined reference to simplStg/SRT.hs from GHC.IR.Cmm.Transformer.InfoTableBuilder
  - Undefined reference to codeGen/CodeGen.hs from GHC.Compiler.HaskellToCore.Foreign.Declaration
  - Undefined reference to RegArchBase.hs from GHC.Compiler.CmmToAsm.Register.Allocator.Graph.ArchX86
  - Undefined reference to MachRegs\*.hs and MachRegs.hs from GHC.Compiler.CmmToAsm.Register.Allocator.Graph.ArchBase
- Binutils 2.17 is from 2011. Maybe we could remove the hack in GHC.Compiler.CmmToAsm.X86.CodeGen
- Rename CAF into "static thunk"?
- Put note files (e.g. profiling-notes, \*.tex files) into actual Notes or on the wiki
- Fix leftover references to RnHsSyn, which doesn't exist anymore
- References to "NCG" should be replaced with references to the "CmmToAsm compiler"
- Foreign export stubs are generated in GHC.Compiler.HaskellToCore.Foreign.Declaration...
- Tests still reflect the old hierarchy (e.g., simplCore/should_compile) but renaming them could break other tools

Questions:

- Why don't we use the mangled selector name ($sel:foo:MkT) in every case (not only when we have -XDuplicateRecordFields) instead of using the ambiguous one (foo)?
  - Incidentally, partially answered yesterday (2017-06-12) on ticket [\#13352](https://gitlab.haskell.org//ghc/ghc/issues/13352)

## Step 2: split and edit some modules

Some modules contain a lot of (unrelated) stuff. We should split them.

- GHC.Utils (previously compiler/utils/Util.hs) contains many unrelated things that should be split out
  - Compiler configuration (ghciSupported, etc.): GHC.Config
  - List operations: GHC.Data.List{.Sort,.Fold}
  - Transitive closure: GHC.Data.Graph?
  - Edit distance and fuzzy match: GHC.Utils.FuzzyMatch?
  - Shared globals between GHC package instances: GHC.Utils.SharedGlobals?
  - Command-line parser: GHC.Utils.CmdLine
  - exactLog2 (Integer): GHC.Data.Integer (why isn't it in base?)
  - Read helpers (rational, maybe, etc.): GHC.Utils.Read?
  - doesDirNameExist, getModificationUTCTime: GHC.Utils.FilePath
  - hSetTranslit: GHC.Utils.Handle.Encoding
  - etc.
- Split GHC.Types (was HscTypes) as it contains a lot of unrelated things
  - ModGuts/ModDetails/ModIface: move to GHC.Data.Module.\*
  - Usage/Dependencies: move to GHC.Data.Module.Usage/Dependencies
- GHC.Data.\*: split
  - Split OccEnv from OccName (to harmonize with GHC.Data.Name.Env)?
  - Split ModuleEnv/ModuleSet from Module?
  - Split GHC.Data.Types (was TyCoRep)?
    - Contains many data types (TyThing, Coercion, Type, Kind, etc.)
- Split PrettyPrint from GHC.Syntax.{Type,Expr,etc.}
- Split GHC.IR.Core.Transform.{Simplify,SimplUtils,etc.}
- Split GHC.Rename.ImportExport (e.g., contains "warnMissingSignature")
- Put cmmToCmm optimisations from GHC.Compilers.CmmToAsm into GHC.IR.Cmm.Transform
- Split type-checker solvers (class lookup, givens, wanted, etc.) (was TcSimplify, TcInteract, etc.)
- Module name GHC.Compilers.StgToCmm.Layout seems dubious: split and rename?

Some function/type names should be modified:

- Rename codeGen function into stgToCmm
- Rename nativeCodeGen into cmmToAsm
- Rename OrdList (in GHC.Data.Tree.OrdList) into TreeSomething? (the current name is misleading)
- CorePrep (prepare Core for codegen) could use a more explicit name
- Maybe rename GHC.Data.RepType
- Maybe rename OccName/RdrName/Name/Id to make them more explicit (may become obsolete with the "trees that grow" patch)
  - OccName: NSName (NameSpacedName)
  - RdrName: ParsedName
  - Name: UniqueName
  - Id: TypedName
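
To make the proposed renaming more concrete, here is a minimal sketch of what each layer could add to the previous one. These types are simplified, hypothetical stand-ins: GHC's real OccName, RdrName, Name and Id carry more information and more constructors, and only the four proposed names come from the list above.

```haskell
-- Simplified, hypothetical stand-ins for GHC's name types, using the
-- names proposed above. They are NOT GHC's actual definitions.

-- Namespaces keep, e.g., a type constructor and a data constructor apart.
data NameSpace = VarNS | DataConNS | TypeVarNS | TypeConNS
  deriving (Eq, Show)

-- OccName -> NSName: a plain string tagged with its namespace.
data NSName = NSName NameSpace String
  deriving (Eq, Show)

-- RdrName -> ParsedName: what the parser sees, possibly module-qualified.
data ParsedName
  = Unqualified NSName
  | Qualified String NSName   -- qualifying module name, then the name
  deriving (Eq, Show)

-- Name -> UniqueName: an NSName made unambiguous by a unique key.
data UniqueName = UniqueName
  { uniqueKey :: Int
  , uniqueOcc :: NSName
  } deriving Show

-- Two UniqueNames are equal exactly when their keys are equal.
instance Eq UniqueName where
  a == b = uniqueKey a == uniqueKey b

-- Id -> TypedName: a UniqueName paired with its type (left abstract here).
data TypedName ty = TypedName
  { typedName :: UniqueName
  , typedType :: ty
  } deriving Show

-- Drop the qualifier, if any, to recover the namespaced name.
parsedOcc :: ParsedName -> NSName
parsedOcc (Unqualified occ) = occ
parsedOcc (Qualified _ occ) = occ
```

Read top to bottom, each proposed name says what the type adds: a namespace, then parse-time qualification, then uniqueness, then a type.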

## Step 3: clearly separate GHC-the-program and GHC's API

- Make the GHC API purer

### Abstract file loading (i.e. pluggable Finder)

Currently the Finder assumes that a filesystem exists in which it can find packages and modules.

I would like to add support for module sources that are only available in memory or that can be retrieved from elsewhere (network, etc.).

Something similar to Java's class loaders.
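
As a rough illustration of the idea, a pluggable finder could be modelled as a plain lookup function that can be chained. This is only a sketch under my own assumptions: `ModuleLoader`, `ModuleSource`, `memoryLoader`, `fileLoader` and `orElse` are hypothetical names, not part of GHC's Finder.

```haskell
import Control.Exception (IOException, try)

-- Hypothetical types: this is not GHC's Finder API.
type ModuleName = String

-- Source text of a module, plus a label saying where it came from.
data ModuleSource = ModuleSource
  { sourceOrigin :: String
  , sourceText   :: String
  } deriving (Eq, Show)

-- A loader is just an effectful lookup from module name to source.
newtype ModuleLoader m = ModuleLoader
  { findModule :: ModuleName -> m (Maybe ModuleSource) }

-- Loader backed by an in-memory table; it never touches the filesystem.
memoryLoader :: Applicative m => [(ModuleName, String)] -> ModuleLoader m
memoryLoader table = ModuleLoader $ \name ->
  pure (ModuleSource "<memory>" <$> lookup name table)

-- Map module A.B to the path <dir>/A/B.hs.
modulePath :: FilePath -> ModuleName -> FilePath
modulePath dir name = dir ++ "/" ++ map dotToSlash name ++ ".hs"
  where
    dotToSlash c = if c == '.' then '/' else c

-- Loader that reads from a directory; an unreadable file means "not found".
fileLoader :: FilePath -> ModuleLoader IO
fileLoader dir = ModuleLoader $ \name -> do
  let path = modulePath dir name
  contents <- try (readFile path) :: IO (Either IOException String)
  pure (either (const Nothing) (Just . ModuleSource path) contents)

-- Try the first loader and fall back to the second, like a chain of
-- class loaders.
orElse :: Monad m => ModuleLoader m -> ModuleLoader m -> ModuleLoader m
orElse first second = ModuleLoader $ \name -> do
  found <- findModule first name
  case found of
    Just _  -> pure found
    Nothing -> findModule second name
```

A frontend could then build something like ``fileLoader "src" `orElse` memoryLoader generated`` and hand it to the compiler, and a network-backed loader would simply be another value of the same type.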

### Abstract error reporting and logging (i.e. pluggable Logger)

Allow new frontends (using the GHC API) to use HTML reporting, etc.

- Avoid dumping to the filesystem and/or stdout/stderr
- Use data types instead of raw SDoc reports
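
The two points above could look roughly like the following sketch: diagnostics are plain data, and the logger is a callback supplied by the frontend. All names here (`Diagnostic`, `Logger`, `runCollecting`, `renderPlain`, `renderHtml`) are hypothetical and only meant to show the shape of such an API.

```haskell
import Data.Char (toLower)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Hypothetical types: GHC's real messages are built from SDoc.
data Severity = Info | Warning | Error
  deriving (Eq, Show)

-- A structured report: the frontend decides how to render it.
data Diagnostic = Diagnostic
  { diagSeverity :: Severity
  , diagLocation :: Maybe (FilePath, Int)   -- file and line, when known
  , diagMessage  :: String
  } deriving (Eq, Show)

-- A logger is a callback; nothing forces it to write to stdout or stderr.
newtype Logger m = Logger { logDiagnostic :: Diagnostic -> m () }

-- Run an action with an in-memory logger and return what it reported,
-- in emission order.
runCollecting :: (Logger IO -> IO a) -> IO (a, [Diagnostic])
runCollecting action = do
  ref <- newIORef []
  result <- action (Logger (\d -> modifyIORef' ref (d :)))
  reported <- readIORef ref
  pure (result, reverse reported)

-- Classic one-line rendering, e.g. for a terminal.
renderPlain :: Diagnostic -> String
renderPlain d = location ++ show (diagSeverity d) ++ ": " ++ diagMessage d
  where
    location =
      maybe "" (\(file, line) -> file ++ ":" ++ show line ++ ": ")
            (diagLocation d)

-- HTML rendering of the same value, with minimal escaping.
renderHtml :: Diagnostic -> String
renderHtml d =
  "<li class=\"" ++ map toLower (show (diagSeverity d)) ++ "\">"
    ++ concatMap escape (diagMessage d) ++ "</li>"
  where
    escape '<' = "&lt;"
    escape '>' = "&gt;"
    escape '&' = "&amp;"
    escape c   = [c]
```

Because the same `Diagnostic` value feeds both renderers, an HTML frontend and the command-line program can share the compiler code and differ only in the logger they pass in.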

## Step 4: clearly separate phases

- split DynFlags to only pass the required info to each pass
  - e.g. only the required hooks
- use data types to report phase statistics, intermediate representations, etc.
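
A minimal sketch of both ideas on a toy pass, assuming a hypothetical `GlobalFlags` record standing in for DynFlags: the driver projects out a small per-pass configuration, and the pass returns its statistics as a value instead of printing them. None of these names exist in GHC.

```haskell
-- Hypothetical stand-in for DynFlags: one big record of global settings.
data GlobalFlags = GlobalFlags
  { optLevel    :: Int
  , dumpToFiles :: Bool
  , verbosity   :: Int
  }

-- Toy IR for the example.
data Expr = Lit Int | Add Expr Expr
  deriving (Eq, Show)

-- The only setting the toy constant-folding pass needs.
newtype FoldConfig = FoldConfig { foldEnabled :: Bool }

-- Projection from the global flags, done once by the driver.
foldConfig :: GlobalFlags -> FoldConfig
foldConfig flags = FoldConfig { foldEnabled = optLevel flags >= 1 }

-- Statistics are returned as data instead of being printed by the pass.
data PassStats = PassStats
  { statsPass     :: String
  , statsRewrites :: Int
  } deriving (Eq, Show)

-- The pass sees only its own configuration, never GlobalFlags.
constantFold :: FoldConfig -> Expr -> (Expr, PassStats)
constantFold cfg expr
  | foldEnabled cfg = let (e, n) = go expr in (e, PassStats "constantFold" n)
  | otherwise       = (expr, PassStats "constantFold" 0)
  where
    -- Fold bottom-up and count how many additions were evaluated.
    go (Add a b) =
      let (a', na) = go a
          (b', nb) = go b
      in case (a', b') of
           (Lit x, Lit y) -> (Lit (x + y), na + nb + 1)
           _              -> (Add a' b', na + nb)
    go e = (e, 0)
```

With this shape, a pass can be tested with a hand-written `FoldConfig`, and a frontend can collect the `PassStats` values of every pass to display or compare them.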