WIP: AArch64 NCG

Moritz Angermann requested to merge wip/angerman/aarch64-ncg into master
  • Basic NCG that can compile trivial Haskell programs.
  • Add Enum Set to NatM with optimisation flags, and the corresponding -fasm-opt-<flag>. These should eventually all be on by default, but having separate flags will allow us to measure performance impacts precisely and write tests that verify that certain optimisations actually happen.
    • jumptbl: Generate Jump Tables
    • regoff: Destructure Reg Offsets
    • zeroreg: Return wzr, xzr for 0 loads, instead of assigning them to registers.
    • immload: Try to load immediates in as few instructions as possible.
    • ...
  • Add ANN SDoc Inst, which prints as ppr $inst <pad to 80># $comment, to allow adding inline comments to the assembly for better readability. The COMMENT pseudo instruction isn't very good for that.
  • Add fuse phase. This would need to run after the register allocator, and turn subsequent LDR/STR instructions into LDP/STP instructions. Basically an OL -> OL fold.
    A fuse phase would allow more general transformations. However, for the LDR/STR situation, having a mass-spill, mass-reload hook could also work.
  • Add spill/reload counter/statistics. On large register machines, modules that produce high spills/reloads would provide good points for investigation. Why do we need to spill so much? Can we optimise the register usage, assignment, instruction order?
  • Drop unused basic block labels.
  • Build an llvm and an ncg version of GHC, with -keep-s-file, then compute a statistic of [module] [llvm generated instructions] [ncg generated instructions], check the ratio and investigate modules where the ncg ends up producing a lot more instructions. Also look at instruction distributions. Does the llvm backend generate interesting alternative instructions that the ncg should learn about?
  • PIC/no-PIC? Fixed, but requires !3433 (closed) for full linker support.
    This is a bit tricky. We use PC-relative loads with a range of ±4GB, which is essentially what -fPIC will produce by default. However, we might run into issues if we try to link code with the rts linker that wasn't built with -fPIC. E.g. Haskell Module (essentially always PIC) -> C Module (no -fPIC) -> symbol. The C Module might reference something that's out of reach (e.g. environ, stdout, ...), and the rts linker has no way of relocating those symbols correctly.
  • Research: Is the linear register allocator the best suited one?
    See Andreas' comment below.
  • Replace FileCheck with some other tool, such that we do not necessarily depend on LLVM's toolchain just for FileCheck.
  • Integrate the cmm to asm test-suite in tests into the ghc testsuite somehow?
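
To make the optimisation-flag item above more concrete, here is a minimal sketch of an enum set of flags and of how zeroreg might consult it. Everything in it is hypothetical: the names AsmOpt, OptSet, enable, enabled, parseOpt and zeroOperand are made up for illustration, the bitmask set is a stand-in for whatever enum set the NCG actually stores in NatM, and Operand is a toy type rather than the NCG's real operand type.

```haskell
import Data.Bits (setBit, testBit)
import Data.Word (Word64)

-- Hypothetical flag type; the constructors mirror the flag names in the list.
data AsmOpt = JumpTbl | RegOff | ZeroReg | ImmLoad
  deriving (Eq, Show, Enum, Bounded)

-- A tiny enum set backed by a bitmask (assumption: a stand-in for the real set).
newtype OptSet = OptSet Word64
  deriving (Eq, Show)

emptyOpts :: OptSet
emptyOpts = OptSet 0

enable :: AsmOpt -> OptSet -> OptSet
enable o (OptSet w) = OptSet (setBit w (fromEnum o))

enabled :: AsmOpt -> OptSet -> Bool
enabled o (OptSet w) = testBit w (fromEnum o)

-- Map the <flag> part of -fasm-opt-<flag> to a flag, if it is known.
parseOpt :: String -> Maybe AsmOpt
parseOpt "jumptbl" = Just JumpTbl
parseOpt "regoff"  = Just RegOff
parseOpt "zeroreg" = Just ZeroReg
parseOpt "immload" = Just ImmLoad
parseOpt _         = Nothing

-- Toy operand type, not the NCG's real one.
data Operand = OpReg String | OpImm Integer
  deriving (Eq, Show)

-- zeroreg: with the flag on, a 64-bit zero is the zero register xzr;
-- with it off, the zero stays an immediate that still has to be loaded.
zeroOperand :: OptSet -> Operand
zeroOperand opts
  | enabled ZeroReg opts = OpReg "xzr"
  | otherwise            = OpImm 0
```

Loaded in GHCi, zeroOperand (enable ZeroReg emptyOpts) yields the register operand, while zeroOperand emptyOpts keeps the immediate. Keeping each optimisation behind its own set member is what would let a test switch exactly one of them on and check the emitted assembly.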
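
The fuse phase described above could look roughly like the following sketch. It is an illustration under stated assumptions, not the MR's implementation: Instr is a toy type (not GHC's instruction type), a plain list stands in for the OL, only 64-bit accesses are modelled, and the helper names fuse, wordSize and fitsPairOffset are hypothetical. Constraints beyond the ones checked here are ignored.

```haskell
type Reg = String

-- Toy instruction type; offsets are byte offsets from the base register.
data Instr
  = LDR Reg Reg Int       -- LDR dst, [base, #off]
  | STR Reg Reg Int       -- STR src, [base, #off]
  | LDP Reg Reg Reg Int   -- LDP dst1, dst2, [base, #off]
  | STP Reg Reg Reg Int   -- STP src1, src2, [base, #off]
  | Other String          -- any other instruction, kept opaque
  deriving (Eq, Show)

-- Assumption: all loads and stores are 64-bit, so neighbours are 8 bytes apart.
wordSize :: Int
wordSize = 8

-- 64-bit LDP/STP take a signed 7-bit immediate scaled by 8.
fitsPairOffset :: Int -> Bool
fitsPairOffset o = o `mod` wordSize == 0 && o >= -512 && o <= 504

-- Fuse adjacent accesses to consecutive slots off the same base register.
fuse :: [Instr] -> [Instr]
fuse (LDR d1 b1 o1 : LDR d2 b2 o2 : rest)
  | b1 == b2, o2 == o1 + wordSize, fitsPairOffset o1
  , d1 /= d2   -- LDP with identical destinations is not a valid pairing
  , d1 /= b1   -- the first load must not clobber the base of the second
  = LDP d1 d2 b1 o1 : fuse rest
fuse (STR s1 b1 o1 : STR s2 b2 o2 : rest)
  | b1 == b2, o2 == o1 + wordSize, fitsPairOffset o1
  = STP s1 s2 b1 o1 : fuse rest
fuse (i : rest) = i : fuse rest
fuse []         = []
```

For example, fuse [STR "x19" "sp" 16, STR "x20" "sp" 24] gives [STP "x19" "x20" "sp" 16], while a pair whose first load overwrites its own base register is left alone. The sketch only looks at directly adjacent instructions, which is the pattern a run of spills or reloads after register allocation would be expected to produce.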
Edited by Moritz Angermann