... | ... | @@ -51,6 +51,32 @@ For a decription of how to deal with overlapping register sets, which aren't ful |
|
|
|
|
|
For an overview of techniques for inserting spill code.
|
|
|
|
|
|
## Register pressure in Haskell code
|
|
|
|
|
|
|
|
|
Code currently generated by GHC places very little pressure on the register set. Even on x86 with only 3 allocable registers, most modules do not need spills or reloads. This is a mixed blessing: on one hand the conflict graphs are small, so we can avoid performance problems related to how the graph is represented; on the other hand it can be hard to find code to test against. Register pressure is expected to increase as the Stg-\>Cmm transform improves.
|
|
|
|
|
|
|
|
|
In the meantime, here are some good sources for test code:
|
|
|
|
|
|
- **Nofib**
|
|
|
|
|
|
Only a few nofib benchmarks create spills with `-O2`; two of them are `spectral/hartel/genfft` and `spectral/sorting`.
|
|
|
|
|
|
- **Turn on profiling.**
|
|
|
|
|
|
Register pressure increases significantly when a module is compiled with profiling. [checkSpills.report](/trac/ghc/attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/checkSpills.report) gives tuples of `(spills, reloads, reg-reg-moves)` present in the output code generated by the three algorithms when compiled with `-O2 -prof`. Left to right, the stats are for the linear, graph coloring and iterative coalescing algorithms. Note that most modules compile with no spills or reloads inserted, but a few (notably `real/compress2/Encode`) need several hundred.
|
|
|
|
|
|
> >
|
|
|
> > I've found it useful to maintain three darcs repos when working on the allocator: `ghc-HEAD-work`, compiled with `-Onot` for fast compilation during hacking; `ghc-HEAD-prof`, for testing with profiling turned on; and `ghc-HEAD-validate`, for running the validate script. Patches are created in `work`, then pushed into `prof`, where `checkSpills` is used to compile the nofib benchmarks with the most register pressure. Once we're happy that the performance is OK, the patch is pushed into `validate` for validation before being pushed to the main repo on `darcs.haskell.org`.
|
|
|
|
|
|
- **SHA from darcs**
|
|
|
|
|
|
The `SHA1.lhs` module from the darcs source, compiled with `-O2`, creates the most register pressure of any Haskell code that I'm aware of. When compiling SHA1, GHC inlines several worker functions, and the native code block that computes the hash ends up being around 1700 instructions long. The vregs that live in the middle of the block have on the order of 30 conflict neighbors. (Evidently, the conflict graph is too large for most of the graphviz layout algorithms to cope with.)
|
|
|
|
|
|
> >
|
|
|
> > For these reasons, `SHA1.lhs` can be treated as a good worst-case input to the allocator. In fact, the current linear allocator cannot compile it with `-O2 -prof` on x86 because it runs out of stack slots, which are allocated from a static pool. Make sure to test any changes to the allocator against this module.
|
|
|
|
|
|
## Hacking/Debugging
|
|
|
|
|
|
- **Turn on `-fasm-lint`**
|
... | ... | @@ -84,32 +110,6 @@ make EXTRA_HC_OPTS="-O2 -fregs-iterative -ddump-to-file -ddump-asm-regalloc-stag |
|
|
|
|
|
- checkSpills
|
|
|
|
|
|
## Register pressure in Haskell code
|
|
|
|
|
|
|
|
|
Code currently generated by GHC places very little pressure on the register set. Even on x86 with only 3 allocable registers, most modules do not need spills or reloads. This is a mixed blessing: on one hand the conflict graphs are small, so we can avoid performance problems related to how the graph is represented; on the other hand it can be hard to find code to test against. Register pressure is expected to increase as the Stg-\>Cmm transform improves.
|
|
|
|
|
|
|
|
|
In the meantime, here are some good sources for test code:
|
|
|
|
|
|
- **Nofib**
|
|
|
|
|
|
Only a few nofib benchmarks create spills with `-O2`; two of them are `spectral/hartel/genfft` and `spectral/sorting`.
|
|
|
|
|
|
- **Turn on profiling.**
|
|
|
|
|
|
Register pressure increases significantly when a module is compiled with profiling. [checkSpills.report](/trac/ghc/attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/checkSpills.report) gives tuples of `(spills, reloads, reg-reg-moves)` present in the output code generated by the three algorithms when compiled with `-O2 -prof`. Left to right, the stats are for the linear, graph coloring and iterative coalescing algorithms. Note that most modules compile with no spills or reloads inserted, but a few (notably `real/compress2/Encode`) need several hundred.
|
|
|
|
|
|
> >
|
|
|
> > I've found it useful to maintain three darcs repos when working on the allocator: `ghc-HEAD-work`, compiled with `-Onot` for fast compilation during hacking; `ghc-HEAD-prof`, for testing with profiling turned on; and `ghc-HEAD-validate`, for running the validate script. Patches are created in `work`, then pushed into `prof`, where `checkSpills` is used to compile the nofib benchmarks with the most register pressure. Once we're happy that the performance is OK, the patch is pushed into `validate` for validation before being pushed to the main repo on `darcs.haskell.org`.
|
|
|
|
|
|
- **SHA from darcs**
|
|
|
|
|
|
The `SHA1.lhs` module from the darcs source, compiled with `-O2`, creates the most register pressure of any Haskell code that I'm aware of. When compiling SHA1, GHC inlines several worker functions, and the native code block that computes the hash ends up being around 1700 instructions long. The vregs that live in the middle of the block have on the order of 30 conflict neighbors. (Evidently, the conflict graph is too large for most of the graphviz layout algorithms to cope with.)
|
|
|
|
|
|
> >
|
|
|
> > For these reasons, `SHA1.lhs` can be treated as a good worst-case input to the allocator. In fact, the current linear allocator cannot compile it with `-O2 -prof` on x86 because it runs out of stack slots, which are allocated from a static pool. Make sure to test any changes to the allocator against this module.
|
|
|
|
|
|
## Possible Improvements
|
|
|
|
|
|
|
... | ... | |