... | ... | @@ -12,15 +12,13 @@ GHC currently provides three register allocation algorithms, one which does simp |
|
|
|
|
|
The linear allocator is turned on by default. This is what you get when you compile with `-fasm`. The linear allocator does a single pass through the code, allocating registers on a first-come-first-served basis. It is quick, and does a reasonable job for code with little register pressure.
|
|
|
|
|
|
> >
|
|
|
> > This algorithm has no look-ahead. If say, a particular hreg will be clobbered by a function call, it does not know to avoid allocating to it in the code before the call, and subsequently inserts more spill/reload instructions than strictly needed.
|
|
|
* This algorithm has no look-ahead. If say, a particular hreg will be clobbered by a function call, it does not know to avoid allocating to it in the code before the call, and subsequently inserts more spill/reload instructions than strictly needed.
|
|
|
|
|
|
- **Graph coloring** (enabled with `-fregs-graph`)
|
|
|
|
|
|
The graph coloring algorithm operates on the code for a whole function at a time. From each function it extracts a register conflict graph which has a node for every vreg and an edge between two vregs if they are in use at the same time and thus cannot share the same hreg. The algorithm tries to assign hregs (imagined as colors) to the nodes so that no two adjacent nodes share the same color, if it can't then it inserts spill code, rebuilds the graph and tries again.
|
|
|
|
|
|
> >
|
|
|
> > Graph coloring tends to do better than the linear allocator because the conflict graph helps it avoid the look-ahead problem. The coloring allocator also tries harder to allocate the source and destination of reg-to-reg move instructions to the same hreg. This is done by coalescing (merging) move-related nodes. If this succeeds then the associated moves can be erased.
|
|
|
* Graph coloring tends to do better than the linear allocator because the conflict graph helps it avoid the look-ahead problem. The coloring allocator also tries harder to allocate the source and destination of reg-to-reg move instructions to the same hreg. This is done by coalescing (merging) move-related nodes. If this succeeds then the associated moves can be erased.
|
|
|
|
|
|
- **Graph coloring with iterative coalescing** (enabled with `-fregs-iterative`)
|
|
|
|
... | ... | @@ -70,15 +68,13 @@ In the meantime, here are some good sources for test code: |
|
|
|
|
|
Register pressure increases significantly when the module is compiled with profiling. [checkSpills.report](/trac/ghc/attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/checkSpills.report)[](/trac/ghc/raw-attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/checkSpills.report) gives tuples of `(spills, reloads, reg-reg-moves)` present in output code generated by the three algorithms when compiled with `-O2 -prof`. Left to right are the stats for the linear, graph coloring and iterative coalescing algorithms. Note that most modules compile with no spill/reloads inserted, but a few (notably `real/compress2/Encode`) need several hundred.
|
|
|
|
|
|
> >
|
|
|
> > I've found it useful to maintain three darcs repos when working on the allocator. `ghc-HEAD-work` compiled with `-Onot` for fast compilation during hacking, `ghc-HEAD-prof` for testing with profiling turned on, and `ghc-HEAD-validate` for running the validate script. Patches are created in `work`, pushed into `prof` where `checkSpills` is used to compile the nofib benchmarks with the most register pressure. Once we're happy that the performance is ok, the patch is then pushed into `validate` for validation before pushing to the main repo on `darcs.haskell.org`
|
|
|
* I've found it useful to maintain three darcs repos when working on the allocator. `ghc-HEAD-work` compiled with `-Onot` for fast compilation during hacking, `ghc-HEAD-prof` for testing with profiling turned on, and `ghc-HEAD-validate` for running the validate script. Patches are created in `work`, pushed into `prof` where `checkSpills` is used to compile the nofib benchmarks with the most register pressure. Once we're happy that the performance is ok, the patch is then pushed into `validate` for validation before pushing to the main repo on `darcs.haskell.org`
|
|
|
|
|
|
- **SHA from darcs**
|
|
|
|
|
|
The `SHA1.lhs` module from the darcs source, compiled with `-O2` creates the most register pressure out of any Haskell code that I'm aware of. When compiling SHA1, GHC inlines several worker functions and the native code block that computes the hash ends up being around 1700 instructions long. vregs that live in the middle of the block have in the order of 30 conflict neighbors. (evidently, the conflict graph is too large for most of the graphviz layout algorithms to cope with)
|
|
|
|
|
|
> >
|
|
|
> > For these reasons, `SHA1.lhs` can be treated as a good worst-case input to the allocator. In fact, the current linear allocator cannot compile it with `-O2 -prof` on x86 as it runs out of stack slots, which are allocated from a static pool. Make sure to test any changes to the allocator against this module.
|
|
|
* For these reasons, `SHA1.lhs` can be treated as a good worst-case input to the allocator. In fact, the current linear allocator cannot compile it with `-O2 -prof` on x86 as it runs out of stack slots, which are allocated from a static pool. Make sure to test any changes to the allocator against this module.
|
|
|
|
|
|
## Hacking/Debugging
|
|
|
|
... | ... | @@ -88,21 +84,21 @@ In the meantime, here are some good sources for test code: |
|
|
|
|
|
- **Some useful dump flags**
|
|
|
|
|
|
> > `-ddump-asm-regalloc-stages`
|
|
|
> >
|
|
|
> > Shows the code and conflict graph after ever spill/color stage. Also shows spill costs, and what registers were coalesced.
|
|
|
* `-ddump-asm-regalloc-stages`
|
|
|
|
|
|
> > `-ddump-asm-stats`
|
|
|
> >
|
|
|
> > Gives statistics about how many spills/reloads/reg-reg-moves are in the output program.
|
|
|
Shows the code and conflict graph after ever spill/color stage. Also shows spill costs, and what registers were coalesced.
|
|
|
|
|
|
> > `-ddump-asm`
|
|
|
> >
|
|
|
> > Gives the final output code.
|
|
|
* `-ddump-asm-stats`
|
|
|
|
|
|
> > `-ddump-to-file`
|
|
|
> >
|
|
|
> > Diverts dump output to files. This can be used to get dumps from each module in a nofib benchmark.
|
|
|
Gives statistics about how many spills/reloads/reg-reg-moves are in the output program.
|
|
|
|
|
|
* `-ddump-asm`
|
|
|
|
|
|
Gives the final output code.
|
|
|
|
|
|
* `-ddump-to-file`
|
|
|
|
|
|
Diverts dump output to files. This can be used to get dumps from each module in a nofib benchmark.
|
|
|
|
|
|
```wiki
|
|
|
cd nofib/real/anna
|
... | ... | @@ -117,12 +113,11 @@ make EXTRA_HC_OPTS="-O2 -fregs-iterative -ddump-to-file -ddump-asm-regalloc-stag |
|
|
circo -Tpng niceGraph.dot -o niceGraph.png
|
|
|
```
|
|
|
|
|
|
> >
|
|
|
> > Here's two from `nofib/real/compress2/Encode` compiled with `-O2 -prof`:
|
|
|
* Here's two from `nofib/real/compress2/Encode` compiled with `-O2 -prof`:
|
|
|
|
|
|
> > > [graph.dot](/trac/ghc/attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/graph.dot)[](/trac/ghc/raw-attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/graph.dot) -\> [graph.png](/trac/ghc/attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/graph.png)[](/trac/ghc/raw-attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/graph.png)
|
|
|
* [graph.dot](/trac/ghc/attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/graph.dot)[](/trac/ghc/raw-attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/graph.dot) -\> [graph.png](/trac/ghc/attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/graph.png)[](/trac/ghc/raw-attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/graph.png)
|
|
|
|
|
|
> > > [graph-colored.dot](/trac/ghc/attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/graph-colored.dot)[](/trac/ghc/raw-attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/graph-colored.dot) -\> [graph-colored.png](/trac/ghc/attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/graph-colored.png)[](/trac/ghc/raw-attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/graph-colored.png)
|
|
|
* [graph-colored.dot](/trac/ghc/attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/graph-colored.dot)[](/trac/ghc/raw-attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/graph-colored.dot) -\> [graph-colored.png](/trac/ghc/attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/graph-colored.png)[](/trac/ghc/raw-attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/graph-colored.png)
|
|
|
|
|
|
- **checkSpills**
|
|
|
[checkSpills.hs](/trac/ghc/attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/checkSpills.hs)[](/trac/ghc/raw-attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/checkSpills.hs) is a nasty, throw away script which can be used to automate the comparison of allocation algorithms. Copy it and a list of test like [checkSpills.tests](/trac/ghc/attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/checkSpills.tests)[](/trac/ghc/raw-attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/checkSpills.tests) to the top level nofib directory, compile and run. It will build the nofib benchmarks in the list 6 times each, once each with each of the allocators to extract spill counts, and then once again to get compile timings which are unperterbed by the space leaks introduced by compiling with debugging turned on. It's only needed if you're hacking on the allocator, parses the nofib make output directly, and is likely to rot - which is why it isn't included in the main source tree.
|
... | ... | @@ -163,5 +158,4 @@ These are some ideas for improving the current allocator, most potentially usefu |
|
|
|
|
|
For the architectures currently supported, x86, x86_64 and ppc, the native code generator currently emits code using only two register classes `RcInteger` and `RcDouble`. As these classes are disjoint (ie, none of the regs from one class alias with with regs from another), checking whether a node of a certain class is trivially colorable reduces to counting up the number of neighbours of that class.
|
|
|
|
|
|
> >
|
|
|
> > If the NCG starts to use aliasing register classes eg: both 32bit `RcFloat`s and 64bit `RcDouble`s on sparc; combinations of 8, 16, and 32 bit integers on x86 / x86_x6 or usage of sse / altivec regs in different modes, then this can be supported via the method described in \[Smith et al\]. The allocator was designed with this in mind - ie, by passing a function to test if a node is trivially colorable as a parameter to the coloring function - and there is already a description of the register set for x86 in [compiler/nativeGen/RegArchX86.hs](/ghc/ghc/tree/master/ghc/compiler/nativeGen/RegArchX86.hs), but the native code generator doesn't currently emit code to test it against. |
|
|
* If the NCG starts to use aliasing register classes eg: both 32bit `RcFloat`s and 64bit `RcDouble`s on sparc; combinations of 8, 16, and 32 bit integers on x86 / x86_x6 or usage of sse / altivec regs in different modes, then this can be supported via the method described in \[Smith et al\]. The allocator was designed with this in mind - ie, by passing a function to test if a node is trivially colorable as a parameter to the coloring function - and there is already a description of the register set for x86 in [compiler/nativeGen/RegArchX86.hs](/ghc/ghc/tree/master/ghc/compiler/nativeGen/RegArchX86.hs), but the native code generator doesn't currently emit code to test it against. |