... | ... | @@ -63,7 +63,7 @@ In the meantime, here are some good sources for test code: |
|
|
|
|
|
Only a few nofib benchmarks create spills with `-O2`, two are `spectral/hartel/genfft` and `spectral/sorting`.
|
|
|
|
|
|
- **Turn on profiling.**
|
|
|
- **Turn on profiling**
|
|
|
|
|
|
Register pressure increases significantly when the module is compiled with profiling. [checkSpills.report](/trac/ghc/attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/checkSpills.report)[](/trac/ghc/raw-attachment/wiki/Commentary/Compiler/Backends/NCG/RegisterAllocator/checkSpills.report) gives tuples of `(spills, reloads, reg-reg-moves)` present in output code generated by the three algorithms when compiled with `-O2 -prof`. Left to right are the stats for the linear, graph coloring and iterative coalescing algorithms. Note that most modules compile with no spill/reloads inserted, but a few (notably `real/compress2/Encode`) need several hundred.
|
|
|
|
... | ... | @@ -113,11 +113,18 @@ make EXTRA_HC_OPTS="-O2 -fregs-iterative -ddump-to-file -ddump-asm-regalloc-stag |
|
|
## Possible Improvements
|
|
|
|
|
|
|
|
|
These are some ideas for improving the current allocator, most usful first.
|
|
|
These are some ideas for improving the current allocator, most potentially useful first.
|
|
|
|
|
|
- Work lists for iterative coalescing.
|
|
|
- **Work lists for iterative coalescing.**
|
|
|
|
|
|
- Improve spill code cleaner.
|
|
|
The iterative coalescing alternates between scanning the graph for trivially colorable (triv) nodes and perforing coalescing. When two nodes are coalesced, other nodes that are not adjacent to the coalesced nodes do not change and do not need to be rescanned straight away. Runtime performance of the iterative coalescer could probably be improved by keeping a work-list of "nodes that might have become trivially colorable", to help rescanning nodes that won't have changed.
|
|
|
|
|
|
- **Improve spill code generator/cleaner.**
|
|
|
|
|
|
When spilling a particular vreg, the current spill code generator simply inserts a spill after each def and a reload before each use. This quickly reduces the density of conflicts in the graph, but produces inefficient code because more spill/reloads are inserted than strictly nessesary. Good code is recovered by the spill cleaner which runs after allocation and removes spill/reload instructions that aren't nessesary. Some things to try:
|
|
|
|
|
|
- **Spill coalescing**. No attempt is currently made to share spill slots between different vregs. Each named vreg is spilled to its own static spill slot on the C stack. The amount of stack space needed could be reduced by sharing spill slots between vregs so long as their live ranges do not overlap.
|
|
|
- **Try to split live ranges before spilling**. If a live range has several use/defs then we could insert fresh reg-reg moves to break it up into several smaller live ranges. We then might get away with spilling just one section instead of the whole range.
|
|
|
|
|
|
- Revisit choosing of spill candidates.
|
|
|
|
... | ... | |