Register spilling produces ineffecient/highly contending code
The native codegen and llvm both produce ineffecient code for functions using structures with many strict fields or unboxed values.
Consider the following program:
{-# LANGUAGE BangPatterns #-}
module Spill where
import GHC.Exts
data S = S !Int !Int !Int !Int !Int !Int !Int !Int !Int
spill :: S -> S -> S -> S
spill (S !a !b !c !d !e !f !g !h !i) (S !j !k !l !m !n !o !p !q !r) (S !s !t !u !v !w !x !y !z _)
= S (a + j + s) (b + c) (k + r) (a + b + c + d + e + f + g + h + i) (j + k + l + m + n + o + p + q + r) (s + t + u + v + w + x + y + z) (a + b + c) (j + k + l) (s + t + u)
Parts of the code produced for this (which is identical regardless of -funbox-strict-fields) looks like:
_cnc:
addq $80,%r12
cmpq 144(%r13),%r12
ja _cni
movq $Spill.S_con_info,-72(%r12)
movq 32(%rbp),%rax
movq %rax,-64(%r12)
movq 24(%rbp),%rax
movq %rax,-56(%r12)
movq 16(%rbp),%rax
movq %rax,-48(%r12)
movq 8(%rbp),%rax
movq %rax,-40(%r12)
movq 40(%rbp),%rax
movq %rax,-32(%r12)
movq 48(%rbp),%rax
movq %rax,-24(%r12)
movq 56(%rbp),%rax
movq %rax,-16(%r12)
movq 64(%rbp),%rax
movq %rax,-8(%r12)
movq 7(%rbx),%rax
movq %rax,0(%r12)
leaq -71(%r12),%rbx
addq $72,%rbp
jmp *0(%rbp)
_csv:
movq 63(%rbx),%rax
movq %rax,-56(%rbp)
movq 55(%rbx),%rax
movq %rax,-48(%rbp)
movq 47(%rbx),%rax
movq %rax,-40(%rbp)
movq 39(%rbx),%rax
movq %rax,-32(%rbp)
movq 31(%rbx),%rax
movq %rax,-24(%rbp)
movq 23(%rbx),%rax
movq %rax,-16(%rbp)
movq 71(%rbx),%rax
movq %rax,-8(%rbp)
movq 15(%rbx),%rax
movq %rax,0(%rbp)
And likewise for LLVM:
.LBB10_1: # %coZ
movq 7(%rbx), %rcx
movq $Spill_S_con_info, 8(%rax)
movq 8(%rbp), %rdx
movq %rdx, 16(%rax)
movq 16(%rbp), %rdx
movq %rdx, 24(%rax)
movq 24(%rbp), %rdx
movq %rdx, 32(%rax)
movq 32(%rbp), %rdx
movq %rdx, 40(%rax)
movq 40(%rbp), %rdx
movq %rdx, 48(%rax)
movq 48(%rbp), %rdx
movq %rdx, 56(%rax)
movq 56(%rbp), %rdx
movq %rdx, 64(%rax)
movq 64(%rbp), %rdx
movq %rdx, 72(%rax)
movq %rcx, (%r12)
movq 72(%rbp), %rax
leaq 72(%rbp), %rbp
leaq -71(%r12), %rbx
jmpq *%rax # TAILCALL
Quoting from #ghc "the [register allocator] core algo is '96 vintage". Improvements are needed;
- Take into consideration pipelining and handle spills less dramatically, attempting to reduce register contention
- Sink memory reads in order to reduce register pressure
Trac metadata
Trac field | Value |
---|---|
Version | 7.6.3 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture |