improve basic block layout on LLVM backend by predicting stack/heap checks
Currently we don't give the LLVM optimizer any information about which branch of an if
is likely to be taken. As a result, the optimizer is likely to produce a basic block layout which is not optimal. Improving the layout can improve performance through better instruction cache usage and better branch prediction by the hardware.
We can control LLVM's idea of what is likely with the llvm.expect
intrinsic function. Some obvious branches which we can predict accurately are the stack and heap checks that appear near the entry of many functions.
Here's a small example of some Cmm code and the output of the LLVM optimizer/compiler.
block_c2Lc_entry() // [R1, R2]
{ info_tbl: [(c2Lc,
label: block_c2Lc_info
rep:StackRep [])]
stack_info: arg_space: 0 updfr_space: Nothing
}
{offset
c2Lc:
Hp = Hp + 24;
if (Hp > HpLim) goto c2Lm; else goto c2Ll;
c2Lm:
HpAlloc = 24;
R2 = R2;
R1 = R1;
call stg_gc_pp(R2, R1) args: 8, res: 8, upd: 24;
c2Ll:
I64[Hp - 16] = :_con_info;
P64[Hp - 8] = R1;
P64[Hp] = R2;
R1 = Hp - 14;
Sp = Sp + 8;
call (P64[Sp])(R1) args: 24, res: 0, upd: 24;
}
}]
00000000000002b8 <c2Lc_info>:
2b8: 4c 89 e0 mov %r12,%rax
2bb: 4c 8d 60 18 lea 0x18(%rax),%r12
2bf: 4d 3b a5 18 01 00 00 cmp 0x118(%r13),%r12
2c6: 76 10 jbe 2d8 <c2Lc_info+0x20>
2c8: 49 c7 85 48 01 00 00 movq $0x18,0x148(%r13)
2cf: 18 00 00 00
2d3: e9 00 00 00 00 jmpq 2d8 <c2Lc_info+0x20>
2d4: R_X86_64_PC32 stg_gc_pp+0xfffffffffffffffc
2d8: 48 c7 40 08 00 00 00 movq $0x0,0x8(%rax)
2df: 00
2dc: R_X86_64_32S ghczmprim_GHCziTypes_ZC_con_info
2e0: 48 89 58 10 mov %rbx,0x10(%rax)
2e4: 4c 89 70 18 mov %r14,0x18(%rax)
2e8: 48 8b 45 08 mov 0x8(%rbp),%rax
2ec: 48 83 c5 08 add $0x8,%rbp
2f0: 49 8d 5c 24 f2 lea -0xe(%r12),%rbx
2f5: ff e0 jmpq *%rax
2f7: 90 nop
It would likely be better to invert the branch at 2c6
to a ja
, so that the common case is adjacent to the function entry, and when llvm.expect
is used on the condition in the branch, the LLVM optimizer does produce this alternate layout.
Trac metadata
Trac field | Value |
---|---|
Version | 7.7 |
Type | FeatureRequest |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler (LLVM) |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture |