Skip to content

improve basic block layout on LLVM backend by predicting stack/heap checks

Currently we don't give the LLVM optimizer any information about which branch of an if is likely to be taken. As a result, the optimizer is likely to produce a basic block layout which is not optimal. Improving the layout can improve performance through better instruction cache usage and better branch prediction by the hardware.

We can control LLVM's idea of what is likely with the llvm.expect intrinsic function. Some obvious branches which we can predict accurately are the stack and heap checks that appear near the entry of many functions.

Here's a small example of some Cmm code and the output of the LLVM optimizer/compiler.

 block_c2Lc_entry() //  [R1, R2]
         { info_tbl: [(c2Lc,
                       label: block_c2Lc_info
                       rep:StackRep [])]
           stack_info: arg_space: 0 updfr_space: Nothing
         }
     {offset
       c2Lc:
           Hp = Hp + 24;
           if (Hp > HpLim) goto c2Lm; else goto c2Ll;
       c2Lm:
           HpAlloc = 24;
           R2 = R2;
           R1 = R1;
           call stg_gc_pp(R2, R1) args: 8, res: 8, upd: 24;
       c2Ll:
           I64[Hp - 16] = :_con_info;
           P64[Hp - 8] = R1;
           P64[Hp] = R2;
           R1 = Hp - 14;
           Sp = Sp + 8;
           call (P64[Sp])(R1) args: 24, res: 0, upd: 24;
     }
 }]
00000000000002b8 <c2Lc_info>:
 2b8:   4c 89 e0                mov    %r12,%rax
 2bb:   4c 8d 60 18             lea    0x18(%rax),%r12
 2bf:   4d 3b a5 18 01 00 00    cmp    0x118(%r13),%r12
 2c6:   76 10                   jbe    2d8 <c2Lc_info+0x20>
 2c8:   49 c7 85 48 01 00 00    movq   $0x18,0x148(%r13)
 2cf:   18 00 00 00 
 2d3:   e9 00 00 00 00          jmpq   2d8 <c2Lc_info+0x20>
                        2d4: R_X86_64_PC32      stg_gc_pp+0xfffffffffffffffc
 2d8:   48 c7 40 08 00 00 00    movq   $0x0,0x8(%rax)
 2df:   00 
                        2dc: R_X86_64_32S       ghczmprim_GHCziTypes_ZC_con_info
 2e0:   48 89 58 10             mov    %rbx,0x10(%rax)
 2e4:   4c 89 70 18             mov    %r14,0x18(%rax)
 2e8:   48 8b 45 08             mov    0x8(%rbp),%rax
 2ec:   48 83 c5 08             add    $0x8,%rbp
 2f0:   49 8d 5c 24 f2          lea    -0xe(%r12),%rbx
 2f5:   ff e0                   jmpq   *%rax
 2f7:   90                      nop

It would likely be better to invert the branch at 2c6 to a ja, so that the common case is adjacent to the function entry, and when llvm.expect is used on the condition in the branch, the LLVM optimizer does produce this alternate layout.

Trac metadata
Trac field Value
Version 7.7
Type FeatureRequest
TypeOfFailure OtherFailure
Priority normal
Resolution Unresolved
Component Compiler (LLVM)
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information