More dense jump table encoding
While looking at the code generation changes necessary for #21019 (closed) I noticed that gcc and clang both exploit an optimisation in their treatment of PIC jump tables that we currently don't. Specifically, consider a program like:
int extract_b(int a, int b, int c) {
switch (a) {
case 0: return b*c;
case 1: return b+c;
case 2: return b*c;
case 3: return a+b;
case 4: return a+2;
case 5: return b;
}
}
gcc will produce a jump table consisting of 32-bit .long offsets, each relative to the start of the table:
.text
.p2align 4
.globl extract_b
.type extract_b, @function
extract_b:
.cfi_startproc
cmpl $5, %edi
ja .L2
leaq .L4(%rip), %rcx
movl %edi, %edi
movslq (%rcx,%rdi,4), %rax
addq %rcx, %rax
jmp *%rax
.section .rodata
.align 4
.align 4
.L4:
.long .L7-.L4
.long .L8-.L4
.long .L7-.L4
.long .L6-.L4
.long .L5-.L4
.long .L9-.L4
By contrast, GHC produces a jump table of .quads, so each entry takes 8 bytes rather than 4. The former will be slightly more cache-efficient and therefore may be slightly faster in tight loops.
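For illustration, the relative-offset scheme can be approximated in C using the GNU "labels as values" extension (accepted by gcc and clang, not ISO C). This is only a sketch of the idea, not what either compiler or GHC emits: the name extract_b_rel, the label names, and the default result of 0 are hypothetical, and the offsets are taken relative to the first label rather than to the table itself as in the gcc output above.

```c
#include <stdint.h>

/* Sketch only: relies on the GNU C "labels as values" extension. */
int extract_b_rel(int a, int b, int c) {
    /* One 32-bit offset per case, relative to the label l_mul.
     * Six entries take 6 * 4 = 24 bytes; a table of absolute
     * 8-byte pointers would take 6 * 8 = 48 bytes on x86-64. */
    static const int32_t table[] = {
        &&l_mul - &&l_mul,  /* case 0 */
        &&l_add - &&l_mul,  /* case 1 */
        &&l_mul - &&l_mul,  /* case 2 shares the code of case 0 */
        &&l_ab  - &&l_mul,  /* case 3 */
        &&l_a2  - &&l_mul,  /* case 4 */
        &&l_b   - &&l_mul,  /* case 5 */
    };

    /* Bounds check, like the cmpl/ja pair in the assembly.
     * Returning 0 is an assumption: the original has no default case. */
    if ((unsigned)a > 5u)
        return 0;

    /* Load the offset, add the base, jump indirectly. */
    goto *(&&l_mul + table[a]);

l_mul: return b * c;
l_add: return b + c;
l_ab:  return a + b;
l_a2:  return a + 2;
l_b:   return b;
}
```

Because the entries are differences between two addresses in the same object, they need no load-time relocation under PIC, which is what allows the table to shrink to 32-bit entries.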