Optimization idea for case expression dispatch
Motivation
Some C compilers implement an optimization that ghc could benefit from. A case statement like the following (where we have many different cases returning the same expression), is currently implemented in ghc as a series of compare and jump statements following the balanced binary tree based on the ints in the case statement (depending on how dense the range is ghc is smart enough to combine the binary tree with an array lookup part which is great). (In the example below the ints in question are not considered dense so we get a pure tree.)
opt :: Int -> Int
opt 0 = 10
opt 1 = 10
opt 2 = 10
opt 3 = 10
opt 23 = 10
opt 44 = 10
opt 60 = 10
opt 61 = 10
opt _ = 100
the ghc generated assembly is basically the tree search:
Opt.$wopt_info:
_c7is:
cmpq $45,%r14
jge _u7iB
_u7iv:
cmpq $4,%r14
jge _u7iz
_u7iw:
cmpq $3,%r14
jge _c7ik
_u7ix:
testq %r14,%r14
jge _c7ik
_c7ij:
movl $100,%ebx
jmp *(%rbp)
_u7iz:
cmpq $44,%r14
jge _c7ik
_u7iA:
cmpq $23,%r14
jne _c7ij
_c7ik:
movl $10,%ebx
jmp *(%rbp)
_u7iB:
cmpq $61,%r14
jge _u7iD
_u7iC:
cmpq $60,%r14
jl _c7ij
jmp _c7ik
_u7iD:
cmpq $62,%r14
jl _c7ik
jmp _c7ij
.size Opt.$wopt_info, .-Opt.$wopt_info
Proposal
Here is the same code in C:
int opt(long c) {
switch (c) {
case 0: return 10;
case 1: return 10;
case 2: return 10;
case 3: return 10;
case 23: return 10;
case 44: return 10;
case 60: return 10;
case 61: return 10;
default: return 100;
}
}
and here is how a clever C compiler implements it:
static const long magicConstant = 0b1111000000000000000000010000000000000000000010000000000000001100;
int opt2(long c) {
if (((unsigned long) c) > 61) // negative numbers also taken care of here
return 100; // by the cast to unsigned (as unsigned they are big!)
if (((magicConstant << c) < 0))
return 10;
else
return 100;
}
This results in the much shorter assembly below:
opt2:
.LFB24:
.cfi_startproc
endbr64
movl $100, %eax
cmpq $61, %rdi
ja .L6
movabsq $-1152920405094694900, %rax
movl %edi, %ecx
salq %cl, %rax
sarq $63, %rax
andl $-90, %eax
addl $100, %eax
.L6:
ret
The number of instructions executed is slightly less here (and always constant independent of the number of cases), but more importantly this is straight line code and would easily beat the binary tree jumpy code.
The idea is to construct a magic constant that when shifted left by c
(our input) will cause the result to go negative (the most significant bit becomes 1) if and only if c
equals one of the integers in the case statement. This optimization can only be done if the integers we are comparing against are small (in the region [0,63] -- the number of bits in one machine word).
This optimization may be applicable to case statements on ADT/GADTs since we rarely have ADTs with more than 64 cases.