GHC should inline divInt#
Motivation
Currently divInt# is marked as {-# NOINLINE [0] divInt# #-}.
This can lead to horrible situations where we get (in STG) a division by a constant:
case GHC.Classes.divInt# sat_shZP 3# of ww7_shZQ
Since it's a call we need to do some spilling first (in Cmm):
cmCK: // global
I64[Sp - 48] = cmCI;
_sm3W::I64 = R3;
R3 = 3;
_sm3V::I64 = R2;
R2 = I64[Sp + 8] + 2;
I64[Sp - 40] = _sm3V::I64;
I64[Sp - 32] = _sm3W::I64;
P64[Sp - 24] = R4;
I64[Sp - 16] = R5;
P64[Sp - 8] = R6;
Sp = Sp - 48;
call GHC.Classes.divInt#_info(R3, R2) returns to cmCI, args: 8, res: 8, upd: 8;
This adds a fair bit of overhead at the ASM level:
movq $block_ciGt_info,-48(%rbp)
movq %rsi,%rax
movl $3,%esi
movq 8(%rbp),%rbx
movq %r14,%rcx
leaq 2(%rbx),%r14
movq %rcx,-40(%rbp)
movq %rax,-32(%rbp)
movq %rdi,-24(%rbp)
movq %r8,-16(%rbp)
movq %r9,-8(%rbp)
addq $-48,%rbp
jmp GHC.Classes.divInt#_info
We then execute divInt#:
x# `divInt#` y#
        -- Be careful NOT to overflow if we do any additional arithmetic
        -- on the arguments... the following previous version of this
        -- code has problems with overflow:
--    | (x# ># 0#) && (y# <# 0#) = ((x# -# y#) -# 1#) `quotInt#` y#
--    | (x# <# 0#) && (y# ># 0#) = ((x# -# y#) +# 1#) `quotInt#` y#
    =      if isTrue# (x# ># 0#) && isTrue# (y# <# 0#) then ((x# -# 1#) `quotInt#` y#) -# 1#
      else if isTrue# (x# <# 0#) && isTrue# (y# ># 0#) then ((x# +# 1#) `quotInt#` y#) -# 1#
      else x# `quotInt#` y#
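For reference, the overflow that the comment warns about can be reproduced with boxed Int. This is only a sketch: oldDiv and newDiv are hypothetical names for the commented-out version and the current version, and it assumes that boxed Int arithmetic wraps on overflow the same way the Int# primops do.

```haskell
-- Boxed re-statements of the two variants quoted above.
-- oldDiv and newDiv are hypothetical names, not GHC functions.
oldDiv, newDiv :: Int -> Int -> Int

-- Previous version: computes x - y first, which can overflow.
oldDiv x y
  | x > 0 && y < 0 = ((x - y) - 1) `quot` y
  | x < 0 && y > 0 = ((x - y) + 1) `quot` y
  | otherwise      = x `quot` y

-- Current version: only ever moves x one step towards zero.
newDiv x y
  | x > 0 && y < 0 = ((x - 1) `quot` y) - 1
  | x < 0 && y > 0 = ((x + 1) `quot` y) - 1
  | otherwise      = x `quot` y

main :: IO ()
main = do
  let x = maxBound :: Int
      y = -2
  -- x - y exceeds maxBound, so oldDiv is expected to disagree with div here.
  print (oldDiv x y == x `div` y)
  print (newDiv x y == x `div` y)
```

Under the wrapping assumption this should print False and then True, since x - y wraps around to a negative number in oldDiv.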
Proposal
divInt# should probably be marked as INLINE [0] instead.
This would remove the call overhead above and let the tests on the constant divisor be constant folded away, leaving a sign test on the dividend and quotInt#, which I think compiles to a single instruction.
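To make the expected effect concrete, here is a rough sketch of division by the literal 3#. It is not GHC's actual output. divIntLocal# is a hypothetical local copy of the definition quoted above, carrying the proposed pragma, and divBy3Folded is a hand-simplified version of what I would expect the simplifier to reach once y# is known to be 3#: the y# <# 0# test is statically false and the y# ># 0# test is statically true, so only the sign test on x# remains.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, isTrue#, quotInt#, (+#), (-#), (<#), (>#))

-- Hypothetical local copy of divInt#, with the proposed INLINE [0] pragma.
divIntLocal# :: Int# -> Int# -> Int#
x# `divIntLocal#` y#
  = if isTrue# (x# ># 0#) && isTrue# (y# <# 0#) then ((x# -# 1#) `quotInt#` y#) -# 1#
    else if isTrue# (x# <# 0#) && isTrue# (y# ># 0#) then ((x# +# 1#) `quotInt#` y#) -# 1#
    else x# `quotInt#` y#
{-# INLINE [0] divIntLocal# #-}

-- Division by a literal, as in the STG snippet from the motivation.
divBy3 :: Int -> Int
divBy3 (I# x#) = I# (x# `divIntLocal#` 3#)

-- Hand-simplified form: the tests on the divisor are gone.
divBy3Folded :: Int -> Int
divBy3Folded (I# x#)
  | isTrue# (x# <# 0#) = I# (((x# +# 1#) `quotInt#` 3#) -# 1#)
  | otherwise          = I# (x# `quotInt#` 3#)

main :: IO ()
main =
  -- Both versions should agree with the Prelude's div on a small range.
  print (all (\n -> divBy3 n == n `div` 3 && divBy3Folded n == n `div` 3) [-10 .. 10])
```

Whether GHC actually produces the folded form can be checked by compiling with -O -ddump-simpl and looking at the Core for divBy3.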