GHC should inline divInt#

Motivation

Currently divInt# is marked as {-# NOINLINE [0] modInt# #-}

This can lead to horrible situations where we get (in STG) a division by a constant:

case GHC.Classes.divInt# sat_shZP 3# of ww7_shZQ

Since it's a call we need to do some spilling first (in Cmm):

       cmCK: // global
           I64[Sp - 48] = cmCI;
           _sm3W::I64 = R3;
           R3 = 3;
           _sm3V::I64 = R2;
           R2 = I64[Sp + 8] + 2;
           I64[Sp - 40] = _sm3V::I64;
           I64[Sp - 32] = _sm3W::I64;
           P64[Sp - 24] = R4;
           I64[Sp - 16] = R5;
           P64[Sp - 8] = R6;
           Sp = Sp - 48;
           call GHC.Classes.divInt#_info(R3,
                                         R2) returns to cmCI, args: 8, res: 8, upd: 8;

Which adds a fair bit over overhead at the ASM level.

	movq $block_ciGt_info,-48(%rbp)
	movq %rsi,%rax
	movl $3,%esi
	movq 8(%rbp),%rbx
	movq %r14,%rcx
	leaq 2(%rbx),%r14
	movq %rcx,-40(%rbp)
	movq %rax,-32(%rbp)
	movq %rdi,-24(%rbp)
	movq %r8,-16(%rbp)
	movq %r9,-8(%rbp)
	addq $-48,%rbp
	jmp GHC.Classes.divInt#_info

When then execute divInt#

x# `divInt#` y#
        -- Be careful NOT to overflow if we do any additional arithmetic
        -- on the arguments...  the following  previous version of this
        -- code has problems with overflow:
--    | (x# ># 0#) && (y# <# 0#) = ((x# -# y#) -# 1#) `quotInt#` y#
--    | (x# <# 0#) && (y# ># 0#) = ((x# -# y#) +# 1#) `quotInt#` y#
    =      if isTrue# (x# ># 0#) && isTrue# (y# <# 0#) then ((x# -# 1#) `quotInt#` y#) -# 1#
      else if isTrue# (x# <# 0#) && isTrue# (y# ># 0#) then ((x# +# 1#) `quotInt#` y#) -# 1#
      else x# `quotInt#` y#

Proposal

divInt should probably be marked as INLINE [0] instead.

This would allow all of the overhead above to be constant folded away just leaving quotInt#, which I think compiles to a single instruction.

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information