Remove target dependent CPP for Word64/Int64 (#11470)
Primops types were dependent on the target word-size at compiler compilation time. It's an issue for multi-target as GHC may not have the correct primops types for the target.
Merge request reports
Activity
@Ericson2314 This is a stripped down version of your !3658 (closed). I don't change the type of Word64/Int64 to avoid having to add new primops (e.g.
quotRemWord64#
which I had trouble adding in the past iirc).added 21 commits
-
22c7ce58...d73131b9 - 20 commits from branch
ghc:master
- fb2b6de2 - Remove target dependent CPP for Word64/Int64 (#11470)
-
22c7ce58...d73131b9 - 20 commits from branch
added 1 commit
added 1 commit
added 1 commit
@hsyl20 Wow! Thank you!! Really great to finally be done with this and, have two parallel cross and primop tracks henceforth.
mentioned in merge request !5965 (merged)
added 13 commits
-
3e0c6085...f6f24515 - 12 commits from branch
ghc:master
- a008dcc0 - Remove target dependent CPP for Word64/Int64 (#11470)
-
3e0c6085...f6f24515 - 12 commits from branch
added 1 commit
- e1d3eb50 - Add missing Int64/Word64 constant-folding rules
Sadly it seems there are some (spurious?) performance shifts in
T12545
. Otherwise looks great!Edited by Ben Gamari
assigned to @hsyl20
mentioned in commit 8d43878e
mentioned in merge request !6819 (closed)
mentioned in commit 312c2177
mentioned in merge request !6833 (closed)
Small update: I have been trying to debug this on 32-bit arch. The following simple test fails with the stage1 compiler:
main = do print (succ (minBound :: Int64)) -- prints 0! print (succ (I64# (intToInt64# 13#)) -- prints minBound for Int64
(compiled without
-O
, otherwise constant folding kicks in and it returns the right result).Interestingly,
hs_add64
gets called with completely bogus arguments. I'll try to find what is causing this.The following STG:
GHC.Int.$fEnumInt64_$csucc :: GHC.Int.Int64 -> GHC.Int.Int64 [GblId, Arity=1, Str=<1P(1L)>, Cpr=1, Unf=OtherCon []] = {} \r [x_s4IZ] case x_s4IZ of { GHC.Int.I64# x1_s4J1 [Occ=Once1!] -> case x1_s4J1 of wild1_s4J2 [Occ=Once1] { __DEFAULT -> case plusInt64# [wild1_s4J2 1#64] of sat_s4J3 [Occ=Once1] { __DEFAULT -> GHC.Int.I64# [sat_s4J3]; }; 9223372036854775807#64 -> GHC.Int.$fEnumInt9; }; };
turns into the following assembly:
GHC.Int.$fEnumInt64_$csucc_info: _blk_c8Rz: movl (%ebp),%eax movl $block_c8Rw_info,(%ebp) movl %eax,%esi testw $3,%si jne _blk_c8Rw _blk_c8Rx: jmp *(%esi) .align 4,0x90 .long GHC.Int.$fEnumInt9_closure-(block_c8Rw_info)+0 .long 0 .word 30 .word 1 block_c8Rw_info: _blk_c8Rw: addl $12,%edi cmpl 804(%ebx),%edi ja _blk_c8RG _blk_c8RF: movl 3(%esi),%eax ; EAX = 13 movl 7(%esi),%ecx ; ECX = 0 movl $4294967295,%edx ; EDX = 0xffffffff movl %ecx,64(%esp) ; stack40 = 0 movl $2147483647,%ecx ; ECX = 0x7fffffff movl %edx,76(%esp) ; stack4C = 0xffffffff movl 64(%esp),%edx ; EDX = stack40 = 0 xorl %ecx,%edx ; EDX = 0 xor 0x7fffffff = 0x7fffffff movl 76(%esp),%ecx ; ECX = stack4C = 0xffffffff xorl %ecx,%eax ; EAX = 13 xor 0xffffffff = 0xfffffff2 orl %edx,%eax ; EAX = 0xfffffff2 or 0x7fffffff = 0xffffffff jne _blk_c8RQ ; not zero, jump _blk_c8RR: addl $-12,%edi movl $GHC.Int.$fEnumInt9_closure,%esi addl $4,%ebp jmp *(%esi) _blk_c8RG: movl $12,828(%ebx) jmp stg_gc_unpt_r1 _blk_c8RQ: subl $12,%esp ; sub 0xc to stack offset movl $1,%ecx ; ECX = 1 movl %edx,76(%esp) ; stack40 = 0x7fffffff movl $0,%edx ; EDX = 0 pushl %edx ; push (1 :: Int64) arg pushl %ecx movl 84(%esp),%edx ; EDX = stack40 = 0x7fffffff pushl %edx ; EAX = 0xffffffff, push garbage 0x7fffffffffffffff! pushl %eax call hs_add64 addl $28,%esp movl $GHC.Int.I64#_con_info,-8(%edi) movl %eax,-4(%edi) movl %edx,(%edi) leal -7(%edi),%esi addl $4,%ebp jmp *(%ebp) .size GHC.Int.$fEnumInt64_$csucc_info, .-GHC.Int.$fEnumInt64_$csucc_info
Comments describe the execution of
print (succ (I64# (intToInt64# 13#))
. As we can see, registers containing13 :: Int64#
are clobbered before the call tohs_add64
.What happens:
- an added rule transforms:
eqInt64# x constant
intocase x of { DEFAULT -> False# ; constant -> True# }
- switch implementation in GHC.Cmm.Switch.Implement is unaware that on 32-bit architectures Int64# eq/ne primops are lowered into
MO_x64_Eq
/MO_x64_Ne
instead ofMO_Eq
/MO_Ne
(cfGHC.StgToCmm.Prim.emitPrimOp
). It directly generates the following code:
let scrut = CmmMachOp (MO_Ne width) [expr, CmmLit $ CmmInt i width]
c8RF: // global _s4J2::I64 = I64[R1 + 3]; if (_s4J2::I64 != 9223372036854775807 :: W64) goto c8RQ; else goto c8RR; c8RQ: // global (_c8RJ::I64) = call MO_x64_Add(_s4J2::I64, 1 :: W64);
- it turns out that the NCG knows how to generate comparisons for 64-bit integers on 32-bit architectures (see Note [64-bit integer comparisons on 32-bit] in GHC.CmmToAsm.X86.CodeGen (why aren't we always using them?)). But it seems to compute intermediate results in the original registers instead of in fresh ones, clobbering them for future use:
cmpExact = toOL [ XOR II32 (OpReg r2_hi) (OpReg r1_hi) , XOR II32 (OpReg r2_lo) (OpReg r1_lo) , OR II32 (OpReg r1_hi) (OpReg r1_lo) , JXX cond true , JXX ALWAYS false ]
Register liveness confirms this:
c8RF: movl 3(%esi),%vI_n8RT # born: %vI_n8RT movl 7(%esi),%vHi_H8RT # born: %vHi_H8RT movl %vI_n8RT,%vI_s4J2 # born: %vI_s4J2 # r_dying: %vI_n8RT movl %vHi_H8RT,%vHi_H4J2 # born: %vHi_H4J2 # r_dying: %vHi_H8RT movl $4294967295,%vI_n8RU # born: %vI_n8RU movl $2147483647,%vHi_H8RU # born: %vHi_H8RU xorl %vHi_H8RU,%vHi_H4J2 # r_dying: %vHi_H8RU xorl %vI_n8RU,%vI_s4J2 # r_dying: %vI_n8RU orl %vHi_H4J2,%vI_s4J2 jne _blk_c8RQ # r_dying: %vHi_H4J2 %vI_s4J2 jmp _blk_c8RR NONREC c8RG: movl $12,828(%ebx) jmp stg_gc_unpt_r1 , NONREC c8RQ: subl $12,%esp movl $1,%vI_n8RV # born: %vI_n8RV movl $0,%vHi_H8RV # born: %vHi_H8RV pushl %vHi_H8RV # r_dying: %vHi_H8RV pushl %vI_n8RV # r_dying: %vI_n8RV pushl %vHi_H4J2 # r_dying: %vHi_H4J2 pushl %vI_s4J2 # r_dying: %vI_s4J2 call hs_add64 # born: %r0 %r2 %r3 %r16 %r17 %r18 %r19 %r20 %r21 %r22 %r23 # w_dying: %r2 %r16 %r17 %r18 %r19 %r20 %r21 %r22 %r23
I'll push a fix.
Edited by Sylvain Henry- an added rule transforms:
added 19 commits
-
e1d3eb50...0f7541dc - 16 commits from branch
ghc:master
- 20726dce - Remove target dependent CPP for Word64/Int64 (#11470)
- eb1224c6 - Add missing Int64/Word64 constant-folding rules
- 14ea1d5a - i386: fix codegen of 64-bit comparisons
Toggle commit list-
e1d3eb50...0f7541dc - 16 commits from branch