Skip to content
Snippets Groups Projects

Remove target dependent CPP for Word64/Int64 (#11470)

Closed Sylvain Henry requested to merge hsyl20/ghc:hsyl20/word64-primops into master

Primops types were dependent on the target word-size at compiler compilation time. It's an issue for multi-target as GHC may not have the correct primops types for the target.

Edited by Sylvain Henry

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Author Developer

    @Ericson2314 This is a stripped down version of your !3658 (closed). I don't change the type of Word64/Int64 to avoid having to add new primops (e.g. quotRemWord64# which I had trouble adding in the past iirc).

  • Sylvain Henry added 21 commits

    added 21 commits

    Compare with previous version

  • Sylvain Henry added 1 commit

    added 1 commit

    Compare with previous version

  • Sylvain Henry added 1 commit

    added 1 commit

    Compare with previous version

  • Sylvain Henry added 1 commit

    added 1 commit

    Compare with previous version

  • John Ericson mentioned in merge request !5965 (merged)

    mentioned in merge request !5965 (merged)

  • Sylvain Henry added 13 commits

    added 13 commits

    Compare with previous version

  • Sylvain Henry added 2 commits

    added 2 commits

    • ad30dc1e - Remove target dependent CPP for Word64/Int64 (#11470)
    • 635629b9 - Add missing Int64/Word64 constant-folding rules

    Compare with previous version

  • Sylvain Henry added 1 commit

    added 1 commit

    • e1d3eb50 - Add missing Int64/Word64 constant-folding rules

    Compare with previous version

  • Ben Gamari approved this merge request

    approved this merge request

  • assigned to @hsyl20

  • John Ericson mentioned in commit 8d43878e

    mentioned in commit 8d43878e

  • John Ericson mentioned in merge request !6819 (closed)

    mentioned in merge request !6819 (closed)

  • John Ericson mentioned in commit 312c2177

    mentioned in commit 312c2177

  • John Ericson mentioned in merge request !6833 (closed)

    mentioned in merge request !6833 (closed)

    • Author Developer

      Small update: I have been trying to debug this on 32-bit arch. The following simple test fails with the stage1 compiler:

      main = do
        print (succ (minBound :: Int64))     -- prints 0!
        print (succ (I64# (intToInt64# 13#)) -- prints minBound for Int64

      (compiled without -O, otherwise constant folding kicks in and it returns the right result).

      Interestingly, hs_add64 gets called with completely bogus arguments. I'll try to find what is causing this.

    • Author Developer

      The following STG:

      GHC.Int.$fEnumInt64_$csucc :: GHC.Int.Int64 -> GHC.Int.Int64
      [GblId, Arity=1, Str=<1P(1L)>, Cpr=1, Unf=OtherCon []] =
          {} \r [x_s4IZ]
              case x_s4IZ of {
              GHC.Int.I64# x1_s4J1 [Occ=Once1!] ->
              case x1_s4J1 of wild1_s4J2 [Occ=Once1] {
                __DEFAULT ->
                    case plusInt64# [wild1_s4J2 1#64] of sat_s4J3 [Occ=Once1] {
                    __DEFAULT -> GHC.Int.I64# [sat_s4J3];
                    };
                9223372036854775807#64 -> GHC.Int.$fEnumInt9;
              };
              };

      turns into the following assembly:

      GHC.Int.$fEnumInt64_$csucc_info:
      _blk_c8Rz:
              movl (%ebp),%eax
              movl $block_c8Rw_info,(%ebp)
              movl %eax,%esi
              testw $3,%si
              jne _blk_c8Rw
      _blk_c8Rx:
              jmp *(%esi)
      .align 4,0x90
              .long   GHC.Int.$fEnumInt9_closure-(block_c8Rw_info)+0
              .long   0
              .word   30
              .word   1
      block_c8Rw_info:
      _blk_c8Rw:
              addl $12,%edi
              cmpl 804(%ebx),%edi
              ja _blk_c8RG
      _blk_c8RF:
              movl 3(%esi),%eax             ; EAX = 13
              movl 7(%esi),%ecx             ; ECX = 0
              movl $4294967295,%edx         ; EDX = 0xffffffff
              movl %ecx,64(%esp)            ; stack40 = 0
              movl $2147483647,%ecx         ; ECX = 0x7fffffff
              movl %edx,76(%esp)            ; stack4C = 0xffffffff
              movl 64(%esp),%edx            ; EDX = stack40 = 0
              xorl %ecx,%edx                ; EDX = 0 xor 0x7fffffff = 0x7fffffff
              movl 76(%esp),%ecx            ; ECX = stack4C = 0xffffffff
              xorl %ecx,%eax                ; EAX = 13 xor 0xffffffff = 0xfffffff2
              orl %edx,%eax                 ; EAX = 0xfffffff2 or 0x7fffffff = 0xffffffff
              jne _blk_c8RQ                 ; not zero, jump
      _blk_c8RR:
              addl $-12,%edi
              movl $GHC.Int.$fEnumInt9_closure,%esi
              addl $4,%ebp
              jmp *(%esi)
      _blk_c8RG:
              movl $12,828(%ebx)
              jmp stg_gc_unpt_r1
      _blk_c8RQ:
              subl $12,%esp                ; sub 0xc to stack offset
              movl $1,%ecx                 ; ECX = 1
              movl %edx,76(%esp)           ; stack40 = 0x7fffffff
              movl $0,%edx                 ; EDX = 0
              pushl %edx                   ; push (1 :: Int64) arg
              pushl %ecx
              movl 84(%esp),%edx           ; EDX = stack40 = 0x7fffffff
              pushl %edx                   ; EAX = 0xffffffff, push garbage 0x7fffffffffffffff!
              pushl %eax
              call hs_add64
              addl $28,%esp
              movl $GHC.Int.I64#_con_info,-8(%edi)
              movl %eax,-4(%edi)
              movl %edx,(%edi)
              leal -7(%edi),%esi
              addl $4,%ebp
              jmp *(%ebp)
              .size GHC.Int.$fEnumInt64_$csucc_info, .-GHC.Int.$fEnumInt64_$csucc_info

      Comments describe the execution of print (succ (I64# (intToInt64# 13#)). As we can see, registers containing 13 :: Int64# are clobbered before the call to hs_add64.

    • Author Developer

      What happens:

      1. an added rule transforms: eqInt64# x constant into case x of { DEFAULT -> False# ; constant -> True# }
      2. switch implementation in GHC.Cmm.Switch.Implement is unaware that on 32-bit architectures Int64# eq/ne primops are lowered into MO_x64_Eq/MO_x64_Ne instead of MO_Eq/MO_Ne (cf GHC.StgToCmm.Prim.emitPrimOp). It directly generates the following code:
      let scrut = CmmMachOp (MO_Ne width) [expr, CmmLit $ CmmInt i width]
             c8RF: // global
                 _s4J2::I64 = I64[R1 + 3];
                 if (_s4J2::I64 != 9223372036854775807 :: W64) goto c8RQ; else goto c8RR;
             c8RQ: // global
                 (_c8RJ::I64) = call MO_x64_Add(_s4J2::I64, 1 :: W64);
      1. it turns out that the NCG knows how to generate comparisons for 64-bit integers on 32-bit architectures (see Note [64-bit integer comparisons on 32-bit] in GHC.CmmToAsm.X86.CodeGen (why aren't we always using them?)). But it seems to compute intermediate results in the original registers instead of in fresh ones, clobbering them for future use:
              cmpExact =
                toOL
                  [ XOR II32 (OpReg r2_hi) (OpReg r1_hi)
                  , XOR II32 (OpReg r2_lo) (OpReg r1_lo)
                  , OR  II32 (OpReg r1_hi)  (OpReg r1_lo)
                  , JXX cond true
                  , JXX ALWAYS false
                  ]

      Register liveness confirms this:

             c8RF:
                      movl 3(%esi),%vI_n8RT
                          # born:    %vI_n8RT
      
                      movl 7(%esi),%vHi_H8RT
                          # born:    %vHi_H8RT
      
                      movl %vI_n8RT,%vI_s4J2
                          # born:    %vI_s4J2
                          # r_dying: %vI_n8RT
      
                      movl %vHi_H8RT,%vHi_H4J2
                          # born:    %vHi_H4J2
                          # r_dying: %vHi_H8RT
      
                      movl $4294967295,%vI_n8RU
                          # born:    %vI_n8RU
      
                      movl $2147483647,%vHi_H8RU
                          # born:    %vHi_H8RU
      
                      xorl %vHi_H8RU,%vHi_H4J2
                          # r_dying: %vHi_H8RU
      
                      xorl %vI_n8RU,%vI_s4J2
                          # r_dying: %vI_n8RU
      
                      orl %vHi_H4J2,%vI_s4J2
      
                      jne _blk_c8RQ
                          # r_dying: %vHi_H4J2 %vI_s4J2
      
                      jmp _blk_c8RR
      
           NONREC
              c8RG:
                      movl $12,828(%ebx)
      
                      jmp stg_gc_unpt_r1
                           ,
           NONREC
              c8RQ:
                      subl $12,%esp
      
                      movl $1,%vI_n8RV
                          # born:    %vI_n8RV
      
                      movl $0,%vHi_H8RV
                          # born:    %vHi_H8RV
      
                      pushl %vHi_H8RV
                          # r_dying: %vHi_H8RV
      
                      pushl %vI_n8RV
                          # r_dying: %vI_n8RV
      
                      pushl %vHi_H4J2
                          # r_dying: %vHi_H4J2
      
                      pushl %vI_s4J2
                          # r_dying: %vI_s4J2
      
                      call hs_add64
                          # born:    %r0 %r2 %r3 %r16 %r17 %r18 %r19 %r20 %r21 %r22 %r23
                          # w_dying: %r2 %r16 %r17 %r18 %r19 %r20 %r21 %r22 %r23

      I'll push a fix.

      Edited by Sylvain Henry
    • Please register or sign in to reply
  • Sylvain Henry added 19 commits

    added 19 commits

    Compare with previous version

  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
Please register or sign in to reply
Loading