GHC 9.10.1 i386: incorrect arithmetic optimization
Summary
Consider the snippet
import Data.Word
main = print $ c == c + f 0
where
c = 0x8000000000000000
f :: Word64 -> Word64
f x = if x > 0 then f x else x
As not (0 > 0)
, f 0
should evaluate to 0
, so c == c + f 0
, so the program should print True
.
However, with GHC 9.10.1 on i386 and -O1
, it prints False
.
Steps to reproduce
$ ghc --info | rg 'Project version|target platform string'
,("target platform string","i386-unknown-linux")
,("Project version","9.10.1")
$ ghc -O1 bug.hs
$ ./bug
False
Expected behavior
It should print True
, just as on all previous GHC versions, as well as on other platforms like x86_64-linux, and also with -O0
.
Analysis
This looks like a code generation bug; the final STG looks fine, and at least the WASM backend (also 32bit) does not exhibit this bug.
Bisecting (needed to apply 843f95b1 to compile most commits between 9.8 and 9.10) yields that 6755d833 (!10771 (merged)) introduced this bug, cc @jaro @AndreasK.
Environment
- GHC version used: 9.10.1 (installed via ghcup)
- Operating System/System Architecture:
i386/debian:buster
docker container on x86_64-linux
- Show closed items
Activity
-
Newest first Oldest first
-
Show all activity Show comments only Show history only
- ghc-triage-bot added needs triage label
added needs triage label
- Developer
Thanks for the report. Could you share Cmm and Asm dumps with
-dno-typeable-binds
? Collapse replies -ddump-cmm
==================== Output Cmm ==================== 2024-05-28 08:33:11.645146398 UTC [Main.$wf_entry() { // [] { info_tbls: [(c1ST, label: Main.$wf_info rep: HeapRep static { Fun {arity: 1 fun_type: ArgSpec 8} } srt: Nothing)] stack_info: arg_space: 12 } {offset c1ST: // global _s1SF::I64 = I64[Sp]; goto c1SL; c1SL: // global if (_s1SF::I64 <= 0 :: W64) goto c1SR; else goto c1SS; c1SR: // global L1 = _s1SF::I64; Sp = Sp + 8; call (P32[Sp])(L1) args: 4, res: 0, upd: 4; c1SS: // global I64[Sp] = _s1SF::I64; goto c1SL; } }, section ""data" . Main.$wf_closure" { Main.$wf_closure: const Main.$wf_info; }] ==================== Output Cmm ==================== 2024-05-28 08:33:11.647060295 UTC [section ""data" . _u1Ty_srt" { _u1Ty_srt: const stg_SRT_2_info; const GHC.Internal.Show.$fShowBool5_closure; const GHC.Internal.Show.$fShowBool4_closure; const 0; }, Main.main2_entry() { // [R1] { info_tbls: [(c1Tb, label: block_c1Tb_info rep: StackRep [] srt: Just _u1Ty_srt), (c1Td, label: Main.main2_info rep: HeapRep static { Thunk } srt: Just _u1Ty_srt)] stack_info: arg_space: 4 } {offset c1Td: // global if ((Sp + -20) < SpLim) (likely: False) goto c1Te; else goto c1Tf; c1Te: // global call (stg_gc_enter_1)(R1) args: 4, res: 0, upd: 4; c1Tf: // global (_c1T8::I32) = call "ccall" arg hints: [PtrHint, PtrHint] result hints: [PtrHint] newCAF(BaseReg, R1); if (_c1T8::I32 == 0) goto c1Ta; else goto c1T9; c1Ta: // global call (I32![R1])() args: 4, res: 0, upd: 4; c1T9: // global I32[Sp - 8] = stg_bh_upd_frame_info; I32[Sp - 4] = _c1T8::I32; I32[Sp - 12] = c1Tb; I64[Sp - 20] = 0 :: W64; Sp = Sp - 20; call Main.$wf_info() returns to c1Tb, args: 12, res: 4, upd: 12; c1Tb: // global if (L1 != 9223372036854775808 :: W64) goto c1To; else goto c1Tu; c1To: // global R1 = GHC.Internal.Show.$fShowBool5_closure; Sp = Sp + 4; call (I32![R1])(R1) args: 12, res: 0, upd: 12; c1Tu: // global R1 = GHC.Internal.Show.$fShowBool4_closure; Sp = Sp + 4; call (I32![R1])(R1) args: 12, res: 0, upd: 12; } }, section ""data" . Main.main2_closure" { Main.main2_closure: const Main.main2_info; const 0; const 0; const 0; }] ==================== Output Cmm ==================== 2024-05-28 08:33:11.649148958 UTC [Main.main1_entry() { // [] { info_tbls: [(c1TQ, label: Main.main1_info rep: HeapRep static { Fun {arity: 1 fun_type: ArgSpec 3} } srt: Nothing)] stack_info: arg_space: 4 } {offset c1TQ: // global if ((Sp + -12) < SpLim) (likely: False) goto c1TR; else goto c1TS; c1TR: // global R1 = Main.main1_closure; call (stg_gc_fun)(R1) args: 4, res: 0, upd: 4; c1TS: // global P32[Sp - 12] = GHC.Internal.IO.Handle.FD.stdout_closure; P32[Sp - 8] = Main.main2_closure; P32[Sp - 4] = GHC.Types.True_closure+2; Sp = Sp - 12; call GHC.Internal.IO.Handle.Text.hPutStr2_info() args: 16, res: 0, upd: 4; } }, section ""data" . Main.main1_closure" { Main.main1_closure: const Main.main1_info; const GHC.Internal.IO.Handle.Text.hPutStr2_closure; const GHC.Internal.IO.Handle.FD.stdout_closure; const Main.main2_closure; const 0; }] ==================== Output Cmm ==================== 2024-05-28 08:33:11.68996955 UTC [Main.main_entry() { // [] { info_tbls: [(c1U1, label: Main.main_info rep: HeapRep static { Fun {arity: 1 fun_type: ArgSpec 3} } srt: Just Main.main1_closure)] stack_info: arg_space: 4 } {offset c1U1: // global call Main.main1_info() args: 4, res: 0, upd: 4; } }, section ""data" . Main.main_closure" { Main.main_closure: const Main.main_info; const 0; }] ==================== Output Cmm ==================== 2024-05-28 08:33:11.690944588 UTC [Main.main3_entry() { // [] { info_tbls: [(c1Ub, label: Main.main3_info rep: HeapRep static { Fun {arity: 1 fun_type: ArgSpec 3} } srt: Nothing)] stack_info: arg_space: 4 } {offset c1Ub: // global if ((Sp + -4) < SpLim) (likely: False) goto c1Uc; else goto c1Ud; c1Uc: // global R1 = Main.main3_closure; call (stg_gc_fun)(R1) args: 4, res: 0, upd: 4; c1Ud: // global P32[Sp - 4] = Main.main1_closure+1; Sp = Sp - 4; call GHC.Internal.TopHandler.runMainIO1_info() args: 8, res: 0, upd: 4; } }, section ""data" . Main.main3_closure" { Main.main3_closure: const Main.main3_info; const Main.main1_closure; const GHC.Internal.TopHandler.runMainIO1_closure; const 0; }] ==================== Output Cmm ==================== 2024-05-28 08:33:11.692195708 UTC [:Main.main_entry() { // [] { info_tbls: [(c1Um, label: :Main.main_info rep: HeapRep static { Fun {arity: 1 fun_type: ArgSpec 3} } srt: Just Main.main3_closure)] stack_info: arg_space: 4 } {offset c1Um: // global call Main.main3_info() args: 4, res: 0, upd: 4; } }, section ""data" . :Main.main_closure" { :Main.main_closure: const :Main.main_info; const 0; }]
-ddump-asm
==================== Asm code ==================== 2024-05-28 08:33:11.646251717 UTC .section .text .align 4,0x90 .align 4,0x90 .long 65544 .long 0 .word 14 .word 0 .globl Main.$wf_info .type Main.$wf_info, @function Main.$wf_info: .Lc1ST: movl (%ebp),%eax movl 4(%ebp),%ecx .Lc1SL: movl $0,%edx movl %ecx,64(%esp) movl $0,%ecx cmpl %eax,%edx movl 64(%esp),%edx sbbl %edx,%ecx jae .Lc1SR .Lc1SS: movl %eax,(%ebp) movl %edx,4(%ebp) .Ln1T3: movl %edx,%ecx jmp .Lc1SL .Lc1SR: movl %eax,784(%ebx) movl %edx,788(%ebx) addl $8,%ebp jmp *(%ebp) .size Main.$wf_info, .-Main.$wf_info ==================== Asm code ==================== 2024-05-28 08:33:11.646533611 UTC .section .data .align 4 .align 1 .globl Main.$wf_closure .type Main.$wf_closure, @object Main.$wf_closure: .long Main.$wf_info ==================== Asm code ==================== 2024-05-28 08:33:11.647842331 UTC .section .data .align 4 .align 1 .Lu1Ty_srt: .long stg_SRT_2_info .long GHC.Internal.Show.$fShowBool5_closure .long GHC.Internal.Show.$fShowBool4_closure .long 0 ==================== Asm code ==================== 2024-05-28 08:33:11.648430544 UTC .section .text .align 4,0x90 .align 4,0x90 .long .Lu1Ty_srt-(Main.main2_info)+0 .long 0 .word 21 .word 1 .globl Main.main2_info .type Main.main2_info, @function Main.main2_info: .Lc1Td: leal -20(%ebp),%eax cmpl 796(%ebx),%eax jb .Lc1Te .Lc1Tf: subl $4,%esp pushl %esi pushl %ebx call newCAF addl $12,%esp testl %eax,%eax je .Lc1Ta .Lc1T9: movl $stg_bh_upd_frame_info,-8(%ebp) movl %eax,-4(%ebp) movl $.Lblock_c1Tb_info,-12(%ebp) movl $0,%eax movl $0,%ecx movl %eax,-20(%ebp) movl %ecx,-16(%ebp) addl $-20,%ebp jmp Main.$wf_info .Lc1Ta: movl (%esi),%eax jmp *%eax .align 4,0x90 .long .Lu1Ty_srt-(.Lblock_c1Tb_info)+0 .long 0 .word 30 .word 1 .Lblock_c1Tb_info: .Lc1Tb: movl 784(%ebx),%eax movl 788(%ebx),%ecx movl $0,%edx movl %ecx,64(%esp) movl $2147483648,%ecx movl %ecx,76(%esp) movl 64(%esp),%ecx movl %edx,88(%esp) movl 76(%esp),%edx xorl %edx,%ecx movl 88(%esp),%edx xorl %edx,%eax orl %ecx,%eax jne .Lc1To .Lc1Tu: movl $GHC.Internal.Show.$fShowBool4_closure,%esi addl $4,%ebp movl (%esi),%eax jmp *%eax .Lc1Te: jmp *-8(%ebx) .Lc1To: movl $GHC.Internal.Show.$fShowBool5_closure,%esi addl $4,%ebp movl (%esi),%eax jmp *%eax .size Main.main2_info, .-Main.main2_info ==================== Asm code ==================== 2024-05-28 08:33:11.64885415 UTC .section .data .align 4 .align 1 .globl Main.main2_closure .type Main.main2_closure, @object Main.main2_closure: .long Main.main2_info .long 0 .long 0 .long 0 ==================== Asm code ==================== 2024-05-28 08:33:11.689143508 UTC .section .text .align 4,0x90 .align 4,0x90 .long 65539 .long 3 .word 14 .word 0 .globl Main.main1_info .type Main.main1_info, @function Main.main1_info: .Lc1TQ: leal -12(%ebp),%eax cmpl 796(%ebx),%eax jb .Lc1TR .Lc1TS: movl $GHC.Internal.IO.Handle.FD.stdout_closure,-12(%ebp) movl $Main.main2_closure,-8(%ebp) movl $GHC.Types.True_closure+2,-4(%ebp) addl $-12,%ebp jmp GHC.Internal.IO.Handle.Text.hPutStr2_info .Lc1TR: movl $Main.main1_closure,%esi jmp *-4(%ebx) .size Main.main1_info, .-Main.main1_info ==================== Asm code ==================== 2024-05-28 08:33:11.689557976 UTC .section .data .align 4 .align 1 .globl Main.main1_closure .type Main.main1_closure, @object Main.main1_closure: .long Main.main1_info .long GHC.Internal.IO.Handle.Text.hPutStr2_closure .long GHC.Internal.IO.Handle.FD.stdout_closure .long Main.main2_closure .long 0 ==================== Asm code ==================== 2024-05-28 08:33:11.690368578 UTC .section .text .align 4,0x90 .align 4,0x90 .long Main.main1_closure-(Main.main_info)+0 .long 65539 .long 0 .word 14 .word 1 .globl Main.main_info .type Main.main_info, @function Main.main_info: .Lc1U1: jmp Main.main1_info .size Main.main_info, .-Main.main_info ==================== Asm code ==================== 2024-05-28 08:33:11.690577441 UTC .section .data .align 4 .align 1 .globl Main.main_closure .type Main.main_closure, @object Main.main_closure: .long Main.main_info .long 0 ==================== Asm code ==================== 2024-05-28 08:33:11.691574591 UTC .section .text .align 4,0x90 .align 4,0x90 .long 65539 .long 2 .word 14 .word 0 .globl Main.main3_info .type Main.main3_info, @function Main.main3_info: .Lc1Ub: leal -4(%ebp),%eax cmpl 796(%ebx),%eax jb .Lc1Uc .Lc1Ud: movl $Main.main1_closure+1,-4(%ebp) addl $-4,%ebp jmp GHC.Internal.TopHandler.runMainIO1_info .Lc1Uc: movl $Main.main3_closure,%esi jmp *-4(%ebx) .size Main.main3_info, .-Main.main3_info ==================== Asm code ==================== 2024-05-28 08:33:11.691877325 UTC .section .data .align 4 .align 1 .globl Main.main3_closure .type Main.main3_closure, @object Main.main3_closure: .long Main.main3_info .long Main.main1_closure .long GHC.Internal.TopHandler.runMainIO1_closure .long 0 ==================== Asm code ==================== 2024-05-28 08:33:11.692575719 UTC .section .text .align 4,0x90 .align 4,0x90 .long Main.main3_closure-(:Main.main_info)+0 .long 65539 .long 0 .word 14 .word 1 .globl :Main.main_info .type :Main.main_info, @function :Main.main_info: .Lc1Um: jmp Main.main3_info .size :Main.main_info, .-:Main.main_info ==================== Asm code ==================== 2024-05-28 08:33:11.692775685 UTC .section .data .align 4 .align 1 .globl :Main.main_closure .type :Main.main_closure, @object :Main.main_closure: .long :Main.main_info .long 0
- Developer
Could you share Core and STG too? The addition of
c
to the result off 0
is gone in Cmm:call Main.$wf_info() returns to c1Tb, args: 12, res: 4, upd: 12; // call `f` passing it 0 c1Tb: // global if (L1 != 9223372036854775808 :: W64) goto c1To; else goto c1Tu; // on return, check the result directly without adding anything
It is still present in the final STG (
plusWord64# [9223372036854775808#Word64 ww_s1SC]
):-ddump-stg-final
==================== Final STG: ==================== 2024-05-28 09:41:30.878737943 UTC Rec { Main.$wf [InlPrag=[2], Occ=LoopBreaker] :: GHC.Prim.Word64# -> GHC.Prim.Word64# [GblId, Arity=1, Str=<SL>, Unf=OtherCon []] = {} \r [ww_s1SA] case gtWord64# [ww_s1SA 0#Word64] of lwild_s1SB { __DEFAULT -> ww_s1SA<TagProper>; 1# -> Main.$wf ww_s1SA; }; end Rec } Main.main2 :: GHC.Internal.Base.String [GblId] = {} \u [] case Main.$wf 0#Word64 of ww_s1SC { __DEFAULT -> case plusWord64# [9223372036854775808#Word64 ww_s1SC] of wild_s1SD { __DEFAULT -> GHC.Internal.Show.$fShowBool5; 9223372036854775808#Word64 -> GHC.Internal.Show.$fShowBool4; }; }; Main.main1 :: GHC.Prim.State# GHC.Prim.RealWorld -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #) [GblId, Arity=1, Str=<L>, Unf=OtherCon []] = {} \r [void_0E] GHC.Internal.IO.Handle.Text.hPutStr2 GHC.Internal.IO.Handle.FD.stdout Main.main2 GHC.Types.True GHC.Prim.void#; Main.main :: GHC.Types.IO () [GblId, Arity=1, Str=<L>, Unf=OtherCon []] = {} \r [void_0E] Main.main1 GHC.Prim.void#; Main.main3 :: GHC.Prim.State# GHC.Prim.RealWorld -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #) [GblId, Arity=1, Str=<L>, Unf=OtherCon []] = {} \r [void_0E] GHC.Internal.TopHandler.runMainIO1 Main.main1 GHC.Prim.void#; :Main.main :: GHC.Types.IO () [GblId, Arity=1, Str=<L>, Unf=OtherCon []] = {} \r [void_0E] Main.main3 GHC.Prim.void#;
- Developer
Ok. So it's probably in Cmm opt.
This looks dubious:
-- in GHC.Cmm.Opt cmmMachOpFoldM _ (MO_Add _) [CmmReg reg, CmmLit (CmmInt n rep)] = Just $! cmmRegOff reg (fromIntegral (narrowS rep n)) ...
We use
fromIntegral
which casts the 64-bit0x8000000000000000
into a compiler'sInt
(32-bit).We shouldn't do it for
MO_Add W64
on 32 bit arch. Same thing for similar optimizations below this one. They didn't kick-in before the patch you mentioned becauseMO_x64_Add
was used instead ofMO_Add
for 32-bit targets. - Developer
As the Wasm compiler is a cross-compiler and you're probably trying on a 64-bit arch, the compiler
Int
is 64-bit and then the Wasm codegen converts theCmmRegOff
back into a 64-bit addition:lower_CmmRegOff :: CLabel -> CmmReg -> Int -> WasmCodeGenM w (SomeWasmExpr w) lower_CmmRegOff lbl reg 0 = lower_CmmReg lbl reg lower_CmmRegOff lbl reg o = do SomeWasmExpr ty (WasmExpr reg_instr) <- lower_CmmReg lbl reg pure $ SomeWasmExpr ty $ WasmExpr $ reg_instr `WasmConcat` WasmConst ty (toInteger o) `WasmConcat` WasmAdd ty
- Developer
@amesgen Would you like to try the proposed fix in #24893 (comment 566996) and open a MR for it?
- Developer
I think this bug will also trigger if we cross-compile from a 32-bit arch to a 64-bit target. Perhaps the best way to fix this would be to:
--- compiler/GHC/Cmm/Expr.hs +++ compiler/GHC/Cmm/Expr.hs @@ -66,7 +66,7 @@ data CmmExpr | CmmStackSlot Area {-# UNPACK #-} !Int -- Addressing expression of a stack slot -- See Note [CmmStackSlot aliasing] - | CmmRegOff !CmmReg !Int + | CmmRegOff !CmmReg !Int64 -- CmmRegOff reg i -- ** is shorthand only, meaning ** -- CmmMachOp (MO_Add rep) [x, CmmLit (CmmInt (fromIntegral i) rep)]
It will make the comment true
Edited by Sylvain Henry 1 - Developer
@AndreasK is ready to help/implement if needed.
1 I confirmed that these optimizations are indeed the culprit (removing them fixes the bug).
I am currently considering this approach (just for the first case, the other cases are very similar):
cmmMachOpFoldM _ (MO_Add _) [CmmReg reg, CmmLit (CmmInt n rep)] | let n' = fromIntegral (narrowS rep n) , widthInBits rep <= finiteBitSize n' = Just $! cmmRegOff reg n'
Technically, I think this would be enough to fix the bug. However, whether this optimization fires now depends on the host system of the compiler (ie a native 64bit GHC will perform the optimization, but a 32bit -> 64bit cross GHC won't, as mentioned above).
If this is undesirable, I see at least two options:
- Use
Int64
inCmmRegOff
(and probably also inCmmStackSlot
for consistency, see e.g. functions likecmmOffset
). Just looking at the types, aCmmInt
can have aWidth
of up toW512
, soInt64
still might not be enough in all cases, but probably the other cases are so rare that it is fine if the optimization does not apply for them? - As before, but use
Integer
. Would also mean that the width check is unnecessary.
WDYT?
Edited by amesgen- Use
- Developer
I vote for using
Integer
and getting rid of the dubious narrowing logic here. 1 - Developer
This (Integer) or, if performance suffers too much, use Int64 as it covers every MO_Add width we support and replace unsafe
fromIntegral :: Integer -> Int64
with a checked cast that panics if the input is out of range.Thanks for confirming the root cause!
1 Opened !12756 (closed). Using
Integer
required propagating that change to a few places, and there are some new usages offromInteger
(but some are also removed) as many other things still useInt
. I kept it relatively minimal for now, even though also switching fromInt
toInteger
forCmmStackSlot
/CmmLabelOff
/CmmLabelDiffOff
seems related.I also tried adding the full-ci label, but it seems I don't have the permissions to do so.
- Developer
Thanks! I think we should fix the other uses of Int you mention. Cross-compiling from a 32-bit arch to a 64-bit target must be quite broken currently.
Maybe we should open a ticket about banning the use of
fromInteger
/fromIntegral
in the compiler to only use safe casts (widening...), checked narrowing, and unsafe casts only checked in debug compilers for narrowing cases that we assert to be safe.
- Sylvain Henry changed milestone to %9.10.2
changed milestone to %9.10.2
- Sylvain Henry added NCG backend Phigh Tbug code generation i386 labels and removed needs triage label
added NCG backend Phigh Tbug code generation i386 labels and removed needs triage label
- Matthew Pickering assigned to @amesgen and @AndreasK
- Ilias Tsitsimpis mentioned in issue #24700 (closed)
mentioned in issue #24700 (closed)
- Reporter
This is a duplicate of #24700 (closed). Thanks for working on this. It would also be great if we can backport this fix to older versions of GHC.
Collapse replies - Developer
- beef6135 (cc @Ericson2314) enabled the bug for LLVM/C in the master branch (9.4+)
- it was backported into 9.2 (8105bbb0)
- 6755d833 enabled the bug for the x86 NCG in the master branch (9.10+) and wasn't backported
So yes backporting the fix is important for other backends
- amesgen mentioned in merge request !12756 (closed)
mentioned in merge request !12756 (closed)
- Cheng Shao mentioned in issue #24960
mentioned in issue #24960
- Sylvain Henry mentioned in merge request !13096 (closed)
mentioned in merge request !13096 (closed)
- Sylvain Henry mentioned in issue #25151
mentioned in issue #25151
- Developer
Fixed in !13096 (closed). Follow-up tracked in #25151
- Sylvain Henry closed
closed