I found this test failure while working on nixpkgs on aarch64. This is extremely alarming, before I reduced it to a simpler program using only base, at which point I am out-of-this-world alarmed.
{-# LANGUAGE NoImplicitPrelude, OverloadedStrings, RankNTypes, ScopedTypeVariables, NoRebindableSyntax #-}moduleMainwhereimportPreludeimportSystem.EnvironmentimportData.WordtestF::Word8->Word8->BooltestFab=ifb==0thenTrueelsea==(a`div`b)*b+(a`mod`b)main::IO()main=doargs<-getArgslet[a::Word8,b]=read<$>argsputStrLn.show$testFab
Then:
co/ghc942bug » ghc -O1 app/Main.hs[1 of 2] Compiling Main ( app/Main.hs, app/Main.o )[2 of 2] Linking app/Mainco/ghc942bug » app/Main 217 161Falseco/ghc942bug » ghc -O0 app/Main.hs[1 of 2] Compiling Main ( app/Main.hs, app/Main.o ) [Optimisation flags changed][2 of 2] Linking app/Main [Objects changed]co/ghc942bug » app/Main 217 161True
Note that it's fine on 9.2.4:
co/ghc942bug » ghcup set 9.2.4[ Warn ] This is an old-style command for setting GHC. Use 'ghcup set ghc' instead.[ Info ] GHC 9.2.4 successfully set as default versionco/ghc942bug » ghc -O0 app/Main.hs[1 of 1] Compiling Main ( app/Main.hs, app/Main.o )Linking app/Main ...co/ghc942bug » app/Main 217 161Trueco/ghc942bug » ghc -O1 app/Main.hs[1 of 1] Compiling Main ( app/Main.hs, app/Main.o ) [Optimisation flags changed]Linking app/Main ...co/ghc942bug » app/Main 217 161True
On Linux, GHC 9.4.2 from nixpkgs (qemu on Apple Silicon):
~ » ghc -O1 Repro.hs[1 of 2] Compiling Main ( Repro.hs, Repro.o )[2 of 2] Linking Repro~ » ./Repro 217 161False~ » ghc -O0 Repro.hs[1 of 2] Compiling Main ( Repro.hs, Repro.o ) [Optimisation flags changed][2 of 2] Linking Repro [Objects changed]~ » ./Repro 217 161True~ » uname -aLinux thinnix 5.15.47 #1-NixOS SMP Tue Jun 14 16:36:28 UTC 2022 aarch64 GNU/Linux
Expected behavior
It should return True in both optimization configurations.
Environment
GHC version used: 9.4.2 from ghcup although it also happens on ghc from nix. It's fine in ghc 9.2.4.
Optional:
Operating System: macOS, although it also happens on aarch64-linux
System Architecture: aarch64
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items
...
Show closed items
Linked items
0
Link issues together to show that they're related or that one is blocking others.
Learn more.
I can confirm that this is a backend issue, apparently quite late into the pipeline—the generated Cmm for the miscompiled block is identical on x86_64. Here’s a somewhat nicer test case:
Now the interesting part would be
(a) does this happen with -fllvm as well, what's the assembly the llvm codegen produces?
(b) more importantly though, as this is supposedly only in 9.4, what's the assembly that 9.6(master) and 9.2.4 generate for this? The NCG is in GHC since 9.2
I’ve also just tested with -fllvm, and that does indeed mitigate the bug, so the issue is definitely in the NCG. However, LLVM just optimizes the entirety of wtestF away, so the assembly it produces is not particularly informative:
A_wtestF_info$def:// %bb.0: // %nF6 ldr x8, [x20] and x22, x23, #0xff blr x8 ret
but this isn’t super meaningful, seeing as we’ve disabled the optimizer. Even just running opt -mem2reg is enough for llc -O1 to delete almost everything, and the output of llc -O0 isn’t very interesting.
-- Signed multiply/divideMO_Mulw->intOpTruew(\dxy->unitOL$MULdxy)MO_S_MulMayOflow->do_mul_may_oflowxyMO_S_Quotw->intOpTruew(\dxy->unitOL$SDIVdxy)
and
-- A (potentially signed) integer operation.-- In the case of 8- and 16-bit signed arithmetic we must first-- sign-extend both arguments to 32-bits.-- See Note [Signed arithmetic on AArch64].intOpis_signedwop=do-- compute x<m> <- x-- compute x<o> <- y-- <OP> x<n>, x<m>, x<o>(reg_x,format_x,code_x)<-getSomeRegx(reg_y,format_y,code_y)<-getSomeRegymassertPpr(isIntFormatformat_x&&isIntFormatformat_y)$text"intOp: non-int"-- This is the width of the registers on which the operation-- should be performed.letw'=opRegWidthwsignExtr|notis_signed=nilOL|otherwise=signExtendRegww'rreturn$Any(intFormatw)$\dst->code_x`appOL`code_y`appOL`-- sign-extend both operandssignExtreg_x`appOL`signExtreg_y`appOL`op(OpRegw'dst)(OpRegw'reg_x)(OpRegw'reg_y)`appOL`truncateRegw'wdst-- truncate back to the operand's original width
So, am I correct in assuming what's going on is that the value in a register being sign-extended because of the multiplication, and then re-used for the division after being mutated with no corrective truncation?
Can this be truncating reg_x and reg_y if the op is signed, or are they allowed to contain values of more than w bits?
-- Note [Signed arithmetic on AArch64]-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~-- Handling signed arithmetic on sub-word-size values on AArch64 is a bit-- tricky as Cmm's type system does not capture signedness. While 32-bit values-- are fairly easy to handle due to AArch64's 32-bit instruction variants-- (denoted by use of %wN registers), 16- and 8-bit values require quite some-- care.---- We handle 16-and 8-bit values by using the 32-bit operations and-- sign-/zero-extending operands and truncate results as necessary. For-- simplicity we maintain the invariant that a register containing a-- sub-word-size value always contains the zero-extended form of that value-- in between operations.---- For instance, consider the program,---- test(bits64 buffer)-- bits8 a = bits8[buffer];-- bits8 b = %mul(a, 42);-- bits8 c = %not(b);-- bits8 d = %shrl(c, 4::bits8);-- return (d);-- }---- This program begins by loading `a` from memory, for which we use a-- zero-extended byte-size load. We next sign-extend `a` to 32-bits, and use a-- 32-bit multiplication to compute `b`, and truncate the result back down to-- 8-bits.---- Next we compute `c`: The `%not` requires no extension of its operands, but-- we must still truncate the result back down to 8-bits. Finally the `%shrl`-- requires no extension and no truncate since we can assume that-- `c` is zero-extended.---- TODO:-- Don't use Width in Operands-- Instructions should rather carry a RegWidth--
In 9.4 we started using sized primitives for arithmetic, in 9.2 we used Int# and truncated the result/argument. Which is likely why this doesn't happen in 9.2