Failure from pext on x86-64

The problem appears to be the _ /s 0xffff::W16. I have reduced the testcase to:

λ> e = parseExpr @W16 "(((zext[W8→W16](0x12::W8 + (load[W8](0x17dd4::W64)))) & 0x59f8::W16) << 0xa::W64) /s 0xffff::W16"
λ> putStrLn $ showInterpretedExpr e
/s              SomeNumber @W16 0x8000
|
+- <<           SomeNumber @W16 0x8000
|  |
|  +- &         SomeNumber @W16 0xe0
|  |  |
|  |  +- zext[W8→W16]           SomeNumber @W16 0xe6
|  |  |  |
|  |  |  `- +           SomeNumber @W8 0xe6
|  |  |     |
|  |  |     +- 0x12::W8         SomeNumber @W8 0x12
|  |  |     |
|  |  |     `- load[W8]         SomeNumber @W8 0xd4
|  |  |        |
|  |  |        `- 0x17dd4::W64          SomeNumber @W64 0x17dd4
|  |  |
|  |  `- 0x59f8::W16            SomeNumber @W16 0x59f8
|  |
|  `- 0xa::W64          SomeNumber @W64 0xa
|
`- 0xffff::W16          SomeNumber @W16 0xffff

After further changes in test-primops, I have reproduced this with:

λ> e = parseExpr @W16 "(load[W16](0x0::W64) + 0x80
λ> quickCheck $ agree refInterpreter interpreter  e
*** Failed! Exception: 'ProcessFailure {preExitCode = -8}' (after 1 test):
%quot((bits16[buffer + (0 :: bits64)] + (-32768 :: bits16)), (-1 :: bits16))

So the problem here is that the division overflows a 16-bit word. That is, the bit pattern 0x8000 represents the signed value -32768. Naturally, -32768/-1 == 32768, which is not representable as a 16-bit integer (which can only represent [-32768, +32767]).
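The overflow condition can be written down as a small check. The following is only a sketch in C; `quot_i16_is_defined` is a hypothetical helper and is not part of GHC or test-primops. It widens both operands to 32 bits so that the division itself cannot overflow, then tests whether the quotient still fits in 16 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from GHC or test-primops): reports whether the
 * signed 16-bit quotient n / d has a result representable in 16 bits. */
bool quot_i16_is_defined(int16_t n, int16_t d)
{
    if (d == 0) {
        return false;               /* division by zero is also excluded */
    }
    /* Widen first so that the division itself cannot overflow. */
    int32_t wide = (int32_t)n / (int32_t)d;
    return wide >= INT16_MIN && wide <= INT16_MAX;
}
```

For the failing case, `quot_i16_is_defined(INT16_MIN, -1)` is false, because the widened quotient is 32768, one past `INT16_MAX`.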

To expand on this: I vaguely remember that this silently overflows on some platforms (aarch64) and raises an exception on others (x86). I have only verified the x86 behaviour.

It seems sensible to define the result for signed division machops as undefined when it overflows, and adapt test-primops to avoid generating such cases.

Yes, signed integer division has undefined behavior on overflow at the PrimOp and MachOp level, and this should be better documented. See also the discussion at ghc#24556.

I have looked into this. We already excluded overflowing signed division in the Arbitrary generator. However, it looks like I failed to include this same logic in the shrinker. I have fixed this in !29 (merged).

However, the fact that the shrinker was being used at all suggests that there may be a bug in our handling of pext.

Indeed the failures persist even after !29 (merged).

I guess we should reopen this ticket then?

mentioned in issue ghc#23836

mentioned in issue ghc#25800

assigned to @bgamari

mentioned in commit f265e488

mentioned in merge request !29 (merged)

mentioned in issue #14 (closed)

closed with merge request !29 (merged)

reopened

I suspect that the culprit here is actually the C implementation (libraries/ghc-internal/cbits/pext.c) used when BMI2 is not enabled, since I cannot reproduce the issue with -mbmi2. It is not obvious what is wrong with it, however.
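For comparison, here is a minimal bit-by-bit sketch of what a software pext is expected to compute. It is not the code from pext.c, and `ref_pext64` is a hypothetical name. For every set bit of the mask, from least to most significant, it copies the corresponding source bit into the next free low-order bit of the result.

```c
#include <stdint.h>

/* Hypothetical reference implementation, written for illustration only.
 * It is deliberately slow and simple so that it is easy to check by hand. */
uint64_t ref_pext64(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    unsigned out = 0;                       /* next free bit in the result */
    for (unsigned bit = 0; bit < 64; bit++) {
        if ((mask >> bit) & 1) {
            result |= ((src >> bit) & 1) << out;
            out++;
        }
    }
    return result;
}
```

For example, `ref_pext64(0x12345678, 0xff00fff0)` selects the nibbles 1, 2, 5, 6 and 7 of the source and packs them into `0x12567`.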

Collecting various failures:

(ret) = prim %pext32(%sx32((~bits8[buffer + (0 :: bits64)])), %sx32((~bits16[buffer + (10 :: bits64)])));
0x1fffffff /= 0xffffffff

(ret) = prim %pext32(%lobits32(((7849621336 :: bits64) & %neg(%zx64((~%zx32(%lobits8(bits32[buffer + (0 :: bits64)]))))))), %sx32(%neg(bits8[buffer + (15 :: bits64)])));
0 /= 0x40000000

(ret) = prim %pext32(%lobits32(%sx64(%neg(((3 :: bits16) & %lobits16(%zx64(((3 :: bits8) & %shra(bits8[buffer + (12 :: bits64)], (1 :: bits64))))))))), %lobits32(bits64[buffer + (2 :: bits64)]));
0 /= 1

Seems like pext32 is the culprit. Haven't yet identified why.

I'm new to this project, so I may be saying something silly...

The type for %pext32 is (bits64, bits64) -> bits64 on 64-bit platforms, right? But, if I understand correctly, the test suite passes (bits32, bits32) to %pext32. If I run ./run.sh --ghc-args -dcmm-lint, I get many Cmm lint errors.

I think this problem can be solved in one of the following ways:

1. Change the test to pass (bits64, bits64) on 64-bit platforms.
2. Change the type of %pext32 to (bits32, bits32) -> bits32. (The corresponding Haskell function would be Word32# -> Word32# -> Word32#.)
3. Change the implementation of hs_pext32 to return hs_pext64(src, (StgWord32)mask);.
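To illustrate why the upper bits of the operands matter, here is a sketch of the last option in plain C. The names `sketch_pext64` and `sketch_pext32` are hypothetical, `uint32_t` stands in for StgWord32, and the 64-bit helper is a simple reference loop rather than the real hs_pext64. Truncating the mask to 32 bits guarantees that only the low 32 bits of the source are ever read.

```c
#include <stdint.h>

/* Hypothetical 64-bit reference pext, for illustration only. */
uint64_t sketch_pext64(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    unsigned out = 0;
    for (unsigned bit = 0; bit < 64; bit++) {
        if ((mask >> bit) & 1) {
            result |= ((src >> bit) & 1) << out;
            out++;
        }
    }
    return result;
}

/* Sketch of the third proposal: operands arrive in 64-bit registers whose
 * upper halves may hold garbage, so the mask is truncated to 32 bits. */
uint64_t sketch_pext32(uint64_t src, uint64_t mask)
{
    return sketch_pext64(src, (uint32_t)mask);
}
```

With `src = 0xdeadbeef00000010` and a sign-extended mask `0xfffffffffffffff0`, `sketch_pext32` returns 1, whereas calling `sketch_pext64` directly and keeping the low 32 bits gives `0xf0000001`, because the untruncated mask pulls in the upper half of the source.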

The issue doesn't seem to be x86-64 specific. Here is a failure on AArch64 (https://gitlab.haskell.org/ghc/test-primops/-/jobs/2174111#L196):

    pext
      W8:                 FAIL (138.74s)
        *** Failed! Falsified (after 96 tests and 29 shrinks):
        (1::W8,1::W8 & (((narrow[W16→W8](1::W16 /s (~(zext[W8→W16](0::W8 /s (narrow[W32→W8](load[W32](0xe::W64)))))))) >>a 1::W64) %u 0x3d::W8))
        test ( bits64 buffer ) {
          bits8 ret;
          (ret) = prim %pext8((1 :: bits8), ((1 :: bits8) & %modu(%shra(%lobits8(%quot((1 :: bits16), (~%zx16(%quot((0 :: bits8), %lobits8(bits32[buffer + (14 :: bits64)])))))), (1 :: bits64)), (61 :: bits8))));
          return (ret);
        }
        
        1 /= 0
        Use --quickcheck-replay="(SMGen 7716716293969237036 4258734693017717909,95)" to reproduce.
        Use -p '($2!~/llvm/&&$0!="test primops/expression correctness")&&/pext.W8/' to rerun this test only.
      W16:                FAIL (52.78s)
        *** Failed! Falsified (after 9 tests and 7 shrinks):
        (zext[W8→W16](0xfe::W8 %u ((0xf3::W8 %u ((load[W8](0x9c::W64)) | 0x7f::W8)) >>a 1::W64)),1::W16)
        test ( bits64 buffer ) {
          bits16 ret;
          (ret) = prim %pext16(%zx16(%modu((-2 :: bits8), %shra(%modu((-13 :: bits8), (bits8[buffer + (156 :: bits64)] | (127 :: bits8))), (1 :: bits64)))), (1 :: bits16));
          return (ret);
        }
        
        1 /= 0
        Use --quickcheck-replay="(SMGen 5847628808866959348 9875815302680720887,8)" to reproduce.
        Use -p '($2!~/llvm/&&$0!="test primops/expression correctness")&&/pext.W16/' to rerun this test only.
      W32:                OK (23.99s)
        +++ OK, passed 100 tests.
      W64:                FAIL (49.48s)
        *** Failed! Falsified (after 62 tests and 8 shrinks):
        (zext[W8→W64](0xff::W8 %u ((narrow[W16→W8](load[W16](0xc1::W64))) >>a 1::W64)),5::W64)
        test ( bits64 buffer ) {
          bits64 ret;
          (ret) = prim %pext64(%zx64(%modu((-1 :: bits8), %shra(%lobits8(bits16[buffer + (193 :: bits64)]), (1 :: bits64)))), (5 :: bits64));
          return (ret);
        }
        
        1 /= 3
        Use --quickcheck-replay="(SMGen 15677446015516347944 3934829377554291769,61)" to reproduce.
        Use -p '($2!~/llvm/&&$0!="test primops/expression correctness")&&/pext.W64/' to rerun this test only.

It could be an issue with the test itself as pointed out by @aratamizuki above.

mentioned in merge request ghc!13967 (closed)
