This merge request addresses issues #26109 and #20645.
What a doozy these tickets were. Let's go through each failure from #26109 separately, since they fall into classes with different causes:
pdep8#, pdep16#, pext8#, and pext16#
The operands of these operations need to be treated as (at least) i32 values when handled by LLVM, with the result subsequently truncated (by LLVM) to the correct, smaller bit-width type.
The calls to "hs_pdep8" and "hs_pdep16" must therefore be replaced by calls to "hs_pdep32".
This corrects the interplay between GHC's internal use of only i64 and i32 types and LLVM's smaller bit-width types.
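The widening idea can be illustrated with a bit-by-bit reference model in Haskell. This is a minimal sketch under stated assumptions: pdep32, pext32, pdep8 and pext8 are hypothetical pure helpers written for this example. They are not GHC's primops or the hs_pdep32 C helper.

The sketch shows that the following steps produce a correct 8-bit result:
- zero-extend the 8-bit operands to 32 bits;
- deposit or extract at 32 bits;
- truncate back to 8 bits.

```haskell
import Data.Bits ((.|.), shiftL, testBit)
import Data.Word (Word8, Word32)

-- Hypothetical reference model of a 32-bit parallel bit deposit:
-- the low-order bits of src are scattered, in order, into the
-- positions of the set bits of mask.
pdep32 :: Word32 -> Word32 -> Word32
pdep32 src mask = go 0 0 0
  where
    -- i: bit position in mask/result, k: next bit of src to consume
    go :: Int -> Int -> Word32 -> Word32
    go i k acc
      | i >= 32 = acc
      | testBit mask i =
          let acc' = if testBit src k then acc .|. (1 `shiftL` i) else acc
          in go (i + 1) (k + 1) acc'
      | otherwise = go (i + 1) k acc

-- Hypothetical reference model of a 32-bit parallel bit extract:
-- the bits of src selected by mask are packed into the low-order bits.
pext32 :: Word32 -> Word32 -> Word32
pext32 src mask = go 0 0 0
  where
    -- i: bit position in src/mask, k: next free bit of the result
    go :: Int -> Int -> Word32 -> Word32
    go i k acc
      | i >= 32 = acc
      | testBit mask i =
          let acc' = if testBit src i then acc .|. (1 `shiftL` k) else acc
          in go (i + 1) (k + 1) acc'
      | otherwise = go (i + 1) k acc

-- 8-bit variants: zero-extend the operands to 32 bits (fromIntegral on an
-- unsigned Word8), run the 32-bit operation, then truncate back to 8 bits.
pdep8 :: Word8 -> Word8 -> Word8
pdep8 src mask = fromIntegral (pdep32 (fromIntegral src) (fromIntegral mask))

pext8 :: Word8 -> Word8 -> Word8
pext8 src mask = fromIntegral (pext32 (fromIntegral src) (fromIntegral mask))
```

For example, pdep8 0x03 0xF0 is 0x30 and pext8 0xAA 0xF0 is 0x0A. The widened mask has no bits set above bit 7, so the 32-bit result never does either, and the final truncation loses nothing.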
bitReverse8#, bitReverse16#, bitReverse32#
The incorrect results were caused by an erroneous sign extension when casting between i64 and lower bit-widths.
If the least significant bit of the input is set, then after a "Bit-Reverse" operation the most significant bit will be set.
i8 : 0x03 = 00000011 = <Input>
i8 : 0xC0 = 11000000 = Bit-Reverse(0x03)
When the result is cast to a larger bit-width type, a decision must be made whether to "Zero-Extend" or "Sign-Extend."
The previous behavior was to "Sign-Extend". A "Bit-Reverse" result has its most significant bit set whenever the input had its least significant bit set. Sign-extending such a result yields a wider value with 1s in every bit more significant than the original most significant bit.
However, when an extension is performed after the operation, the result of a bit-wise operation like "Bit-Reverse" is generally expected not to be treated /numerically/ as signed.
To illustrate the difference, consider how zero extension and sign extension from i8 to i16 differ for our value above:
i16 : 0x00C0 = 0000000011000000 = Extend-Zeroed(i16, Bit-Reverse(0x03))
i16 : 0xFFC0 = 1111111111000000 = Extend-Signed(i16, Bit-Reverse(0x03))
Here we can see that the former output (0x00C0) is the expected result of a bit-wise operation whose value is promoted to a larger bit-width type.
The latter output (0xFFC0) is undesirable, since the value must stay constrained to the range of an i8 within the i16 type.
Hence GHC must always treat the result of "Bit-Reverse" as unsigned, i.e. zero-extend it.
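The two extensions can be reproduced with the fixed-width integer types from base. This is a minimal sketch that relies only on Data.Word.bitReverse8 and fromIntegral conversions. The names reversed, zeroExtended and signExtended are illustrative, and the sketch models the arithmetic only, not the LLVM code generator.

```haskell
import Data.Int (Int8, Int16)
import Data.Word (Word8, Word16, bitReverse8)

-- Bit-Reverse(0x03): 00000011 becomes 11000000 (0xC0).
reversed :: Word8
reversed = bitReverse8 0x03

-- Zero extension: widening an unsigned Word8 with fromIntegral fills the
-- new upper bits with 0s, giving 0x00C0.
zeroExtended :: Word16
zeroExtended = fromIntegral reversed

-- Sign extension: reinterpret the same bits as a signed Int8 (0xC0 is -64),
-- widen to Int16 (which copies the set sign bit upward), then view the
-- resulting bits as a Word16, giving 0xFFC0.
signExtended :: Word16
signExtended = fromIntegral (fromIntegral asSigned :: Int16)
  where
    asSigned :: Int8
    asSigned = fromIntegral reversed
```

In GHCi, showHex from Numeric renders the two values as c0 and ffc0. These match the Extend-Zeroed and Extend-Signed lines above.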
byteSwap16# and byteSwap32#
byteSwap16# and byteSwap32# were affected by two defects:
- The first defect was the same erroneous sign extension as in the bitReverse*# operations, and it was fixed in exactly the same way as described above.
- The second "defect" was more insidious: there is a mismatch between the specification and the test case(s). The specification within GHC.Internal.Prim states:
      {-| Swap bytes in the lower 16 bits of a word. The higher bytes are undefined. -}
      byteSwap16# :: Word# -> Word#
  Note the statement:
      The higher bytes are undefined.
  Unfortunately, the test case specifies an expectation for the upper bits! If the upper bits are indeed disregarded (anything could be in them), then the existing implementation was correct, after accounting for the sign extension issue. The solution here is a more permissive test case that ignores the bits in the undefined region of the register.
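A permissive check of this kind can be sketched as follows. It is only an illustration, not the actual test in the GHC testsuite:
- byteSwap16Model is a hypothetical stand-in for the primop. It swaps the low two bytes and, as the specification allows, leaves arbitrary bits (here, the input's own upper bits) above them.
- lower16Matches is a hypothetical comparison helper.

```haskell
import Data.Bits ((.&.), (.|.), complement)
import Data.Word (Word, Word16, byteSwap16)

-- Hypothetical model of byteSwap16#: swap the bytes of the lower 16 bits and
-- keep whatever was in the upper bits, which the specification leaves undefined.
byteSwap16Model :: Word -> Word
byteSwap16Model w = (w .&. complement 0xFFFF) .|. fromIntegral swapped
  where
    swapped :: Word16
    swapped = byteSwap16 (fromIntegral w)

-- Permissive comparison: only the lower 16 bits are specified, so mask both
-- sides before comparing and ignore the undefined upper bits.
lower16Matches :: Word -> Word -> Bool
lower16Matches expected actual = (expected .&. 0xFFFF) == (actual .&. 0xFFFF)
```

With this model, byteSwap16Model 0x12345678 is 0x12347856. A strict comparison against 0x7856 therefore fails. The masked comparison lower16Matches 0x7856 (byteSwap16Model 0x12345678) succeeds because it only inspects the defined region.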