compactFixupPointers# fails if compact region blocks are more than 2 GiB apart
Summary
Importing a compact region using importCompact
from the ghc-compact
package fails if the addresses of the compact region blocks span across more than 2 GiB.
Steps to reproduce
The smallest Haskell example for reproducing the bug I could make is the following:
import qualified Data.Compact as Compact
import qualified Data.Compact.Serialize as CompactSerialize
-- | Minimal test case for reproducing compactFixupPointers# bug for large compact regions.
main :: IO ()
main = do
let
large = 1024 * 1024 * 128
largeString = replicate large 'A'
region <- Compact.compact largeString
Compact.compactSize region >>= print
CompactSerialize.writeCompact "/tmp/bug.compact" region
deserialized <- CompactSerialize.unsafeReadCompact "/tmp/bug.compact"
case deserialized of
Left err -> putStrLn err
Right largeString' -> print (Compact.getCompact largeString' == largeString)
It uses the compact
library, but I could also reproduce it with a hand-rolled deserializer calling GHC.Compact.Serialized.importCompact
directly.
Compiling and running that file results in Failed to import compact
being printed to stdout.
I traced that error message (which comes from the compact
library) back to a call to compactFixupPointers#
in the ghc-compact
library.
I had a quick look at the corresponding source code of the RTS (in rts/sm/CNF.c
) and I suspect a faulty order predicate passed to qsort
in build_fixup_table
is the root cause of the issue. It is implemented as
static int
cmp_fixup_table_item (const void *e1, const void *e2)
{
const StgWord *w1 = e1;
const StgWord *w2 = e2;
return *w1 - *w2;
}
However, on a 64 bit system, sizeof(int)
is still just 4 bytes while the difference of two StgWord
s is 8 bytes in size.
If the two pointers being compared are too far apart, an overflow causes the returned order to be the reverse of the actual order.
The following self-contained C program demonstrates this behavior (if compiled on a 64 bit architecture):
#include <stdio.h>
#include <stdint.h>
typedef uint64_t StgWord;
static int
cmp_fixup_table_item (const void *e1, const void *e2)
{
const StgWord *w1 = e1;
const StgWord *w2 = e2;
return *w1 - *w2;
}
static void
check (const StgWord *e1, const StgWord *e2)
{
printf("*e1 = %ld *e2=%ld\n", *e1, *e2);
printf("*e1 < *e2 == %d\n", (int)(*e1 < *e2));
printf("cmp_fixup_table_item(e1,e2) == %d\n\n", cmp_fixup_table_item(e1, e2));
}
int main() {
StgWord examplePointer1 = 1234567;
StgWord examplePointer2 = examplePointer1 + 1234;
StgWord bigOffset = (1L << 31) + 1234L;
StgWord examplePointer3 = examplePointer1 + bigOffset;
check(&examplePointer1, &examplePointer2);
check(&examplePointer1, &examplePointer3);
return 0;
}
The output is
*e1 = 1234567 *e2=1235801
*e1 < *e2 == 1
cmp_fixup_table_item(e1,e2) == -1234
*e1 = 1234567 *e2=2148719449
*e1 < *e2 == 1
cmp_fixup_table_item(e1,e2) == 2147482414
Note how for pointers that are close to each other, it returns a negative number (as it should), but if the pointers are too far apart, the sign switches.
Expected behavior
The Haskell program above should print True
, not an error message.
Environment
- GHC version used: 8.6.3
Optional:
- Operating System: Ubuntu 18.04
- System Architecture: x86-64