128-bit callee-saved registers (Windows x64) are not saved and restored correctly
Summary
The Windows x64 calling convention requires the callee to save and restore the 128-bit registers XMM6-XMM15.
StgRunIsImplementedInAssembler, which handles this in the runtime, does not save and restore these registers correctly: it uses movq to do so, treating them as 64 bits wide.
Consequently, when the registers are restored, the high-order 64 bits are zeroed.
Instead, movaps should probably be used, with 16 bytes reserved for each XMM register.
As a result, FFI calls into Haskell code on Windows may violate the calling convention and lead to undefined behaviour.
Steps to reproduce
The attached files can be compiled on Windows with ghc Main.hs reg.c.
The Haskell main function invokes a C function test_c, which uses inline assembly to load a value into the XMM6 register, invokes a Haskell function helper, and then reads back the value of XMM6, comparing each byte to the expected (original) value.
The output of executing Main.exe is:
This is the helper function
0: 01 01
1: 02 02
2: 03 03
3: 04 04
4: 05 05
5: 06 06
6: 07 07
7: 08 08
8: 09 00
9: 0a 00
10: 0b 00
11: 0c 00
12: 0d 00
13: 0e 00
14: 0f 00
15: 10 00
Done.
This indicates that the low-order 8 bytes of the register are correctly restored after the call to helper, but the high-order 8 bytes are not.
This is consistent with the use of movq instead of movaps (or movups).
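The observed pattern can be reproduced in isolation, without the runtime involved. The following is a minimal sketch using SSE2 intrinsics, assuming an x86-64 compiler that provides emmintrin.h. It is not the runtime's code: the names pattern, roundtrip_movq, roundtrip_movups, mismatches, print_comparison and self_check are hypothetical, and it models the save/restore as a round trip through an 8-byte or a 16-byte stack slot. It uses the unaligned 16-byte move, since movaps would additionally require the slot to be 16-byte aligned.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical test pattern: bytes 01, 02, ..., 10 (hex), as in the report. */
__m128i pattern(void) {
    uint8_t bytes[16];
    for (int i = 0; i < 16; i++) bytes[i] = (uint8_t)(i + 1);
    return _mm_loadu_si128((const __m128i *)bytes);
}

/* Save and restore through an 8-byte slot, as a movq store/load pair does.
   The store writes only the low 64 bits; the load zeroes the upper 64 bits. */
__m128i roundtrip_movq(__m128i reg) {
    uint8_t slot[8];
    _mm_storel_epi64((__m128i *)slot, reg);
    return _mm_loadl_epi64((const __m128i *)slot);
}

/* Save and restore through a 16-byte slot, as a movups store/load pair does. */
__m128i roundtrip_movups(__m128i reg) {
    uint8_t slot[16];
    _mm_storeu_si128((__m128i *)slot, reg);
    return _mm_loadu_si128((const __m128i *)slot);
}

/* Count the bytes that differ between the expected and restored values. */
int mismatches(__m128i expected, __m128i actual) {
    uint8_t e[16], a[16];
    int n = 0;
    _mm_storeu_si128((__m128i *)e, expected);
    _mm_storeu_si128((__m128i *)a, actual);
    for (int i = 0; i < 16; i++) n += (e[i] != a[i]);
    return n;
}

/* Print "index: expected actual" for each byte, in the report's format. */
void print_comparison(__m128i expected, __m128i actual) {
    uint8_t e[16], a[16];
    _mm_storeu_si128((__m128i *)e, expected);
    _mm_storeu_si128((__m128i *)a, actual);
    for (int i = 0; i < 16; i++) printf("%d: %02x %02x\n", i, e[i], a[i]);
}

/* Aborts if either strategy does not behave as described above. */
void self_check(void) {
    __m128i v = pattern();
    assert(mismatches(v, roundtrip_movq(v)) == 8);    /* upper half lost */
    assert(mismatches(v, roundtrip_movups(v)) == 0);  /* fully preserved */
}
```

Calling print_comparison(pattern(), roundtrip_movq(pattern())) prints the same byte pattern as the output above: bytes 0 to 7 match and bytes 8 to 15 read back as 00. With roundtrip_movups all 16 bytes match.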
Expected behavior
The expected output is:
This is the helper function
0: 01 01
1: 02 02
2: 03 03
3: 04 04
4: 05 05
5: 06 06
6: 07 07
7: 08 08
8: 09 09
9: 0a 0a
10: 0b 0b
11: 0c 0c
12: 0d 0d
13: 0e 0e
14: 0f 0f
15: 10 10
Done.
Environment
- GHC version used: 9.0.2, 8.10.2
- Operating System: Windows 8.1
- System Architecture: x86-64