Commit a9ce3611 authored by dmp, committed by dterei

Change stack alignment to 16+8 bytes in STG code

This patch changes the STG code so that %rsp is aligned
to a 16-byte boundary + 8. This is the alignment required by
the x86_64 ABI on entry to a function. Previously we kept
%rsp aligned to a 16-byte boundary, but this was causing
problems for the LLVM backend (see #4211).


We no longer need to invoke the LLVM stack mangler on
x86_64 targets. Since the stack is now 16+8 byte aligned in
STG land on x86_64, the stack manipulations no longer need
to be rewritten by the LLVM mangler.

This patch only modifies the alignment for x86_64 backends.
Signed-off-by: David Terei <>
parent f0ae3f31
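
The padding rule that the genCCall64 hunk below changes can be sketched in C. This is only an illustration of the arithmetic, not code from the patch: `call_padding` and its parameters are hypothetical names. It assumes stack arguments are 8 bytes each and that %rsp must be 16-byte aligned at the `call` instruction, so that the callee sees 16+8 once the return address is pushed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GHC.
 * rsp_mod16:    %rsp modulo 16 in STG code (0 before this patch, 8 after).
 * tot_arg_size: total bytes of stack arguments, a multiple of 8.
 * Returns the extra bytes to subtract from %rsp before pushing the
 * arguments so that %rsp is 16-byte aligned at the call instruction. */
size_t call_padding(size_t rsp_mod16, size_t tot_arg_size)
{
    assert(rsp_mod16 == 0 || rsp_mod16 == 8);
    assert(tot_arg_size % 8 == 0);
    /* %rsp modulo 16 after pushing the arguments with no padding. */
    size_t after_push = (rsp_mod16 + 16 - tot_arg_size % 16) % 16;
    /* Whatever is left over (0 or 8) is exactly the padding required. */
    return after_push;
}
```

With `rsp_mod16 == 8` the result is 0 exactly when `(tot_arg_size + 8) % 16 == 0`, which matches the new condition in the hunk; with `rsp_mod16 == 0` it is 0 exactly when `tot_arg_size % 16 == 0`, the old condition.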
@@ -143,11 +143,13 @@ fixTables ss = fixed
have been pushed, so sub 4). GHC though since it always uses jumps keeps
the stack 16 byte aligned on both function calls and function entry.
We correct the alignment here.
We correct the alignment here for Mac OS X i386. The x86_64 target already
has the correct alignment since we keep the stack 16+8 aligned throughout
STG land for 64-bit targets.
fixupStack :: B.ByteString -> B.ByteString -> B.ByteString
#if !darwin_TARGET_OS
#if !darwin_TARGET_OS || x86_64_TARGET_ARCH
fixupStack = const
@@ -1842,15 +1842,17 @@ genCCall64 target dest_regs args =
tot_arg_size = arg_size * length stack_args
-- On entry to the called function, %rsp should be aligned
-- on a 16-byte boundary +8 (i.e. the first stack arg after
-- the return address is 16-byte aligned). In STG land
-- %rsp is kept 16-byte aligned (see StgCRun.c), so we just
-- need to make sure we push a multiple of 16-bytes of args,
-- plus the return address, to get the correct alignment.
-- on a 16-byte boundary +8 (i.e. the first stack arg
-- above the return address is 16-byte aligned). In STG
-- land %rsp is kept 8-byte aligned (see StgCRun.c), so we
-- just need to make sure we pad by eight bytes after
-- pushing a multiple of 16-bytes of args to get the
-- correct alignment. If we push an odd number of eight byte
-- arguments then no padding is needed.
-- Urg, this is hard. We need to feed the delta back into
-- the arg pushing code.
(real_size, adjust_rsp) <-
if tot_arg_size `rem` 16 == 0
if (tot_arg_size + 8) `rem` 16 == 0
then return (tot_arg_size, nilOL)
else do -- we need to adjust...
delta <- getDeltaNat
@@ -1865,7 +1867,7 @@ genCCall64 target dest_regs args =
delta <- getDeltaNat
-- deal with static vs dynamic call targets
(callinsns,cconv) <-
(callinsns,_cconv) <-
case target of
CmmCallee (CmmLit (CmmLabel lbl)) conv
-> -- ToDo: stdcall arg sizes
@@ -267,29 +267,36 @@ StgRunIsImplementedInAssembler(void)
"addq %0, %%rsp\n\t"
: : "i"(RESERVED_C_STACK_BYTES+48+8 /*stack frame size*/));
: : "i"(RESERVED_C_STACK_BYTES+48 /*stack frame size*/));
HACK alert!
The x86_64 ABI specifies that on a procedure call, %rsp is
The x86_64 ABI specifies that on entry to a procedure, %rsp is
aligned on a 16-byte boundary + 8. That is, the first
argument on the stack after the return address will be
16-byte aligned.
Which should be fine: RESERVED_C_STACK_BYTES+48 is a multiple
of 16 bytes.
16-byte aligned.
We maintain the 16+8 stack alignment throughout the STG code.
When we call STG_RUN the stack will be aligned to 16+8. We used
to subtract an extra 8 bytes so that %rsp would be 16 byte
aligned at all times in STG land. This worked fine for the
native code generator which knew that the stack was already
aligned on 16 bytes when it generated calls to C functions.
This arrangement caused problems for the LLVM backend. The LLVM
code generator would assume that on entry to each function the
stack is aligned to 16+8 as required by the ABI. However, since
we only enter STG functions by jumping to them with tail calls,
the stack was actually aligned to a 16-byte boundary. The LLVM
backend had its own mangler that would post-process the
assembly code to fix up the stack manipulation code to maintain
the correct alignment (see #4211).
Therefore, we now keep the stack aligned to 16+8 while in
STG land so that LLVM generates correct code without any
mangling. The native code generator can handle this alignment
just fine by making sure the stack is aligned to a 16-byte
boundary before it makes a C-call.
BUT... when we do a C-call from STG land, gcc likes to put the
stack alignment adjustment in the prolog. eg. if we're calling
a function with arguments in regs, gcc will insert 'subq $8,%rsp'
in the prolog, to keep %rsp aligned (the return address is 8
bytes, remember). The mangler throws away the prolog, so we
lose the stack alignment.
The hack is to add this extra 8 bytes to our %rsp adjustment
here, so that throughout STG code, %rsp is 16-byte aligned,
ready for a C-call.
A quick way to see if this is wrong is to compile this code:
main = System.Exit.exitWith ExitSuccess
@@ -300,7 +307,6 @@ StgRunIsImplementedInAssembler(void)
stack isn't aligned, and calling exitWith from Haskell invokes
shutdownHaskellAndExit using a C call.
Future gcc releases will almost certainly break this hack...
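
The effect of dropping the extra 8 bytes from the StgRun frame can be checked with a small C sketch. It is a rough illustration under stated assumptions: `stg_rsp_mod16` is a hypothetical helper, and `EXAMPLE_RESERVED_C_STACK_BYTES` is a placeholder value chosen only so that adding 48 gives a multiple of 16, as the comment in StgCRun.c states for the real constant.

```c
#include <assert.h>

/* Placeholder value for illustration only; the real constant is
 * defined in the GHC RTS headers. */
#define EXAMPLE_RESERVED_C_STACK_BYTES 16384u

/* Hypothetical helper, not part of GHC.
 * entry_mod16: %rsp modulo 16 on entry to StgRun (8 under the x86_64 ABI).
 * extra:       extra bytes in the frame (8 before this patch, 0 after).
 * Returns %rsp modulo 16 once the frame has been subtracted, which is
 * the alignment STG code then runs with. */
unsigned stg_rsp_mod16(unsigned entry_mod16, unsigned extra)
{
    assert(entry_mod16 < 16u);
    unsigned frame = EXAMPLE_RESERVED_C_STACK_BYTES + 48u + extra;
    return (entry_mod16 + 16u - frame % 16u) % 16u;
}
```

Calling `stg_rsp_mod16(8, 8)` models the old frame size and gives 0, the 16-byte alignment STG code used to keep; `stg_rsp_mod16(8, 0)` models the new frame size and gives 8, the 16+8 alignment this patch maintains.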
