Draft: Windows: Switch to clang-based toolchain, finishing ASLR effort
Recently msys2 started enabling ASLR support in its toolchain, uncovering a
number of bugs in GHC's code generation strategy on Windows as well as bugs in
upstream toolchain components (e.g. ld.bfd, see binutils #26757). This has
rendered GHC completely unusable with recent msys2 toolchains (see #16780 (closed)).
This set of branches fixes GHC's linker, code generator, and native toolchain to allow GHC to be used with recent Clang/lld and high image base addresses, as needed by Windows with ASLR enabled.
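To illustrate why high image base addresses are a problem for code that assumes low memory, here is a minimal Python sketch. It is not GHC's code: the image base, the offsets, and the helper names `fits_abs32` and `fits_rel32` are all hypothetical and chosen only for illustration. It contrasts a 32-bit absolute reference, which cannot hold an address above 4GB, with a 32-bit displacement relative to a nearby instruction, which can still reach its target.

```python
# Illustrative only: the addresses below are hypothetical, not taken from GHC.
IMAGE_BASE = 0x1_4000_0000      # assumed image base just above 4 GiB
SYMBOL_OFFSET = 0x2000          # assumed offset of a symbol within the image
CALL_SITE_OFFSET = 0x1000       # assumed offset of the referencing instruction


def fits_abs32(address: int) -> bool:
    """True if the address can be stored in an unsigned 32-bit absolute field."""
    return 0 <= address < 2**32


def fits_rel32(target: int, next_instruction: int) -> bool:
    """True if target is reachable via a signed 32-bit displacement."""
    displacement = target - next_instruction
    return -2**31 <= displacement < 2**31


symbol = IMAGE_BASE + SYMBOL_OFFSET
call_site = IMAGE_BASE + CALL_SITE_OFFSET

print(fits_abs32(symbol))             # can a 32-bit absolute field hold it?
print(fits_rel32(symbol, call_site))  # can a relative reference reach it?
print(hex(symbol & 0xFFFF_FFFF))      # value left after truncation to 32 bits
```

Under these assumptions the absolute reference does not fit and truncation silently yields a different address, whereas the relative reference within the image is still representable.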
Thanks to Tamar Christina (@Phyx) for his help and seemingly-boundless Windows knowledge.
Why Clang?
Given that enabling ASLR is already quite a significant undertaking, I was originally reluctant to simultaneously change the toolchain as well.
Unfortunately the move away from gcc to clang was necessitated by the currently rather messy state of gcc and binutils on Windows. Specifically, ld.bfd produces broken output in links containing weak symbols when the image base is above 4GB (see binutils bug #26757). My initial thought was to work around this by moving to lld for linking but continuing to use gcc for compilation and gas for assembly. Unfortunately, this is precluded by a gas bug (see GHC #9907 (closed)) which causes it to produce ill-formed PE objects.
Consequently, we are forced to change our entire native toolchain. This, of course, comes with the potential for us to uncover new bugs in the new toolchain. On the other hand, it also has a number of up-sides:
- lld is considerably faster than ld.bfd, particularly on Windows (closing #16084 (closed))
- we no longer need the rather clever-but-hacky binary patching logic that we previously relied on to allow gcc to be long-file-path-aware. We previously considered switching to LLVM for this reason alone (#17777 (closed))
- using LLVM end-to-end means that we avoid issues like #16354 (closed), where GHC's LLVM backend would produce assembly (via opt) that the gas assembler is unable to parse.
Branch structure
This project required changes in a number of GHC subsystems. I've tried to localize these in individual merge requests to aid in review:
- !7445 (closed) (wip/windows-test-fixes) fixes a few latent testsuite issues which I turned up while testing
- !7526 (closed) (wip/T21059) fixes a long-path awareness bug in the linker
- !7512 (closed) (wip/factor-out-alloc) refactors the linker's memory mapping functionality
- !7511 (closed) (wip/adjustor-pool) reworks the adjustor allocation strategy to eliminate out-of-memory issues in T10296a and fix #20349 (closed)
- !7528 (closed) (wip/no-c-stubs) refactors the logic responsible for generating C stubs, allowing us to safely use ar archives instead of object merging, which lld does not support.
- !7805 (closed) (wip/T21253) fixes a preexisting bug (#21253 (closed)) in the linker's handling of cyclic symbol dependencies triggered by our new treatment of C stubs
- !7804 (closed) (wip/T21254) fixes a pair of GC bugs (#21254 (closed)) that I encountered when testing these changes.
- !7547 (closed) (wip/object-merging-via-archives) refactors the driver and linker to allow static archives to be used in place of object merging, which lld does not support.
- !7446 (closed) (wip/m32-fixes) generalizes the m32 allocator to allocate pages near the program image rather than in low addresses. This is necessary as Windows with ASLR enabled puts the image in high memory
- !7447 (merged) (wip/windows-high-linker) reworks the PEi386 linker, carrying out a number of tasks:
  - adding compatibility with high-memory images by reworking the memory allocation scheme to use m32 instead of HeapAlloc
  - adding compatibility with the GC-based code unloading logic added in c34a4b98
  - fixing numerous latent bugs

  This depends upon wip/m32-fixes.
- !7449 (merged) (wip/windows-high-codegen) fixes the code generator to produce code which can be loaded in high memory, as required by high-entropy ASLR on Windows.
- Cabal #8062, disabling GHCi object support on Windows
- !7867 (closed) takes care of a few prerequisites for bumping the text submodule, which will be necessary to support the new toolchain
- !7448 (closed) (this MR, wip/windows-clang-2) moves away from gcc/binutils as our GHC toolchain to clang/lld. This depends upon everything above.
- !7891 (closed) (wip/T18826) fixes a somewhat-orthogonal and largely harmless packaging issue (#18826 (closed)), allowing us to drop quite a bit of code.
!7480 (merged) is a roll-up merging this and all of the above branches for CI testing.
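The reason for allocating near the program image, as wip/m32-fixes does, can be sketched as below. This is a hedged illustration in Python, not the RTS's m32 implementation: the page size, the example image addresses, and the names `near_image_window` and `is_near_image` are assumptions made for the example. It assumes that loaded code refers to the image via signed 32-bit displacements, and it ignores the size of the allocation itself.

```python
from typing import Tuple

PAGE_SIZE = 0x1000    # assumed page size
REL32_REACH = 2**31   # a signed 32-bit displacement reaches +/- 2 GiB


def near_image_window(image_start: int, image_end: int) -> Tuple[int, int]:
    """Return a page-aligned range [low, high) of addresses from which every
    byte of the image [image_start, image_end) is within rel32 reach."""
    low = max(0, image_end - REL32_REACH)
    high = image_start + REL32_REACH
    # Round low up and high down to page boundaries.
    low = (low + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)
    high = high & ~(PAGE_SIZE - 1)
    return low, high


def is_near_image(address: int, image_start: int, image_end: int) -> bool:
    """True if an allocation at `address` lies inside the near-image window."""
    low, high = near_image_window(image_start, image_end)
    return low <= address < high


# Hypothetical high-memory image, as one might see with ASLR enabled.
start = 0x7FF6_0000_0000
end = start + 0x0100_0000

print([hex(a) for a in near_image_window(start, end)])
print(is_near_image(0x0001_0000, start, end))   # a low address
print(is_near_image(end + PAGE_SIZE, start, end))  # just past the image
```

With an image this high, low addresses fall outside the window, which is why an allocator that only hands out low memory is no longer sufficient under these assumptions.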
To do
- Sort out how to handle lack of ld.lld -r
- Sort out what to do about text's C++ dependency
- Package compiler-rt
- Reintroduce LoadLibrary("msvcrt") in linker
- Make ar "merging" support conditional on platform
- Evaluate whether we need wrappers for clang similar to what we have for gcc
Tickets
- Closes #21019 (closed).
- Closes #16780 (closed) since GHC can be built with normal clang and lld packaged by msys2.
- Closes #16084 (closed).
- Closes #16354 (closed).
- Closes #20967 (closed).
- Closes #15670 (closed).
- Closes #12714 (closed).
- Closes #18721 (closed).