Commit 620fc6f9 authored by Tamar Christina, committed by thomie

Make Windows linker more robust to unknown sections

The Windows Linker has 3 main parts that this patch changes.

1) Identification and classification of sections
2) Adding symbols to the symbol table
3) Relocation of sections

1.
Previously, section identification was done on a whitelist basis, and
exclusively by section name. This led to a bit of a cat-and-mouse game
between `GCC` and `GHC`: every time `GCC` added new sections there was a
good chance `GHC` would break. Luckily this hasn't happened much in the
past because the `GCC` versions `GHC` used were largely unchanged.

The new code instead treats every section as a `CODE` or `DATA` section
by default, and adjusts the classification based on the
`Characteristics` flags in the PE header. By doing so we are no longer
fragile with respect to changing section names. The one exception is the
`.ctors` section, which has no differentiating flag in the PE header,
but which we know we need to treat as initialization data.
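
The classification described above can be sketched roughly as follows.
This is a minimal illustration, not the code from the patch: the
`SectionKind` enum, the `KIND_*` names and `classify_section` are
hypothetical stand-ins, while the flag values are the ones the diff
defines as `MYIMAGE_SCN_*`.

```c
#include <stdint.h>
#include <string.h>

/* Section flag values, matching the MYIMAGE_SCN_* defines in the diff. */
#define SCN_CNT_UNINITIALIZED_DATA 0x00000080u
#define SCN_LNK_REMOVE             0x00000800u
#define SCN_MEM_DISCARDABLE        0x02000000u

/* Hypothetical stand-in for the linker's own section kind type. */
typedef enum {
    KIND_CODE_OR_RODATA,
    KIND_RWDATA,
    KIND_INIT_ARRAY,
    KIND_OTHER
} SectionKind;

/* Classify a section from its flags. The name is consulted only for
   .ctors, which has no distinguishing flag. */
static SectionKind classify_section(const char *name, uint32_t characteristics)
{
    /* Default: treat the section as code or data, so it gets loaded. */
    SectionKind kind = KIND_CODE_OR_RODATA;

    /* Uninitialized data (e.g. .bss) must be writable. */
    if (characteristics & SCN_CNT_UNINITIALIZED_DATA)
        kind = KIND_RWDATA;

    /* Discardable or to-be-removed sections are not loaded at all. */
    if (characteristics & (SCN_MEM_DISCARDABLE | SCN_LNK_REMOVE))
        kind = KIND_OTHER;

    /* The one name-based exception. */
    if (strcmp(name, ".ctors") == 0)
        kind = KIND_INIT_ARRAY;

    return kind;
}
```

With this ordering a later check overrides an earlier one, so a section
that is both uninitialized and discardable ends up ignored, and a
section with an unfamiliar name but ordinary flags is still loaded.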

The check that sections are `4`-byte aligned has been removed. The
reason is that debug sections are often `1`-byte aligned but still have
relocation symbols. In order to support relocations of `.debug` sections
this check has to go. Crucially, the rest of the code does not seem to
rely on this assumption. Further down the road we only check that there
are at least 4 bytes available to relocate.
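
To illustrate why a size check can stand in for an alignment check,
here is a small sketch of patching a 32-bit value at an arbitrary
offset. It is an assumption-laden example, not the linker's code:
`patch32` is a hypothetical helper, and it uses `memcpy` for the
unaligned access, whereas the code in the diff writes through a cast
pointer after calling `checkProddableBlock`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add `value` to the 32-bit addend stored at `offset` inside a section
   of `size` bytes. Returns 1 on success, or 0 if fewer than 4 bytes
   remain at that offset. The offset does not need to be aligned. */
static int patch32(unsigned char *section, size_t size,
                   size_t offset, uint32_t value)
{
    uint32_t addend;

    /* Only the size matters here, not the alignment. */
    if (offset > size || size - offset < 4)
        return 0;

    /* memcpy is a portable way to read and write unaligned words. */
    memcpy(&addend, section + offset, sizeof addend);
    addend += value;
    memcpy(section + offset, &addend, sizeof addend);
    return 1;
}
```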

2.
The second loop iterates over all the symbols in the file and tries to
add them to the symbol table. Because the section classifications from
step 1 are (currently) not available in this phase, we still have to
exclude the sections by hand. If we don't, we will load symbols from
sections that were explicitly ignored in step 1. This whole part should
be rewritten to avoid this, but I didn't want to do it in this commit.

3.
Finally the sections are relocated. For some reason the PE files contain
a Linux relocation constant, `0x0011`. As far as I can tell this
constant does not come from GHC (or I couldn't find where it's being
set). I believe this is probably a bug in GAS. But because the constant
is in the output we have to handle it. I am thus mapping it to the
constant I think it should be, `0x0003`.
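
A rough sketch of that mapping, together with the range check that the
32-bit absolute relocations share. The names `normalise_reloc_type` and
`resolve_abs32` are hypothetical; in the patch the three cases simply
share one `switch` arm, and the real code retries via a jump island
before giving up, which this sketch omits.

```c
#include <stdint.h>

/* x86_64 PE relocation constants as named in the diff's comments. */
#define REL_AMD64_ADDR32    0x0002
#define REL_AMD64_ADDR32NB  0x0003

/* The unexpected constant that shows up in the object files. */
#define REL_UNEXPECTED_0X11 0x0011

/* Treat the unexpected constant as if it were 0x0003; leave every
   other relocation type unchanged. */
static uint16_t normalise_reloc_type(uint16_t type)
{
    return type == REL_UNEXPECTED_0X11 ? REL_AMD64_ADDR32NB : type;
}

/* Compute S + A for a 32-bit absolute relocation. Returns 1 and stores
   the result if it fits in 32 bits, otherwise returns 0. */
static int resolve_abs32(uint64_t S, uint32_t A, uint32_t *out)
{
    uint64_t v = S + (uint64_t)A;

    if (v >> 32)
        return 0;   /* high bits set: cannot be stored in 4 bytes */

    *out = (uint32_t)v;
    return 1;
}
```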

Finally, static linking *should* work, but won't. At least not if you
want to statically link `libgcc` with exceptions support. Doing so would
require you to link `libgcc` and `libstdc++` but also `libmingwex`. The
problem is that `libmingwex` also defines a lot of symbols that the RTS
automatically injects into the symbol table, presumably because they're
symbols that it needs, such as `coshf`. These symbols are not in a
section that is declared with weak symbols support. So if we ever want
to get this working, we should either a) ask mingw to declare the
section as such, or b) treat all imported symbols as being weak,
though this doesn't seem like a good idea.

Test Plan:
Running ./validate for both x86 and x86_64

Also running the specific test case for #10672

make TESTS="T10672_x86 T10672_x64"

Reviewed By: ezyang, thomie, austin

Differential Revision: https://phabricator.haskell.org/D1244

GHC Trac Issues: #9907, #10672, #10563
parent d2fb5328
......@@ -3631,21 +3631,17 @@ ocFlushInstructionCache( ObjectCode *oc )
/* --------------------------------------------------------------------------
* PEi386 specifics (Win32 targets)
* PEi386(+) specifics (Win32 targets)
* ------------------------------------------------------------------------*/
/* The information for this linker comes from
Microsoft Portable Executable
and Common Object File Format Specification
revision 5.1 January 1998
which SimonM says comes from the MS Developer Network CDs.
revision 8.3 February 2013
It can be found there (on older CDs), but can also be found
online at:
It can be found online at:
http://www.microsoft.com/hwdev/hardware/PECOFF.asp
(this is Rev 6.0 from February 1999).
https://msdn.microsoft.com/en-us/windows/hardware/gg463119.aspx
Things move, so if that fails, try searching for it via
......@@ -3666,6 +3662,17 @@ ocFlushInstructionCache( ObjectCode *oc )
John Levine's book "Linkers and Loaders" contains useful
info on PE too.
The PE specification doesn't specify how to do the actual
relocations. For this reason, and because both PE and ELF are
based on COFF, the relocations for the PEi386+ code is based on
the ELF relocations for the equivalent relocation type.
The ELF ABI can be found at
http://www.x86-64.org/documentation/abi.pdf
The current code is based on version 0.99.6 - October 2013
*/
......@@ -3755,11 +3762,14 @@ typedef
#define MYIMAGE_SYM_CLASS_STATIC 3
#define MYIMAGE_SYM_UNDEFINED 0
/* From PE spec doc, section 4.1 */
/* From PE spec doc, section 3.1 */
#define MYIMAGE_SCN_CNT_CODE 0x00000020
#define MYIMAGE_SCN_CNT_INITIALIZED_DATA 0x00000040
#define MYIMAGE_SCN_CNT_UNINITIALIZED_DATA 0x00000080
#define MYIMAGE_SCN_LNK_COMDAT 0x00001000
#define MYIMAGE_SCN_LNK_NRELOC_OVFL 0x01000000
#define MYIMAGE_SCN_LNK_REMOVE 0x00000800
#define MYIMAGE_SCN_MEM_DISCARDABLE 0x02000000
/* From PE spec doc, section 5.2.1 */
#define MYIMAGE_REL_I386_DIR32 0x0006
......@@ -4357,8 +4367,9 @@ ocGetNames_PEi386 ( ObjectCode* oc )
UChar* end;
UInt32 sz;
/* By default consider all section as CODE or DATA, which means we want to load them. */
SectionKind kind
= SECTIONKIND_OTHER;
= SECTIONKIND_CODE_OR_RODATA;
COFF_section* sectab_i
= (COFF_section*)
myindex ( sizeof_COFF_section, sectab, i );
......@@ -4367,34 +4378,20 @@ ocGetNames_PEi386 ( ObjectCode* oc )
IF_DEBUG(linker, debugBelch("section name = %s\n", secname ));
# if 0
/* I'm sure this is the Right Way to do it. However, the
alternative of testing the sectab_i->Name field seems to
work ok with Cygwin.
EZY: We should strongly consider using this style, because
it lets us pick up sections that should be added (e.g.
for a while the linker did not work due to missing .eh_frame
in this section.)
*/
/* The PE file section flag indicates whether the section contains code or data. */
if (sectab_i->Characteristics & MYIMAGE_SCN_CNT_CODE ||
sectab_i->Characteristics & MYIMAGE_SCN_CNT_INITIALIZED_DATA)
kind = SECTIONKIND_CODE_OR_RODATA;
# endif
if (0==strcmp(".text",(char*)secname) ||
0==strcmp(".text.startup",(char*)secname) ||
0==strcmp(".text.unlikely", (char*)secname) ||
0==strncmp(".text$",(char*)secname, 6) ||
/* See Note [.rdata section group] */
(0==strncmp(".rdata",(char*)secname, 6) &&
0!=strcmp(".rdata$zzz", (char*)secname)) ||
0==strcmp(".eh_frame", (char*)secname) ||
0==strcmp(".rodata",(char*)secname))
kind = SECTIONKIND_CODE_OR_RODATA;
if (0==strcmp(".data",(char*)secname) ||
0==strcmp(".bss",(char*)secname))
/* Check next if it contains any uninitialized data */
if (sectab_i->Characteristics & MYIMAGE_SCN_CNT_UNINITIALIZED_DATA)
kind = SECTIONKIND_RWDATA;
/* Finally check if it can be discarded. This will also ignore .debug sections */
if (sectab_i->Characteristics & MYIMAGE_SCN_MEM_DISCARDABLE ||
sectab_i->Characteristics & MYIMAGE_SCN_LNK_REMOVE)
kind = SECTIONKIND_OTHER;
if (0==strcmp(".ctors", (char*)secname))
kind = SECTIONKIND_INIT_ARRAY;
......@@ -4405,33 +4402,7 @@ ocGetNames_PEi386 ( ObjectCode* oc )
start = ((UChar*)(oc->image)) + sectab_i->PointerToRawData;
end = start + sz - 1;
if (kind == SECTIONKIND_OTHER
/* Ignore sections called which contain stabs debugging
information. */
&& 0 != strcmp(".stab", (char*)secname)
&& 0 != strcmp(".stabstr", (char*)secname)
/* Ignore sections called which contain exception information. */
&& 0 != strncmp(".pdata", (char*)secname, 6)
&& 0 != strncmp(".xdata", (char*)secname, 6)
/* ignore section generated from .ident */
&& 0!= strncmp(".debug", (char*)secname, 6)
/* ignore unknown section that appeared in gcc 3.4.5(?) */
&& 0!= strcmp(".reloc", (char*)secname)
/* See Note [.rdata section group] */
&& 0 != strcmp(".rdata$zzz", (char*)secname)
/* ignore linker directive sections */
&& 0 != strcmp(".drectve", (char*)secname)
) {
IF_DEBUG(linker, debugBelch("Unknown PEi386 section name `%s' (while processing: %" PATH_FMT")", secname, oc->fileName));
}
if (kind != SECTIONKIND_OTHER && end >= start) {
if ((((size_t)(start)) % 4) != 0) {
errorBelch("Misaligned section %s: %p", (char*)secname, start);
stgFree(secname);
return 0;
}
addSection(oc, kind, start, end);
addProddableBlock(oc, start, end - start + 1);
}
......@@ -4439,24 +4410,6 @@ ocGetNames_PEi386 ( ObjectCode* oc )
stgFree(secname);
}
/*
Note [.rdata section group]
Most of the sections .rdata group section we want to load
and consider a SECTIONKIND_CODE_OR_RODATA section.
With the exception of .rdata$zzz which is just a section
containing the GCC version:
Contents of section .rdata$zzz:
0000 4743433a 20285265 76332c20 4275696c GCC: (Rev3, Buil
0010 74206279 204d5359 53322070 726f6a65 t by MSYS2 proje
0020 63742920 352e322e 30000000 00000000 ct) 5.2.0.......
Because we're inspecting the group members one by one, we shouldn't
consider this code since we can't load it. Instead consider it an OTHER
section.
*/
/* Copy exported symbols into the ObjectCode. */
oc->n_symbols = hdr->NumberOfSymbols;
......@@ -4466,10 +4419,8 @@ ocGetNames_PEi386 ( ObjectCode* oc )
for (i = 0; i < oc->n_symbols; i++)
oc->symbols[i] = NULL;
i = 0;
while (1) {
for (i = 0; i < oc->n_symbols; i++) {
COFF_symbol* symtab_i;
if (i >= (Int32)(hdr->NumberOfSymbols)) break;
symtab_i = (COFF_symbol*)
myindex ( sizeof_COFF_symbol, symtab, i );
......@@ -4548,7 +4499,6 @@ ocGetNames_PEi386 ( ObjectCode* oc )
}
i += symtab_i->NumberOfAuxSymbols;
i++;
}
return 1;
......@@ -4639,9 +4589,8 @@ ocResolve_PEi386 ( ObjectCode* oc )
char *secname = cstring_from_section_name(sectab_i->Name, strtab);
/* Ignore sections called which contain stabs debugging
information. */
if (0 == strcmp(".stab", (char*)secname)
/* Ignore sections called which contain stabs debugging information. */
if ( 0 == strcmp(".stab", (char*)secname)
|| 0 == strcmp(".stabstr", (char*)secname)
|| 0 == strncmp(".pdata", (char*)secname, 6)
|| 0 == strncmp(".xdata", (char*)secname, 6)
......@@ -4766,8 +4715,17 @@ ocResolve_PEi386 ( ObjectCode* oc )
*(UInt32 *)pP = ((UInt32)S) + A - ((UInt32)(size_t)pP) - 4;
break;
#elif defined(x86_64_HOST_ARCH)
case 2: /* R_X86_64_32 */
case 17: /* R_X86_64_32S */
case 1: /* R_X86_64_64 (ELF constant 1) - IMAGE_REL_AMD64_ADDR64 (PE constant 1) */
{
UInt64 A;
checkProddableBlock(oc, pP, 8);
A = *(UInt64*)pP;
*(UInt64 *)pP = ((UInt64)S) + ((UInt64)A);
break;
}
case 2: /* R_X86_64_32 (ELF constant 10) - IMAGE_REL_AMD64_ADDR32 (PE constant 2) */
case 3: /* R_X86_64_32S (ELF constant 11) - IMAGE_REL_AMD64_ADDR32NB (PE constant 3) */
case 17: /* R_X86_64_32S ELF constant, no PE mapping. See note [ELF constant in PE file] */
{
size_t v;
v = S + ((size_t)A);
......@@ -4777,14 +4735,14 @@ ocResolve_PEi386 ( ObjectCode* oc )
/* And retry */
v = S + ((size_t)A);
if (v >> 32) {
barf("R_X86_64_32[S]: High bits are set in %zx for %s",
barf("IMAGE_REL_AMD64_ADDR32[NB]: High bits are set in %zx for %s",
v, (char *)symbol);
}
}
*(UInt32 *)pP = (UInt32)v;
break;
}
case 4: /* R_X86_64_PC32 */
case 4: /* R_X86_64_PC32 (ELF constant 2) - IMAGE_REL_AMD64_REL32 (PE constant 4) */
{
intptr_t v;
v = ((intptr_t)S) + ((intptr_t)(Int32)A) - ((intptr_t)pP) - 4;
......@@ -4795,21 +4753,13 @@ ocResolve_PEi386 ( ObjectCode* oc )
/* And retry */
v = ((intptr_t)S) + ((intptr_t)(Int32)A) - ((intptr_t)pP) - 4;
if ((v >> 32) && ((-v) >> 32)) {
barf("R_X86_64_PC32: High bits are set in %zx for %s",
barf("IMAGE_REL_AMD64_REL32: High bits are set in %zx for %s",
v, (char *)symbol);
}
}
*(UInt32 *)pP = (UInt32)v;
break;
}
case 1: /* R_X86_64_64 */
{
UInt64 A;
checkProddableBlock(oc, pP, 8);
A = *(UInt64*)pP;
*(UInt64 *)pP = ((UInt64)S) + ((UInt64)A);
break;
}
#endif
default:
debugBelch("%" PATH_FMT ": unhandled PEi386 relocation type %d",
......@@ -4824,6 +4774,21 @@ ocResolve_PEi386 ( ObjectCode* oc )
return 1;
}
/*
Note [ELF constant in PE file]
For some reason, the PE files produced by GHC contain a linux
relocation constant 17 (0x11) in the object files. As far as I (Phyx-) can tell
this constant doesn't seem like it's coming from GHC, or at least I could not find
anything in the .s output that GHC produces which specifies the relocation type.
This leads me to believe that this is a bug in GAS. However because this constant is
there we must deal with it. This is done by mapping it to the equivalent in behaviour PE
relocation constant 0x03.
See #9907
*/
static int
ocRunInit_PEi386 ( ObjectCode *oc )
{
......
......@@ -210,6 +210,9 @@ from testlib import *
if windows or darwin:
pkginfo = getStdout([config.ghc_pkg, 'dump'])
topdir = config.libdir
if windows:
mingw = os.path.join(topdir, '../mingw/bin')
os.environ['PATH'] = os.pathsep.join([os.environ.get("PATH", ""), mingw])
for line in pkginfo.split('\n'):
if line.startswith('library-dirs:'):
path = line.rstrip()
......
-- Copyright (C) 2015, Luke Iannini
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where
import Printf ( pr )
foreign import ccall "talkToCxx" talkToCxx :: IO ()
main :: IO ()
main = do
putStrLn ( $(pr "Hello From Template Haskell!") )
talkToCxx
TOP=../../..
include $(TOP)/mk/boilerplate.mk
include $(TOP)/mk/test.mk
T10672_x64:
'$(TEST_HC)' $(TEST_HC_OPTS) -v0 -rtsopts=none -fforce-recomp -lgcc_s_seh-1 -lstdc++-6 \
Main.hs Printf.hs cxxy.cpp
T10672_x86:
'$(TEST_HC)' $(TEST_HC_OPTS) -v0 -rtsopts=none -fforce-recomp -lgcc_s_dw2-1 -lstdc++-6 \
Main.hs Printf.hs cxxy.cpp
-- Copyright (C) 2015, Luke Iannini
{-# LANGUAGE TemplateHaskell #-}
module Printf where
-- Skeletal printf from the paper:
-- http://research.microsoft.com/pubs/67015/meta-haskell.pdf
-- It needs to be in a separate module to the one where
-- you intend to use it.
-- Import some Template Haskell syntax
import Language.Haskell.TH
-- Describe a format string
data Format = D | S | L String
-- Parse a format string. This is left largely to you
-- as we are here interested in building our first ever
-- Template Haskell program and not in building printf.
parse :: String -> [Format]
parse s = [ L s ]
-- Generate Haskell source code from a parsed representation
-- of the format string. This code will be spliced into
-- the module which calls "pr", at compile time.
gen :: [Format] -> Q Exp
gen [D] = [| \n -> show n |]
gen [S] = [| \s -> s |]
gen [L s] = stringE s
-- Here we generate the Haskell code for the splice
-- from an input format string.
pr :: String -> Q Exp
pr s = gen (parse s)
test('T10672_x64', [extra_clean(['cxxy.o',
'Main.exe', 'Main.hi', 'Main.o',
'Printf.o', 'Printf.hi']),
[unless(opsys('mingw32'),skip) , unless(arch('x86_64'), skip)]],
run_command, ['$MAKE -s --no-print-directory T10672_x64'])
test('T10672_x86', [extra_clean(['cxxy.o',
'Main.exe', 'Main.hi', 'Main.o',
'Printf.o', 'Printf.hi']),
[unless(opsys('mingw32'),skip) , unless(arch('i386'), skip)]],
run_command, ['$MAKE -s --no-print-directory T10672_x86'])
// Copyright (C) 2015, Luke Iannini
#include <iostream>
#include <exception>
#include <string.h>
// Make sure can call unmangled names from Haskell's FFI
extern "C" {
int talkToCxx() {
try {
throw 20;
}
catch (int e) {
std::cout << "An exception occurred. Exception Nr. " << e << '\n';
}
std::cout << "Hello From C++!";
}
}