Bring back -fast-llvm / drop the %function -> %object mangling
We dropped `-fast-llvm` due to the assumption that it could not generate working code. One of those assumptions is that the mangler is absolutely necessary to generate proper code: we need to rewrite `%function`s to `%object`s in the assembly to prevent function symbols from being resolved through the PLT. Modern compilers provide `-fno-plt` for exactly this purpose. As such, we can drop the `%function`-to-`%object` mangler phase if we ensure that Haskell code is always compiled with `-fno-plt`.
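For illustration, this mangler phase amounts to a textual rewrite over the generated assembly. A minimal sketch in C (the function name and exact directive handling here are hypothetical; the real mangler does more than this):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the %function -> %object rewrite: given one
 * line of assembly, replace an ELF ".type <sym>, %function" annotation
 * with "%object" so the symbol is treated as data and is never routed
 * through the PLT. Returns 1 if the line was rewritten, 0 otherwise. */
int mangle_line(const char *in, char *out, size_t outsz) {
    const char *p = strstr(in, "%function");
    if (p == NULL) {
        snprintf(out, outsz, "%s", in);   /* pass through unchanged */
        return 0;
    }
    /* prefix + "%object" + rest of the line after "%function" */
    snprintf(out, outsz, "%.*s%%object%s",
             (int)(p - in), in, p + strlen("%function"));
    return 1;
}
```

With `-fno-plt` this whole textual pass becomes unnecessary, because the code generator never emits PLT-routed references in the first place.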
This also means we can drop the mangler on non-AVX ELF targets entirely: besides the PLT rewrite, the mangler only handles AVX mangling and `-dead_strip` support for Mach-O. As such, `-fast-llvm` together with `-fno-plt` is safe on ARM/AArch64 on ELF.
For a slightly more elaborate discussion of the PLT issue, see the comment below. We would rather use `-fno-plt` than the mangler, as it provides the information at the right level (the code generator) instead of at the lowest level (the assembly). It also guides the compiler in the right direction, instead of trying to patch up the mess it made.
So for the PLT issue, I can see how this is a problem, and it exists only because of Haskell's tables-next-to-code feature. For LLVM we use the prefix-data capability to implement it. Code thus ends up being laid out like this:
```
   | Info table
-> | function symbol
   | function body
```
Now the issue arises when we have a reference in the body of another function. Say we have two Haskell functions, `A` and `B`:
```
| Info table (A)
| A: function symbol
| function body (A)
--- other module
| Info table (B)
| B: function symbol
| function body (B)
| <read some data at A-n (0 < n < (size of info table))>
```
We'd expect the following linking:
```
   | Info table (A)
.->| A: function symbol
|  | function body (A)
|  -- other module
|  | Info table (B)
|  | B: function symbol
|  | function body (B)
|  | <read some data at A-n (0 < n < (size of info table))>
'----------------------'
```
Thus `A-n` points into Info table (A), as expected.
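The negative-offset read can be simulated in plain C. A hypothetical sketch (the names and the one-word info table are invented for illustration); it shows only the layout invariant the code generator relies on:

```c
#include <stdint.h>
#include <string.h>

/* Tables-next-to-code, simulated: the info table is laid out immediately
 * before the entry-point symbol, so code can read info-table fields at a
 * small negative offset from the function symbol A. */
struct closure {
    uint64_t      info;     /* "Info table (A)": one word in this sketch      */
    unsigned char body[8];  /* "function body (A)"; the symbol A points here  */
};

/* Read the word at A - n with n = sizeof(uint64_t), i.e. inside the
 * info table that precedes the entry point. */
uint64_t read_info(const struct closure *c) {
    const unsigned char *A = c->body;            /* the symbol A        */
    uint64_t w;
    memcpy(&w, A - sizeof w, sizeof w);          /* A - n lands in info */
    return w;
}
```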
With a PLT, however, we end up with:
```
   | Procedure Linkage Table (PLT)
.->| <stub for A> ---------.
|  --                      |
|  | Info table (A)        |
|  | A: function symbol <--'
|  | function body (A)
|  -- other module
|  | Info table (B)
|  | B: function symbol
|  | function body (B)
|  | <read some data at A-n (0 < n < (size of info table))>
'-----------------------'
```
And now `A-n` points into random memory, not into an info table.
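A hypothetical C sketch of this failure mode (all names invented): if the address module B gets for `A` is that of a PLT stub, then reading at `stub - n` picks up whatever happens to precede the stub, not Info table (A):

```c
#include <stdint.h>
#include <string.h>

/* Simulated memory layout: the real closure has its info table directly
 * before its entry point, while the PLT stub lives somewhere else, with
 * unrelated bytes in front of it. */
struct closure { uint64_t info;   unsigned char body[8]; };
struct plt     { uint64_t before; unsigned char stub_for_A[8]; };

/* Read one word at (A - n), n = sizeof(uint64_t). */
uint64_t word_before(const unsigned char *A) {
    uint64_t w;
    memcpy(&w, A - sizeof w, sizeof w);
    return w;
}
```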
If the reference for `A-n` were instead an object and we referenced it as such, we would have to load the object's address via the Global Offset Table (GOT): the code at the reference location for `A-n` would have to load the real address from the GOT. For functions we can use the PLT instead, exploiting the fact that chained calls simply work for functions.
Example in pseudocode:

```
fn g {
  push args
  call f
}
```
is functionally identical to

```
fn f' { call f }

fn g {
  push args
  call f'
}
```
When going through the GOT, however, we can't use the "just jump to `f`" approach:

```
fn g {
  f' = load addr of f
  push args
  call f'
}
```
would be

```
fn g {
  got = load addr of got
  f'  = lookup addr of f in got
  push args
  call f'
}
```
In this construction, `f'` always points to the true memory location of `f`, not to a jump stub as in the PLT case.
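The distinction can be mimicked in C (a hedged sketch; `f`, `f_stub`, and `got_f` are invented names): a PLT entry behaves like a small trampoline function that chain-calls the real `f`, while a GOT slot behaves like a function pointer holding `f`'s true address. Both call fine, but only the GOT slot compares equal to the real address:

```c
/* The real function. */
static int f(int x) { return x + 1; }

/* PLT-style: a stub at its own, distinct address that chain-calls f.
 * Callers that "call f" actually call this trampoline. */
static int f_stub(int x) { return f(x); }

/* GOT-style: a slot holding the true address of f.
 * Callers load the pointer from the slot and call through it. */
static int (*got_f)(int) = f;
```

This is why the PLT is fatal for `A-n` reads: the address the caller sees is the trampoline's, not the real entry point in front of which the info table lives.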