Bring back -fast-llvm / drop the %function -> %object mangling
We dropped `-fast-llvm` due to the assumption that it could not generate working code. One of those assumptions is that the mangler is absolutely necessary to generate proper code: we need to rewrite `%function`s to `%object`s in the assembly to prevent function symbols from being resolved through the PLT. Modern compilers provide `-fno-plt` for exactly this purpose. As such, we can drop the `%function`-to-`%object` mangler phase if we ensure that Haskell code is always compiled with `-fno-plt`.
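For illustration, this mangler phase amounts to a textual rewrite over the generated assembly. A minimal sketch in C (the function name and exact directive handling here are hypothetical; the real mangler does more than this):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the %function -> %object rewrite: given one
 * line of assembly, replace an ELF ".type <sym>, %function" annotation
 * with "%object" so the symbol is treated as data and is never routed
 * through the PLT. Returns 1 if the line was rewritten, 0 otherwise. */
int mangle_line(const char *in, char *out, size_t outsz) {
    const char *p = strstr(in, "%function");
    if (p == NULL) {
        snprintf(out, outsz, "%s", in);   /* pass through unchanged */
        return 0;
    }
    /* prefix + "%object" + rest of the line after "%function" */
    snprintf(out, outsz, "%.*s%%object%s",
             (int)(p - in), in, p + strlen("%function"));
    return 1;
}
```

With `-fno-plt` this whole textual pass becomes unnecessary, because the code generator never emits PLT-routed references in the first place.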
This also means we can drop the mangler on non-AVX ELF targets entirely: besides the PLT rewrite, the mangler only handles AVX mangling and `-dead_strip` support for Mach-O. As such, `-fast-llvm` together with `-fno-plt` is safe on ARM/AArch64 on ELF.
For a slightly more elaborate discussion of the PLT issue, see the comment below. We would rather use `-fno-plt` than the mangler, as it provides the information at the right level (the code generator) instead of at the lowest level (the assembly). It also guides the compiler in the right direction, instead of trying to patch up the mess it made.
So for the PLT issue, I can see how this is a problem, and it exists only because of Haskell's tables-next-to-code feature. For LLVM we use the prefix-data capability to implement it. Code thus ends up being laid out like this:
```
   | Info table
-> | function symbol
   | function body
```
Now the issue arises when we have a reference in the body of another function. Say we have two Haskell functions, `A` and `B`:
```
| Info table (A)
| A: function symbol
| function body (A)
--- other module
| Info table (B)
| B: function symbol
| function body (B)
| <read some data at A-n (0 < n < (size of info table))>
```
We'd expect the following linking:
```
   | Info table (A)
.->| A: function symbol
|  | function body (A)
|  -- other module
|  | Info table (B)
|  | B: function symbol
|  | function body (B)
|  | <read some data at A-n (0 < n < (size of info table))>
'----------------------'
```
Thus `A-n` points into Info table (A), as expected.
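The negative-offset read can be simulated in plain C. A hypothetical sketch (the names and the one-word info table are invented for illustration); it shows only the layout invariant the code generator relies on:

```c
#include <stdint.h>
#include <string.h>

/* Tables-next-to-code, simulated: the info table is laid out immediately
 * before the entry-point symbol, so code can read info-table fields at a
 * small negative offset from the function symbol A. */
struct closure {
    uint64_t      info;     /* "Info table (A)": one word in this sketch      */
    unsigned char body[8];  /* "function body (A)"; the symbol A points here  */
};

/* Read the word at A - n with n = sizeof(uint64_t), i.e. inside the
 * info table that precedes the entry point. */
uint64_t read_info(const struct closure *c) {
    const unsigned char *A = c->body;            /* the symbol A        */
    uint64_t w;
    memcpy(&w, A - sizeof w, sizeof w);          /* A - n lands in info */
    return w;
}
```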
With a PLT, however, we end up with:
```
   | Procedure Linkage Table (PLT)
.->| <stub for A> ---------.
|  --                      |
|  | Info table (A)        |
|  | A: function symbol <--'
|  | function body (A)
|  -- other module
|  | Info table (B)
|  | B: function symbol
|  | function body (B)
|  | <read some data at A-n (0 < n < (size of info table))>
'-----------------------'
```
And now `A-n` points into random memory, not into an info table.
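A hypothetical C sketch of this failure mode (all names invented): if the address module B gets for `A` is that of a PLT stub, then reading at `stub - n` picks up whatever happens to precede the stub, not Info table (A):

```c
#include <stdint.h>
#include <string.h>

/* Simulated memory layout: the real closure has its info table directly
 * before its entry point, while the PLT stub lives somewhere else, with
 * unrelated bytes in front of it. */
struct closure { uint64_t info;   unsigned char body[8]; };
struct plt     { uint64_t before; unsigned char stub_for_A[8]; };

/* Read one word at (A - n), n = sizeof(uint64_t). */
uint64_t word_before(const unsigned char *A) {
    uint64_t w;
    memcpy(&w, A - sizeof w, sizeof w);
    return w;
}
```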
If the reference for `A-n` were instead an object and we referenced it as such, we would have to load the object's address via the Global Offset Table (GOT): the code at the reference location for `A-n` would have to load the real address from the GOT. For functions we can use the PLT instead, exploiting the fact that chained calls simply work for functions.
Example in pseudocode:

```
fn g {
  push args
  call f
}
```
is functionally identical to

```
fn f' { call f }

fn g {
  push args
  call f'
}
```
When going through the GOT, however, we can't use the "just jump to `f`" approach:

```
fn g {
  f' = load addr of f
  push args
  call f'
}
```
would be

```
fn g {
  got = load addr of got
  f'  = lookup addr of f in got
  push args
  call f'
}
```
In this construction, `f'` always points to the true memory location of `f`, not to a jump stub as in the PLT case.
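The distinction can be mimicked in C (a hedged sketch; `f`, `f_stub`, and `got_f` are invented names): a PLT entry behaves like a small trampoline function that chain-calls the real `f`, while a GOT slot behaves like a function pointer holding `f`'s true address. Both call fine, but only the GOT slot compares equal to the real address:

```c
/* The real function. */
static int f(int x) { return x + 1; }

/* PLT-style: a stub at its own, distinct address that chain-calls f.
 * Callers that "call f" actually call this trampoline. */
static int f_stub(int x) { return f(x); }

/* GOT-style: a slot holding the true address of f.
 * Callers load the pointer from the slot and call through it. */
static int (*got_f)(int) = f;
```

This is why the PLT is fatal for `A-n` reads: the address the caller sees is the trampoline's, not the real entry point in front of which the info table lives.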