Optimise ELF Linker (#23464)
This is an attempt at optimizing runtime linker to speed up the reproducer of #23464 (closed). According to perf record
, about a half of its runtime is spent in checkProddableBlock
, around 25% in ocResolve_ELF
and ~10% in ocInit_ELF
. I tried to address their inefficiencies, but due to lack of detailed knowledge about linker I'm not sure whether these changes are correct.
-
ocInit_ELF
's bottleneck is repeated O(n) append to single linked listsrelTable
,relaTable
andsymbolTables
. This MR caches their last nodes inObjectCodeFormatInfo
to allow O(1) append. -
ocResolve_ELF
spends most of its time runningdo_Elf_Rela_relocations
, which in turns mostly computesget_shndx_table(ehdrC)
. This MR hoists this call out of loop by passing its result todoElf_Rela_relocations
as an argument. -
checkProddableBlock
, which is accountable for half of reproducer's runtime, is traversing a large linked list ensuring that given block is within some block in the list, and panics otherwise. From the comments inrts/Linker.c
it appears that this is just a sanity check, yet it is currently enabled in non-debug RTS. This MR disables the checks when using non-debug RTS.
On my machine, these changes speed up the first run of the reproducer from #23464 (closed) by more than 10 times, from ~9 to ~0.7 seconds.
Questions:
- Is
checkProddableBlock
checking only ghc's sanity or the correctness of the object file? Can this check be avoided in non-debug RTS? If that's not the case I can try to turnproddables
into a more suitable data structure. - Is it safe to assume that
shndx_table
always returns the same value for a givenElf_Shdr
, or modifications toElf_Shdr
can change its result?