Skip to content

Optimise ELF Linker (#23464)

aadaa-fgtaa requested to merge aadaa-fgtaa/ghc:optimize_elf_linker into master

This is an attempt at optimizing runtime linker to speed up the reproducer of #23464 (closed). According to perf record, about a half of its runtime is spent in checkProddableBlock, around 25% in ocResolve_ELF and ~10% in ocInit_ELF. I tried to address their inefficiencies, but due to lack of detailed knowledge about linker I'm not sure whether these changes are correct.

  • ocInit_ELF's bottleneck is repeated O(n) append to single linked lists relTable, relaTable and symbolTables. This MR caches their last nodes in ObjectCodeFormatInfo to allow O(1) append.
  • ocResolve_ELF spends most of its time running do_Elf_Rela_relocations, which in turns mostly computes get_shndx_table(ehdrC). This MR hoists this call out of loop by passing its result to doElf_Rela_relocations as an argument.
  • checkProddableBlock, which is accountable for half of reproducer's runtime, is traversing a large linked list ensuring that given block is within some block in the list, and panics otherwise. From the comments in rts/Linker.c it appears that this is just a sanity check, yet it is currently enabled in non-debug RTS. This MR disables the checks when using non-debug RTS.

On my machine, these changes speed up the first run of the reproducer from #23464 (closed) by more than 10 times, from ~9 to ~0.7 seconds.

Questions:

  • Is checkProddableBlock checking only ghc's sanity or the correctness of the object file? Can this check be avoided in non-debug RTS? If that's not the case I can try to turn proddables into a more suitable data structure.
  • Is it safe to assume that shndx_table always returns the same value for a given Elf_Shdr, or modifications to Elf_Shdr can change its result?

cc @bgamari @angerman @AndreasK

Edited by aadaa-fgtaa

Merge request reports