Improve block layout for the NCG
The current codegen tries to place two blocks A and B next to each other if they are "close".
Drawbacks of the current algorithm
Blocks are only considered close if one jumps to the other via a unconditional jump instructions. This is a fast and reasonable approximation. But we might be able to do better.
Calls are not considered even if they always return to the same block.
For example it's clear that if foo == bar we jump to C here:
A: movq $block_C_info,-16(%rbp) cmp foo,bar jne D B: jmp *(%rbx) #Returns to C ... .align 8 .infotable block_C_info: C: doThings
Conditional jumps are never considered.
The following block either jumps directly to c3mQ or returns there from a call. Yet that is completely ignored by the current algorithm.
_c3mG: movq $block_c3mQ_info,-8(%rbp) ... jne _c3mQ jmp *(%rbx) #returns to c3mQ
Likelyhood of branches is not considered.
We track information about branches which are rarely taken at the cmm level. However this information is currently lost during the Cmm -> Asm transformation which happens before we do code layout.
This means for Cmm code where one branch is never executed the branch that's never executed might be the only one considered relevant.
This happens for example for this stack check:
if ((Sp + -40) >= SpLim) (likely: True) goto c5PS; else goto c5Q3;
If nothing else this would allow some Cmm optimizations which are made useless by the current code layout algorithm. I have hit cases where the block layout algorithm turned optimizations into pessimizations of 2-5% by pulling obvious loops apart.
Given that the scenario that leads to loops being pulled apart doesn't seem that unlikely I assume some loops are also pulled apart like this in HEAD.
Cleaning this up ideally would also make it possible to move blocks which are not part of loops out of them.