Consider past assignments in the linear register allocator.

This ticket is primarily for references in the future.

It's implemented in !3038 (closed).

This was prompted by #17823

The main issue is that the linear register allocation assigned registers without considering assignment history.

We can do a lot better when we try to keep assigning variables to registers they have been assigned to in the past.

In practice it turned out however that only considering the first assignment for each variable was "good enough", while doing anything more sophisticated resulted in a measureable compile time increase with marginal performance improvements at best.

Below is described the approach implemented in !3038 (closed)

When assigning registers we now first try registers we assigned to in the past, instead of picking the "first" one.

This is in extremely helpful when dealing with loops for which variables are dead in parts of the loop.

This is important for patterns like this:

    foo = arg1
loop:
    use(foo)
    ...
    foo = getVal()
    goto loop;

There we:

assign foo to the register of arg1.
use foo, it's dead after this use as it's overwritten after.
do other things.
look for a register to put foo in.

If we pick an arbitrary one it might differ from the register the start of the loop expect's foo to be in. To fix this we simply look for past register assignments for the given variable. If we find one and the register is free we use that register.

This reduces the need for fixup blocks which match the register assignment between blocks. In the example above between the end and the head of the loop.

The patch also moves branch weight estimation ahead of register allocation and adds a flag to control it (cmm-static-pred).

It means the linear allocator is more likely to assign the hotter code paths first.
If it assign these first we are:
- Less likely to spill on the hot path.
- Less likely to introduce fixup blocks on the hot path.

These two measure combined are surprisingly effective. Based on nofib we get in the mean:

-0.9% instructions executed
-0.1% reads/writes
-0.2% code size.
-0.1% compiler allocations.
-0.9% compile time.
-0.8% runtime.

Most of the benefits are simply a result of removing redundant moves and spills.

Reduced compiler allocations likely are the result of less code being generated. (The added lookup is mostly non-allocating).

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information