NCG: Only adjust al before foreign calls if required.
This is a typical sequence for a foreign call (although newCAF is an RTS function).
We zero %rax to indicate that no vector registers are used for arguments. However, this is redundant here, and usually is elsewhere too.
    subq $8,%rsp
    movq %r13,%rax
    movq %rbx,%rsi
    movq %rax,%rdi
    xorl %eax,%eax
    call newCAF
The xorl %eax,%eax here is redundant, since newCAF takes a fixed number of arguments.
The System V spec says:
For calls that may call functions that use varargs or stdargs (prototype-less calls or calls to functions containing ellipsis (...) in the declaration) %al is used as hidden argument to specify the number of vector registers used. The contents of %al do not need to match exactly the number of registers, but must be an upper bound on the number of vector registers used and is in the range 0–8 inclusive.
This means we can omit the zeroing, at least when we call things like RTS functions or memmove, which take a fixed number of arguments.
On Windows this isn't part of the ABI at all, so there we can omit it completely.
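As a rough illustration of the rule above, the decision can be sketched as a small predicate. This is only a sketch: the names `Abi`, `CalleeKind` and `needsAlSetup` are hypothetical and are not taken from the NCG, and it assumes the code generator can tell whether a callee has a known fixed-arity prototype.

```haskell
-- Hypothetical sketch: none of these names come from GHC's NCG.

-- | The two x86-64 calling conventions discussed above.
data Abi = SysV | Win64
  deriving (Eq, Show)

-- | What the code generator knows about the callee's prototype.
data CalleeKind
  = FixedArity    -- known prototype without "...", e.g. newCAF or memmove
  | Variadic      -- declared with an ellipsis
  | UnknownProto  -- prototype-less call; must be treated like Variadic
  deriving (Eq, Show)

-- | Do we have to set %al before the call?
needsAlSetup :: Abi -> CalleeKind -> Bool
needsAlSetup Win64 _          = False  -- %al is not part of the Windows ABI
needsAlSetup SysV  FixedArity = False  -- callee never inspects %al
needsAlSetup SysV  _          = True   -- varargs or unknown: %al is a hidden argument
```

Under these assumptions, `needsAlSetup SysV FixedArity` is `False`, which corresponds to dropping the xorl in the newCAF sequence, while a call to a variadic function on System V would still get %al set.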
It's not a huge deal. For nofib/fannkuch-redux these xors are ~0.3% of the size of the .text segment. But it might help with caching in some edge cases.