Investigate the performance impact of code alignment

GHC's performance may also vary for reasons like those described in this article:

https://dendibakh.github.io/blog/2018/01/18/Code_alignment_issues

The gist of the article is that tight loops can perform significantly differently depending on whether their assembly instructions cross a cache line boundary.

I would not have expected this to make double-digit percentage differences.
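To make the "crosses a cache line" condition concrete, here is a minimal sketch of the address arithmetic. It assumes a 64-byte cache line, which is common on current x86-64 CPUs but is not stated in this ticket; the function names and example addresses are hypothetical and are not part of GHC or of the linked article.

```python
# Assumption: 64-byte cache lines. Adjust for other targets.
CACHE_LINE_BYTES = 64


def lines_touched(start: int, length: int, line_size: int = CACHE_LINE_BYTES) -> int:
    """Return how many cache lines the byte range [start, start + length) covers."""
    if length <= 0:
        return 0
    first_line = start // line_size
    last_line = (start + length - 1) // line_size
    return last_line - first_line + 1


def crosses_cache_line(start: int, length: int, line_size: int = CACHE_LINE_BYTES) -> bool:
    """True if the code in [start, start + length) spans more than one cache line."""
    return lines_touched(start, length, line_size) > 1


if __name__ == "__main__":
    # The same hypothetical 24-byte loop body placed at two different addresses.
    for addr in (0x1000, 0x1030):
        print(hex(addr), crosses_cache_line(addr, 24))
```

The loop body is identical in both placements; only its start address differs, so an unrelated change earlier in the object file that shifts the code could plausibly move a loop from one case to the other.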

From #ghc:

bgamari:       nh2[m], there are nofib tests where this is very likely the cause of a good portion of the variance
angerman:      nh2[m]: that linked LLVM talk from 2016 makes me not want to have to deal with that...
AndreasK:      nh2[m]: It's a real issue. But atm I think ghc, at least in the native codegen, makes no real attempt to optimize for this
thoughtpolice: GHC does not carry knowledge of alignment or anything, no. I’m not sure how difficult this is to suss out, but at least making sure every branch target does not cross a cache line is probably a good start
thoughtpolice: Well, far jump, e.g. a call to a function. not sure how TNTC fits into this story, tbqh

Trac metadata

Trac field	Value
Version	8.2.2
Type	Task
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	Compiler (CodeGen)
Test case
Differential revisions
BlockedBy
Related
Blocking
CC	nh2
Operating system
Architecture
