Ömer Sinan Ağacan authored
Say we have a record like this:

```haskell
data Rec = Rec
  { f1 :: Int
  , f2 :: Int
  , f3 :: Int
  , f4 :: Int
  , f5 :: Int
  }
```

Before this patch, the code generated for `f1` looked like this:

```
f1_entry() {offset
  ...
  cJT:
      _sI6::P64 = R1;
      _sI7::P64 = P64[_sI6::P64 + 7];
      _sI8::P64 = P64[_sI6::P64 + 15];
      _sI9::P64 = P64[_sI6::P64 + 23];
      _sIa::P64 = P64[_sI6::P64 + 31];
      _sIb::P64 = P64[_sI6::P64 + 39];
      R1 = _sI7::P64 & (-8);
      Sp = Sp + 8;
      call (I64[R1])(R1) args: 8, res: 0, upd: 8;
}
```

Note how all fields of the record are moved to local variables, even though only the first one is ever used. These moves make it to the final assembly:

```
f1_info:
    ...
_cJT:
    movq 7(%rbx),%rax
    movq 15(%rbx),%rcx
    movq 23(%rbx),%rcx
    movq 31(%rbx),%rcx
    movq 39(%rbx),%rbx
    movq %rax,%rbx
    andq $-8,%rbx
    addq $8,%rbp
    jmp *(%rbx)
```

With this patch we stop generating these move instructions. The Cmm becomes:

```
f1_entry() {offset
  ...
  cJT:
      _sI6::P64 = R1;
      _sI7::P64 = P64[_sI6::P64 + 7];
      R1 = _sI7::P64 & (-8);
      Sp = Sp + 8;
      call (I64[R1])(R1) args: 8, res: 0, upd: 8;
}
```

The assembly becomes:

```
f1_info:
    ...
_cJT:
    movq 7(%rbx),%rax
    movq %rax,%rbx
    andq $-8,%rbx
    addq $8,%rbp
    jmp *(%rbx)
```

It turns out CmmSink already removes these dead assignments, but it's better not to generate them in the first place.

Reviewers: simonmar, simonpj, austin, bgamari

Reviewed By: simonmar, simonpj

Subscribers: rwbarton, thomie

Differential Revision: https://phabricator.haskell.org/D2269
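For reference, a minimal module that exercises the selector discussed above; the `sel` wrapper and `main` are illustrative additions, not part of the patch. Compiling it with `-O -ddump-cmm -ddump-asm` should show dumps of the shape quoted in this message:

```haskell
-- Same record as in the commit message; f1 is an ordinary field selector.
data Rec = Rec
  { f1 :: Int
  , f2 :: Int
  , f3 :: Int
  , f4 :: Int
  , f5 :: Int
  }

-- Route the selector through a NOINLINE binding so GHC compiles a real
-- call to f1 rather than constant-folding the whole expression away.
{-# NOINLINE sel #-}
sel :: Rec -> Int
sel = f1

main :: IO ()
main = print (sel (Rec 1 2 3 4 5))  -- prints 1
```

Only the field returned by `f1` needs to be loaded from the closure; the patch makes the generated Cmm reflect that directly instead of relying on CmmSink to clean up afterwards.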
cd50d236