Draft: Experiment with zapping in the rewriter without directed coercions
In this MR we start off rewriting with a zapped reduction, and then propagate the zapping using the coercion combinators.
Here are the performance numbers:
test (ghc/alloc) | HEAD | dcoercion+zap | zap |
---|---|---|---|
CoOpt_Singletons | 984,469,440 | 1,992,914,088 | 1,843,420,584 |
LargeRecord | 6,140,072,220 | 3,327,980,336 | 2,191,354,760 |
T12227 | 479,779,040 | 293,555,520 | 267,812,024 |
T13386 | 887,913,392 | 40,795,392 | 46,132,320 |
T15703 | 532,192,308 | 390,545,584 | 1,148,686,400 |
T16577 | 7,588,794,744 | 7,961,537,664 | 7,841,667,048 |
T18223 | 1,119,964,672 | 1,142,598,536 | 1,132,690,904 |
T5030 | 356,224,592 | 103,302,512 | 105,683,712 |
T5642 | 471,947,420 | 493,938,064 | 490,058,416 |
T8095 | 3,257,820,628 | 186,612,624 | 526,866,304 |
T9630 | 1,551,652,176 | 1,585,077,800 | 1,610,447,440 |
T9872a | 1,787,159,152 | 1,928,660,896 | 578,369,176 |
T9872b | 2,082,232,304 | 2,214,371,392 | 761,973,416 |
T9872b_defer | 3,153,955,252 | 2,282,802,240 | 4,174,971,640 |
T9872c | 1,728,876,064 | 1,877,569,608 | 560,414,352 |
T9872d | 449,285,296 | 425,046,152 | 834,352,192 |
I think the only positive result here is that this patch does significantly better than !7787 on the `LargeRecord` test.
The results on `T9872{a,b,c}` look good, but as `T9872b_defer` and `T9872d` show, when we actually make use of the coercions, we end up much worse off.
One problem with this approach, where we change combinators such as `mkTransCo` to zap coercions, is that it forces our hand to aggressively zap everything. For example, we don't use `TyConAppCo r tc [Refl arg1, Refl arg2, Zapped small_lhs small_rhs]` instead of `Zapped big_lhs big_rhs`, because if we did, then in a composition of several such coercions we would accumulate more and more types and coercions, instead of zapping the whole thing.
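
To make the trade-off concrete, here is a minimal sketch on a toy coercion type. This is not GHC's `Coercion` or `Type`: the constructors only borrow the names used above, roles are omitted, and `mkTransCoZap`, `mkTransCoKeep`, `step`, `chain` and `coSize` are hypothetical helpers invented for illustration.

```haskell
module ZapSketch where

-- Toy model only: NOT GHC's Type/Coercion definitions.
data Ty = TyVar String | TyConApp String [Ty]
  deriving (Eq, Show)

data Co
  = Refl Ty
  | TyConAppCo String [Co]
  | Trans Co Co
  | Zapped Ty Ty            -- remembers only the two endpoint types
  deriving (Eq, Show)

-- Left and right endpoint types of a toy coercion.
coLhs, coRhs :: Co -> Ty
coLhs (Refl t)          = t
coLhs (TyConAppCo tc cs) = TyConApp tc (map coLhs cs)
coLhs (Trans c _)       = coLhs c
coLhs (Zapped l _)      = l

coRhs (Refl t)          = t
coRhs (TyConAppCo tc cs) = TyConApp tc (map coRhs cs)
coRhs (Trans _ c)       = coRhs c
coRhs (Zapped _ r)      = r

-- Node counts, used as a rough stand-in for allocation.
tySize :: Ty -> Int
tySize (TyVar _)       = 1
tySize (TyConApp _ ts) = 1 + sum (map tySize ts)

coSize :: Co -> Int
coSize (Refl t)          = 1 + tySize t
coSize (TyConAppCo _ cs) = 1 + sum (map coSize cs)
coSize (Trans a b)       = 1 + coSize a + coSize b
coSize (Zapped l r)      = 1 + tySize l + tySize r

-- Hypothetical zapping combinator: always collapses to the endpoints.
mkTransCoZap :: Co -> Co -> Co
mkTransCoZap a b = Zapped (coLhs a) (coRhs b)

-- Hypothetical structure-preserving combinator.
mkTransCoKeep :: Co -> Co -> Co
mkTransCoKeep = Trans

-- One rewriting step that zaps only the small argument that changed.
step :: Int -> Co
step i =
  TyConAppCo "T"
    [ Refl (TyVar "a")
    , Refl (TyVar "b")
    , Zapped (arg i) (arg (i + 1))
    ]
  where
    arg k = TyConApp ("X" ++ show k) []

-- Compose n >= 1 steps with the given transitivity combinator.
chain :: (Co -> Co -> Co) -> Int -> Co
chain trans n = foldr1 trans (map step [1 .. n])
```

In this toy, `coSize (chain mkTransCoKeep n)` grows linearly with `n`, because every step keeps its `Refl` arguments and its small `Zapped` node, whereas `coSize (chain mkTransCoZap n)` stays constant, since only the two outermost types survive. Both chains have the same `coLhs` and `coRhs`.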