
Draft: Experiment with zapping in the rewriter without directed coercions

sheaf requested to merge sheaf/ghc:zap-coercions into master

In this MR we start off rewriting a zapped reduction and then propagate the zapping using the coercion combinators.
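To illustrate the idea, here is a minimal toy model (not GHC's actual types; the constructors and helper names are assumptions for the sketch) of how a combinator like `mkTransCo` can propagate zapping: once either side of a composition is zapped, the whole result is zapped, keeping only the endpoint types.

```haskell
-- Toy stand-ins for GHC's Type and Coercion; a Zapped coercion has
-- discarded its proof and records only its endpoint types.
type Ty = String

data Coercion
  = Refl Ty                 -- reflexive proof: t ~ t
  | Trans Coercion Coercion -- transitivity: (a ~ b) -> (b ~ c) -> (a ~ c)
  | Zapped Ty Ty            -- proof discarded; endpoints kept
  deriving (Eq, Show)

-- Endpoint types of a coercion.
coercionKind :: Coercion -> (Ty, Ty)
coercionKind (Refl t)      = (t, t)
coercionKind (Trans c1 c2) = (fst (coercionKind c1), snd (coercionKind c2))
coercionKind (Zapped l r)  = (l, r)

isZapped :: Coercion -> Bool
isZapped (Zapped _ _) = True
isZapped _            = False

-- The combinator propagates zapping: if either argument is zapped,
-- the composition is zapped too, keeping only the outer endpoints.
mkTransCo :: Coercion -> Coercion -> Coercion
mkTransCo c1 c2
  | isZapped c1 || isZapped c2
  = Zapped (fst (coercionKind c1)) (snd (coercionKind c2))
  | otherwise
  = Trans c1 c2
```

So `mkTransCo (Zapped "a" "b") (Refl "b")` collapses to `Zapped "a" "b"` rather than building a `Trans` node around the zapped proof.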

Here are the performance numbers:

| test (ghc/alloc) | HEAD | dcoercion+zap | zap |
| --- | ---: | ---: | ---: |
| CoOpt_Singletons | 984,469,440 | 1,992,914,088 | 1,843,420,584 |
| LargeRecord | 6,140,072,220 | 3,327,980,336 | 2,191,354,760 |
| T12227 | 479,779,040 | 293,555,520 | 267,812,024 |
| T13386 | 887,913,392 | 40,795,392 | 46,132,320 |
| T15703 | 532,192,308 | 390,545,584 | 1,148,686,400 |
| T16577 | 7,588,794,744 | 7,961,537,664 | 7,841,667,048 |
| T18223 | 1,119,964,672 | 1,142,598,536 | 1,132,690,904 |
| T5030 | 356,224,592 | 103,302,512 | 105,683,712 |
| T5642 | 471,947,420 | 493,938,064 | 490,058,416 |
| T8095 | 3,257,820,628 | 186,612,624 | 526,866,304 |
| T9630 | 1,551,652,176 | 1,585,077,800 | 1,610,447,440 |
| T9872a | 1,787,159,152 | 1,928,660,896 | 578,369,176 |
| T9872b | 2,082,232,304 | 2,214,371,392 | 761,973,416 |
| T9872b_defer | 3,153,955,252 | 2,282,802,240 | 4,174,971,640 |
| T9872c | 1,728,876,064 | 1,877,569,608 | 560,414,352 |
| T9872d | 449,285,296 | 425,046,152 | 834,352,192 |

I think the only positive result here is that this patch does significantly better than !7787 on the LargeRecord test. The results on T9872{a,b,c} look good, but as T9872b_defer and T9872d show, when we actually make use of the coercions, we end up much worse off.

One problem with this approach, where we change combinators such as `mkTransCo` to zap coercions, is that it forces our hand: we have to zap everything aggressively. For example, we zap to `Zapped big_lhs big_rhs` rather than using the more fine-grained `TyConAppCo r tc [Refl arg1, Refl arg2, Zapped small_lhs small_rhs]`, because if we did the latter, then in a composition of several such coercions we would accumulate more and more types and coercions, instead of zapping the whole thing.
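To make the accumulation concern concrete, here is a small self-contained sketch (toy constructors, not GHC's real `Coercion` type; `coSize`, `fineChain`, and `aggressiveChain` are hypothetical names) comparing the two strategies: fine-grained zapping keeps the `TyConAppCo` structure of every link in a transitivity chain, so the coercion grows linearly with the chain length, whereas aggressive zapping collapses the whole chain to a single constant-size `Zapped` node.

```haskell
import Data.List (foldl')

-- Toy coercion syntax, enough to measure how much structure survives.
data Co
  = Refl String
  | Zapped String String   -- endpoints only, proof discarded
  | TyConAppCo String [Co] -- tycon applied to argument coercions
  | Trans Co Co
  deriving Show

-- Number of nodes in a coercion tree, as a proxy for retained size.
coSize :: Co -> Int
coSize (Refl _)          = 1
coSize (Zapped _ _)      = 1
coSize (TyConAppCo _ cs) = 1 + sum (map coSize cs)
coSize (Trans c1 c2)     = 1 + coSize c1 + coSize c2

-- Fine-grained zapping: each link keeps its TyConAppCo structure, and
-- composing n links with Trans retains all n of them.
fineChain :: Int -> Co
fineChain n = foldl' Trans (step 0) [ step i | i <- [1 .. n - 1] ]
  where
    step i = TyConAppCo "T" [Refl "a", Zapped (show i) (show (i + 1))]

-- Aggressive zapping: every composition involving a zapped piece
-- collapses to a single Zapped node, so the size is constant.
aggressiveChain :: Int -> Co
aggressiveChain n = Zapped "0" (show n)
```

Under this model, `coSize (fineChain n)` grows linearly in `n`, while `coSize (aggressiveChain n)` is always 1, which is why the combinators zap the whole coercion rather than preserving per-argument structure.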

