Draft: Experiment: directed coercions + zapping

sheaf requested to merge wip/zap-dcoercions into master

I'm putting up this MR to get help investigating the impact of zapping directed coercions in the coercion optimiser.

The flags -fkeep-dcoercions and -fzap-dcoercions can be used to choose how to handle directed coercions in the coercion optimiser. (With this patch, by default, we zap directed coercions, unless -dcore-lint is enabled.)

Some allocation test results:

                               Test    Metric       Baseline      New value  Change
           CoOpt_Singletons(normal) ghc/alloc    984,469,440  1,992,914,088 +102.4%  BAD
                LargeRecord(normal) ghc/alloc  6,140,072,220  3,327,980,336  -45.8% GOOD
                     T12227(normal) ghc/alloc    479,779,040    293,555,520  -38.8% GOOD
                     T13386(normal) ghc/alloc    887,913,392     40,795,392  -95.4% GOOD
                     T15703(normal) ghc/alloc    532,192,308    390,545,584  -26.6% GOOD
                     T16577(normal) ghc/alloc  7,588,794,744  7,961,537,664   +4.9%  BAD
                     T18223(normal) ghc/alloc  1,119,964,672  1,142,598,536   +2.0%  BAD
                      T5030(normal) ghc/alloc    356,224,592    103,302,512  -71.0% GOOD
                      T5642(normal) ghc/alloc    471,947,420    493,938,064   +4.7%  BAD
                      T8095(normal) ghc/alloc  3,257,820,628    186,612,624  -94.3% GOOD
                      T9630(normal) ghc/alloc  1,551,652,176  1,585,077,800   +2.2%  BAD
                     T9872a(normal) ghc/alloc  1,787,159,152  1,928,660,896   +7.9%  BAD
                     T9872b(normal) ghc/alloc  2,082,232,304  2,214,371,392   +6.3%  BAD
               T9872b_defer(normal) ghc/alloc  3,153,955,252  2,282,802,240  -27.6% GOOD
                     T9872c(normal) ghc/alloc  1,728,876,064  1,877,569,608   +8.6%  BAD
                     T9872d(normal) ghc/alloc    449,285,296    425,046,152   -5.4% GOOD
       TcPlugin_RewritePerf(normal) ghc/alloc  2,273,422,028  2,415,191,296   +6.2%  BAD

                          geo. mean                                           -7.7%

This is with zapping of directed coercions in the coercion optimiser.

Some test cases are still better off with other methods. For example:

  • CoOpt_Singletons is better off if we hydrate directed coercions: allocations return to the baseline. Note that zapping is also quite bad for compile time on this test: it takes about 4.5x longer to compile than if we hydrate.
  • LargeRecord is better off if we keep directed coercions (neither hydrating nor zapping them). Allocations drop to around 2,400,000,000 (so about -60% instead of -45.8%).
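
For anyone checking the figures, here is a minimal sketch of how the percentages relate to the raw allocation counts. It assumes the change is computed as (new - baseline) / baseline. The helper name `pct_change` is made up for illustration and is not part of the GHC testsuite. The "keep" figure is the approximate 2,400,000,000 quoted above, not a measured value from the table.

```python
def pct_change(baseline: int, new: int) -> float:
    """Relative change of `new` against `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# LargeRecord allocations (ghc/alloc) from the results above.
large_record_baseline = 6_140_072_220
large_record_zapped = 3_327_980_336
# Approximate figure quoted in the description ("around 2,400,000,000").
large_record_kept = 2_400_000_000

# CoOpt_Singletons allocations from the results above.
singletons_baseline = 984_469_440
singletons_zapped = 1_992_914_088

zapped_change = pct_change(large_record_baseline, large_record_zapped)
kept_change = pct_change(large_record_baseline, large_record_kept)
singletons_change = pct_change(singletons_baseline, singletons_zapped)

print(f"LargeRecord, zap:      {zapped_change:+.1f}%")
print(f"LargeRecord, keep:     {kept_change:+.1f}%")
print(f"CoOpt_Singletons, zap: {singletons_change:+.1f}%")
```

Under that assumption, the zapped LargeRecord and CoOpt_Singletons numbers reproduce the -45.8% and +102.4% reported above, and the approximate "keep" figure comes out at roughly -61%.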