DmdAnal: Implement Boxity Analysis (#19871)

Sebastian Graf requested to merge wip/T19871 into master

This patch fixes some abundant reboxing of DynFlags in GHC.HsToCore.Match.Literal.warnAboutOverflowedLit (which was the topic of #19407 (closed)) by introducing a Boxity analysis to GHC, done as part of demand analysis. The ad-hoc unboxing decisions previously made in worker/wrapper are now captured accurately in demand analysis, where the boxity info can propagate through demand signatures.
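
For readers unfamiliar with reboxing, here is a minimal hand-written sketch of the problem. It is not code from this patch: `Config`, `describe`, `check`, `checkWorker` and `checkWrapper` are hypothetical names, and `Config` merely stands in for a large record like DynFlags. The worker receives the record unboxed but has to rebuild it on the path where the whole box is needed.

```haskell
module ReboxingSketch where

-- Hypothetical stand-in for a large record such as DynFlags.
data Config = Config { verbosity :: !Int, name :: String }

-- A consumer that needs the whole boxed record.
describe :: Config -> String
describe cfg = name cfg ++ ":" ++ show (verbosity cfg)
{-# NOINLINE describe #-}

-- Strict in cfg (it always reads a field), but on one path it also
-- passes the box itself on to describe.
check :: Config -> Int -> String
check cfg n
  | n > verbosity cfg = describe cfg
  | otherwise         = "ok"

-- What unboxing cfg would amount to, written out by hand.
-- The worker takes the fields instead of the box ...
checkWorker :: Int -> String -> Int -> String
checkWorker v nm n
  | n > v     = describe (Config v nm) -- reboxing: allocates a fresh Config
  | otherwise = "ok"

-- ... and the wrapper takes the box apart at the call site.
checkWrapper :: Config -> Int -> String
checkWrapper (Config v nm) n = checkWorker v nm n
```

Both versions compute the same result, but `checkWorker` allocates a new `Config` on the `describe` path even though the caller already had one. Recording that the argument is needed boxed is what would let the unboxing be skipped in a case like this.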

See the new Note [Boxity analysis]. The actual fix for #19407 (closed) is described in Note [No lazy, Unboxed demand in demand signature], but Note [Finalising boxity for demand signature] is probably a better entry-point.

To support the fix for #19407 (closed), I had to change (what was) Note [Add demands for strict constructors] a bit (now Note [Unboxing evaluated arguments]). In particular, we now take care of it in finaliseBoxity (which is only called from demand analysis) instead of wantToUnboxArg.

I also had to resurrect Note [Product demands for function body] and rename it to Note [Unboxed demand on function bodies returning small products] to avoid huge regressions in join004 and join007, thereby fixing #4267 (closed) again. See the updated Note for details.

A nice side-effect is that the worker/wrapper transformation no longer needs to look at strictness info and other bits such as InsideInlineableFun flags (needed for Note [Do not unbox class dictionaries]) at all. It simply collects boxity info from argument demands and interprets them with a severely simplified wantToUnboxArg. All the smartness is in finaliseBoxity, which could be moved to DmdAnal completely if it weren't for the call to dubiousDataConInstArgTys, which would be awkward to export.

I spent some time figuring out why T16197 failed prior to my amendments to Note [Unboxing evaluated arguments]. Having figured it out, I minimised it a bit and added T16197b, which simply compares computed strictness signatures and thus should be far simpler to eyeball.

The 12% ghc/alloc regression in T11545 is because of the additional Boxity field in Poly and Prod, which results in more allocation during lubSubDmd and plusSubDmd. I made sure in the ticky profiles that the number of calls to those functions stayed the same. We can bear such an increase here, as we recently improved ghc/alloc for this test by 68% (in b760c1f7). The T18698* tests regress slightly because there is more unboxing of dictionaries happening, and that causes (mostly) Lint to allocate more.
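
To illustrate where the extra allocation comes from, here is a deliberately simplified toy model. It is not GHC's actual definition: the real Poly and Prod carry more information, and the fallback case below is an assumption made for brevity. The names mirror the ones above only for readability.

```haskell
module BoxityLubSketch where

-- Toy boxity flag: does the consumer need the box itself?
data Boxity = Boxed | Unboxed
  deriving (Eq, Show)

-- Toy join: unboxing is only safe if both branches agree on it.
lubBoxity :: Boxity -> Boxity -> Boxity
lubBoxity Unboxed Unboxed = Unboxed
lubBoxity _       _       = Boxed

-- Toy sub-demand: uniform over all fields, or one entry per field.
-- Every node carries a Boxity field.
data SubDmd
  = Poly Boxity
  | Prod Boxity [SubDmd]
  deriving (Eq, Show)

-- Toy least upper bound. Every result node is built with a Boxity
-- field, so each join allocates one more word per node than a
-- version without the field would.
lubSubDmd :: SubDmd -> SubDmd -> SubDmd
lubSubDmd (Poly b1) (Poly b2) = Poly (lubBoxity b1 b2)
lubSubDmd (Prod b1 ds1) (Prod b2 ds2)
  | length ds1 == length ds2
  = Prod (lubBoxity b1 b2) (zipWith lubSubDmd ds1 ds2)
-- Assumption for this sketch only: mismatched shapes fall back to
-- the conservative boxed answer.
lubSubDmd _ _ = Poly Boxed
```

In this model the number of calls to `lubSubDmd` is independent of the Boxity field; only the size of each constructed node grows. That would be consistent with an allocation increase at an unchanged call count.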

Fixes #19871 (closed), #19407 (closed), #4267 (closed), #16859 (closed) and #18907 (closed).

Edited by Sebastian Graf