DmdAnal: Implement Boxity Analysis (#19871)
This patch fixes some abundant reboxing of DynFlags in
GHC.HsToCore.Match.Literal.warnAboutOverflowedLit (which was the topic of
#19407) by introducing a Boxity analysis to GHC, done as part of demand
analysis. This lets us accurately capture, in demand analysis, the ad-hoc
unboxing decisions previously made in worker/wrapper, so that the boxity info
can propagate through demand signatures.
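To make "reboxing" concrete, here is a minimal sketch of the kind of code shape involved. It is not taken from GHC: Config, describe and report are hypothetical stand-ins (Config playing the role of a large record such as DynFlags), and whether GHC actually unboxes here depends on the compiler version and optimisation flags.

```haskell
module Reboxing where

-- Hypothetical stand-in for a large record such as DynFlags.
data Config = Config
  { verbosity :: !Int
  , name      :: String
  }

-- Consumes the record as a whole (boxed). NOINLINE keeps the call visible,
-- so the caller really has to pass a boxed Config.
describe :: Config -> String
describe cfg = name cfg ++ ":" ++ show (verbosity cfg)
{-# NOINLINE describe #-}

-- report is strict in cfg because it always inspects verbosity, so a
-- strictness-only heuristic would want to unbox cfg in the worker.
-- On the first branch, however, cfg is passed on boxed to describe; an
-- unboxed worker would have to rebuild the Config there (reboxing).
-- A boxity analysis can record that cfg is needed boxed and keep it boxed.
report :: Config -> Int -> String
report cfg n
  | verbosity cfg > n = describe cfg
  | otherwise         = ""
```

For example, report (Config 3 "x") 1 evaluates to "x:3"; the interesting part is not the result but whether the Config passed to describe has to be allocated afresh inside the worker.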
See the new Note [Boxity analysis]. The actual fix for #19407 is described in
Note [No lazy, Unboxed demand in demand signature], but
Note [Finalising boxity for demand signature] is probably a better entry-point.
To support the fix for #19407, I had to change (what was)
Note [Add demands for strict constructors] a bit
(now Note [Unboxing evaluated arguments]). In particular, we now take care of
it in finaliseBoxity (which is only called from demand analysis) instead of
wantToUnboxArg.
I also had to resurrect Note [Product demands for function body] and rename
it to Note [Unboxed demand on function bodies returning small products] to
avoid huge regressions in join004 and join007, thereby fixing #4267 again.
See the updated Note for details.
A nice side-effect is that the worker/wrapper transformation no longer needs to
look at strictness info and other bits such as InsideInlineableFun flags
(needed for Note [Do not unbox class dictionaries]) at all. It simply collects
boxity info from argument demands and interprets it with a severely simplified
wantToUnboxArg. All the smartness is in finaliseBoxity, which could be moved
to DmdAnal completely, if it weren't for the call to dubiousDataConInstArgTys,
which would be awkward to export.
I spent some time figuring out why T16197 failed prior to my amendments to
Note [Unboxing evaluated arguments]. Having figured it out, I minimised it a
bit and added T16197b, which simply compares computed strictness signatures
and thus should be far simpler to eyeball.
The 12% ghc/alloc regression in T11545 is caused by the additional Boxity
field in Poly and Prod, which results in more allocation during lubSubDmd
and plusSubDmd. I made sure in the ticky profiles that the number of calls
to those functions stayed the same. We can bear such an increase here, as we
recently reduced it by 68% (in b760c1f7).
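As a rough illustration of why an extra field costs allocation, the following toy model carries a Boxity flag in both constructors of a sub-demand and joins two sub-demands. It is a deliberately simplified sketch and not GHC's definitions: ToySubDmd, lubBoxityToy and lubToySubDmd are hypothetical names, cardinalities and call demands are omitted, and the mixed-constructor fallback is a crude assumption.

```haskell
module BoxityLub where

-- Whether a value is needed in boxed form or may be passed unboxed.
data Boxity = Boxed | Unboxed
  deriving (Eq, Show)

-- Assumed join for the toy: the result is Unboxed only if both sides agree.
lubBoxityToy :: Boxity -> Boxity -> Boxity
lubBoxityToy Unboxed Unboxed = Unboxed
lubBoxityToy _       _       = Boxed

-- Toy sub-demand: PolyT for a uniform demand, ProdT for a per-field demand.
-- Both carry the Boxity flag, mirroring the extra field the patch adds.
data ToySubDmd
  = PolyT Boxity
  | ProdT Boxity [ToySubDmd]
  deriving (Eq, Show)

-- Outermost boxity of a toy sub-demand.
boxityOf :: ToySubDmd -> Boxity
boxityOf (PolyT b)   = b
boxityOf (ProdT b _) = b

-- Join of two toy sub-demands. Every result is a freshly built constructor
-- that now holds one more field than it would without Boxity.
lubToySubDmd :: ToySubDmd -> ToySubDmd -> ToySubDmd
lubToySubDmd (PolyT b1) (PolyT b2) = PolyT (lubBoxityToy b1 b2)
lubToySubDmd (ProdT b1 fs1) (ProdT b2 fs2)
  | length fs1 == length fs2
  = ProdT (lubBoxityToy b1 b2) (zipWith lubToySubDmd fs1 fs2)
-- Crude fallback (an assumption of this sketch): forget the field structure.
lubToySubDmd d1 d2 = PolyT (lubBoxityToy (boxityOf d1) (boxityOf d2))
```

With the same number of calls, each join result is one word larger per constructor, which is consistent with allocation going up while call counts stay flat.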
T18698* regress slightly because there is more unboxing of dictionaries
happening and that causes Lint (mostly) to allocate more.
Fixes #19871, #19407, #4267, #16859 and #18907.