Interesting CSE/Code duplication issue.
While spelunking in GHC generate core after turning inlining up somewhat I came across this beast:
...
VanillaReg dt_a7lB [Dmd=<S,U>]
ds1_a7lC ->
case dt_a7lB
of ds2_a7lE [Dmd=<L,A>] {
__DEFAULT ->
jump go1_X7 ys_Xc eta_X9;
3# ->
jump go1_X7
ys_Xc
($sgo5_s6Vp y_Xb y_Xb eta_X9);
...
6# ->
jump go1_X7
ys_Xc
($sgo5_s6Vp y_Xb y_Xb eta_X9)
};
FloatReg dt_Xe [Dmd=<S,U>] ->
case dt_Xe of ds1_Xf [Dmd=<L,A>] {
__DEFAULT ->
jump go1_X7 ys_Xc eta_X9;
1# ->
jump go1_X7
ys_Xc
($sgo5_s6Vp y_Xb y_Xb eta_X9);
...
6# ->
jump go1_X7
ys_Xc
($sgo5_s6Vp y_Xb y_Xb eta_X9)
};
The jump being duplicated is not a problem. It's simply compiled to a goto. However the call to ($sgo5_s6Vp y_Xb y_Xb eta_X9)
is duplicated a few times. It appears ~1500 times in the simplified output.
Since go1_X7
is strict in it's argument we float it out and end up with an infinity of alternatives like this:
3# ->
case $sgo5_s6Vp y_Xb y_Xb eta_X9 of
sat_sfwa [Occ=Once1]
{
__DEFAULT ->
go1_X7
ys1_sfw4
sat_sfwa;
};
This seems quite similar to #14684 however the main difference is that there legitimately are more default than non-default causes which is why I'm putting this into a separate ticket.
I'm not really sure how to get there. But what seems logical to do is generate join points for the common cases and then simply jump to them. That is do a transformation like this:
case x of
_Default -> e_default;
C1 b1 -> e_1[b1];
...
Cn bn -> e_n[bn];
===> if e_1 == e_n then transform this into
join common_alt n = e[n]
case x of
_Default -> e_default;
C1 b1 -> jump common_alt b1;
...
Cn bn -> jump common_alt bn;
In the case I noticed there are no constructor binders. But even when there are we could common up alternatives who's binders match up in their runtimerep. When common_alt has no binding arguments we would then float it out and CSE could remove duplicate definitions.
The join point would just be a single jump so this should be beneficial even for smallish expressions (depending on how common they are). There is a risk that we would end up with a (pull alts out <=> inline alts) loop though.
Maybe this could be worked into FloatOut somehow? Not sure.