Binder-swap in OccurAnal not working properly
The mysteriously-named @aadaa-fgtaa found that case-merging wasn't working properly. See his report below.
This was originally reported in #19542, but it's an entirely different issue, and nothing to do with unsafeEqualityProof
.
Here is his report:
type Member :: [k] -> k -> Type
data Member xs x where
Z :: Member (x:xs) x
S :: !(Member xs x) -> Member (h:xs) x
My goal was to turn it into newtype around Int, for performance reasons.
In particular, I want nested pattern matching on Member (like in function test
below) to be compiled into single switch on Int.
{-# OPTIONS_GHC -O -dno-typeable-binds -fforce-recomp -dsuppress-all -ddump-simpl -ddump-stg-final -ddump-cmm #-}
{-# LANGUAGE StandaloneKindSignatures, ViewPatterns, RankNTypes, DataKinds, PolyKinds, TypeOperators, PatternSynonyms, GADTs, LambdaCase #-}
module UCOpt where
import Data.Kind
import Unsafe.Coerce
type Member :: [k] -> k -> Type
newtype Member xs x = UnsafeMember Int
data MemberView xs x where
ViewZ :: MemberView (x:xs) x
ViewS :: !(Member xs x) -> MemberView (h:xs) x
viewMember :: Member xs x -> MemberView xs x
viewMember (UnsafeMember 0) = unsafeCoerce ViewZ
viewMember (UnsafeMember n) = unsafeCoerce ViewS $ UnsafeMember $ n - 1
pattern Z :: () => forall xs . xxs ~ (x:xs) => Member xxs x
pattern Z <- (viewMember -> ViewZ)
where Z = UnsafeMember 0
pattern S :: () => forall h xs . xxs ~ (h:xs) => Member xs x -> Member xxs x
pattern S a <- (viewMember -> ViewS a)
where S (UnsafeMember n) = UnsafeMember $ n + 1
test :: Member xs x -> Int
test = \case
Z -> 1
S Z -> 2
S (S Z) -> 3
S (S (S Z)) -> 4
S (S (S (S Z))) -> 5
S (S (S (S (S Z)))) -> 6
_ -> 0
I expect optimised core of test
to look like
test = \ @k @xs @x m ->
case m `cast` Member xs x ~R# Int of I# i ->
case i of
0# -> 1
1# -> 2
2# -> 3
3# -> 4
4# -> 5
5# -> 6
_ -> 0
That is actually roughly the code ghc-8.10.4 produces when unsafeCoerce is inlined (which is of course dangerous and leads to segfaults).
Unluckily, with ghc-9.0.1 unsafeEqualityProof
calls from S
and Z
pattern synonyms prevent case statements on Member from being flattened into one, producing huge core. This leads to O(n) lookup and, even more important, makes test
(or more precisely its worker $wtest
) significantly more expensive in terms of inlining.
-- RHS size: {terms: 59, types: 1,012, coercions: 0, joins: 0/0}
$wtest
= \ @k_se9j @xs_se9k @x_se9l ww_se9p ->
case ww_se9p of ds_X2 {
__DEFAULT ->
case unsafeEqualityProof of { UnsafeRefl co_a1rC ->
case ds_X2 of lwild_se2M {
__DEFAULT ->
case unsafeEqualityProof of { UnsafeRefl co1_Xa ->
case lwild_se2M of {
__DEFAULT ->
case lwild_se2M of {
__DEFAULT ->
case lwild_se2M of {
__DEFAULT ->
case lwild_se2M of {
__DEFAULT -> 0#;
5# -> case unsafeEqualityProof of { UnsafeRefl co2_XH -> 6# }
};
4# -> case unsafeEqualityProof of { UnsafeRefl co2_Xx -> 5# }
};
3# -> case unsafeEqualityProof of { UnsafeRefl co2_Xp -> 4# }
};
2# -> case unsafeEqualityProof of { UnsafeRefl co2_Xi -> 3# }
}
};
1# -> case unsafeEqualityProof of { UnsafeRefl co1_Xa -> 2# }
}
};
0# -> case unsafeEqualityProof of { UnsafeRefl co_a1rC -> 1# }
}
-- RHS size: {terms: 13, types: 15, coercions: 4, joins: 0/0}
test
= \ @k_se9j @xs_se9k @x_se9l w_se9m ->
case w_se9m `cast` <Co:4> of { I# ww1_se9p ->
case $wtest ww1_se9p of ww2_se9t { __DEFAULT -> I# ww2_se9t }
}
(I'm also quite surprised that at least three inner cases (returning 0, 4, 5 and 6) are not flattened together. What stops ghc from doing this?)
Of course, CoreToStg drops all unsafeEqualityProof
s, but pattern matching remains nested:
$wtest =
\r [ww_sefL]
case ww_sefL of ds_sefM {
__DEFAULT ->
case ds_sefM of lwild_sefO {
__DEFAULT ->
case lwild_sefO of lwild1_sefQ {
__DEFAULT ->
case lwild1_sefQ of {
__DEFAULT ->
case lwild1_sefQ of {
__DEFAULT ->
case lwild1_sefQ of {
__DEFAULT -> 0#;
5# -> 6#;
};
4# -> 5#;
};
3# -> 4#;
};
2# -> 3#;
};
1# -> 2#;
};
0# -> 1#;
};
test =
\r [w_seg0]
case w_seg0 of {
I# ww1_seg2 ->
case $wtest ww1_seg2 of ww2_seg3 { __DEFAULT -> I# [ww2_seg3]; };
};
And in the end it compiles into linear search in cmm:
...
cegC:
if (R2 != 0) goto cegA; else goto cegB;
cegA:
if (R2 != 1) goto ceh2; else goto ceh3;
ceh2:
if (R2 != 2) goto cegZ; else goto ceh0;
cegZ:
if (R2 != 3) goto cegW; else goto cegX;
cegW:
if (R2 != 4) goto cegT; else goto cegU;
cegT:
if (R2 != 5) goto cegQ; else goto cegR;
...
I suppose that floating in unsafeEqualityProof
, as @simonpj suggested, should really help here, making cases on Member be flattened into one.
I also wonder if ghc could have special rules for computing inlining cost of expressions that use unsafeCoerce, namely that case unsafeEqualityProof of UnsafeRefl co -> expr
should be inlined exactly as expr
would, as CoreToStg would turn the former into the latter.
With ghc-9.0.1 it isn't the case, for example with those two definitions below and -funfolding-use-threshold=5
foo 1
is inlined, but fooUC 1
isn't.
fooUC :: Int -> Int
fooUC n = case unsafeEqualityProof @Int @Any of
UnsafeRefl -> case unsafeEqualityProof @String @Any of
UnsafeRefl -> case unsafeEqualityProof @Bool @Any of
UnsafeRefl -> case unsafeEqualityProof @[Int] @Any of
UnsafeRefl -> n + 1
foo :: Int -> Int
foo n = n + 1
Finally, could ghc drop unused unsafeEqualityProof in core, rather that doing that in CoreToStg? E.g. in fooUC
above none of unsafe coercions are actually used, yet they are dropped only in stg, causing "core bloat".