(Derived) Ord instances generate terrible code
Consider these attempts at coming up with more compact compare
functions:
{-# OPTIONS_GHC -O2 -fforce-recomp #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE MagicHash #-}
module Lib (foo, bar, baz) where
import GHC.Exts
import GHC.Classes
data U = U Int deriving Eq
instance Ord U where
-- so that this function isn't WW'd
compare ~(U a) ~(U b) = lazy $ compare a b
{-# NOINLINE compare #-}
data T
= T0 U
| T1 U
| T2 U
| T3 U
| T4 U
| T5 U
deriving (Eq, Ord)
foo :: T -> T -> Ordering
foo = compare @T
bar :: T -> T -> Ordering
bar !_ r | !_ <- dataToTag# r, id False = undefined
bar (T0 a1) (T0 a2) = compare a1 a2
bar (T1 a1) (T1 a2) = compare a1 a2
bar (T2 a1) (T2 a2) = compare a1 a2
bar (T3 a1) (T3 a2) = compare a1 a2
bar (T4 a1) (T4 a2) = compare a1 a2
bar (T5 a1) (T5 a2) = compare a1 a2
bar l r
| isTrue# (dataToTag# l <# dataToTag# r) = LT
| otherwise = GT
baz :: T -> T -> Ordering
baz l r = compareInt# (dataToTag# l) (dataToTag# r) <> go l r
where
go (T0 a1) (T0 a2) = compare a1 a2
go (T1 a1) (T1 a2) = compare a1 a2
go (T2 a1) (T2 a2) = compare a1 a2
go (T3 a1) (T3 a2) = compare a1 a2
go (T4 a1) (T4 a2) = compare a1 a2
go (T5 a1) (T5 a2) = compare a1 a2
bar
speculates dataToTag#
on the second argument, so that its result is shared among all case branches. baz
goes a step further: It simply compares tags and only compares fields when the tags match.
Here's the generated Core for the T3
case:
foo a b_aLh = case a of {
T3 a1_aLp ->
case dataToTag# @T b_aLh of b#_aLq { __DEFAULT ->
case <# b#_aLq 3# of {
__DEFAULT ->
case b_aLh of {
__DEFAULT -> GHC.Types.LT;
T3 b1_aLr -> Lib.foo_$ccompare a1_aLp b1_aLr
};
1# -> GHC.Types.GT
}
}
}
bar a b_aLh =
case dataToTag# @T b_aLh of ds1_dU3 { __DEFAULT ->
case a of {
T3 a1_aJb ->
case b_aLh of {
__DEFAULT ->
case <# 3# ds1_dU3 of {
__DEFAULT -> GHC.Types.GT;
1# -> GHC.Types.LT
};
T3 a2_aJc -> Lib.foo_$ccompare a1_aJb a2_aJc
};
}
}
baz l_aJj r_aJk =
case dataToTag# @T r_aJk of wild_X1 { __DEFAULT ->
case dataToTag# @T l_aJj of wild1_X2 { __DEFAULT ->
case <# wild1_X2 wild_X1 of {
1# -> GHC.Types.LT;
__DEFAULT ->
case ==# wild1_X2 wild_X1 of {
__DEFAULT -> GHC.Types.GT;
1# ->
case l_aJj of {
T3 a1_aJs ->
case r_aJk of {
__DEFAULT -> Lib.baz1; -- pattern match error thunk
T3 a2_aJt -> Lib.foo_$ccompare a1_aJs a2_aJt
};
}
}
}
That is I think about the Core I'd expect, with one exception: I'd hoped that GHC would detect that the pattern match error in baz
is impossible, which it doesn't.
Although baz
looks rather big, I expected most of it to lower to very simple instructions. Well, nm --print-size --size-sort test.o
reveals:
0000000000001180 000000000000032f T Lib_bar_info
0000000000000b00 0000000000000373 T Lib_foo_info
00000000000014c8 000000000000038f T Lib_bazz_info
So bar
is a bit smaller than foo
(the stock derived function), while baz
is larger. All of them are quite large: They range between 0x300 (768) and 0x400 (1024) bytes!
I had a cursory look at the Cmm, and I was lost almost instantly. That was due to #19444, I think.
It's surely possible to generate more compact code for derived Ord
instances. It's so common, yet so much more inefficient to do the equivalent in C!