Skip to content

(Derived) Ord instances generate terrible code

Consider these attempts at coming up with more compact compare functions:

{-# OPTIONS_GHC -O2 -fforce-recomp #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE MagicHash #-}

module Lib (foo, bar, baz) where

import GHC.Exts
import GHC.Classes

data U = U Int deriving Eq

instance Ord U where
  -- so that this function isn't WW'd
  compare ~(U a) ~(U b) = lazy $ compare a b
  {-# NOINLINE compare #-}

data T
  = T0 U
  | T1 U
  | T2 U
  | T3 U
  | T4 U
  | T5 U
  deriving (Eq, Ord)

foo :: T -> T -> Ordering
foo = compare @T

bar :: T -> T -> Ordering
bar !_ r | !_ <- dataToTag# r, id False = undefined
bar (T0 a1) (T0 a2) = compare a1 a2
bar (T1 a1) (T1 a2) = compare a1 a2
bar (T2 a1) (T2 a2) = compare a1 a2
bar (T3 a1) (T3 a2) = compare a1 a2
bar (T4 a1) (T4 a2) = compare a1 a2
bar (T5 a1) (T5 a2) = compare a1 a2
bar l r
  | isTrue# (dataToTag# l <# dataToTag# r) = LT
  | otherwise                              = GT

baz :: T -> T -> Ordering
baz l r = compareInt# (dataToTag# l) (dataToTag# r) <> go l r
  where
    go (T0 a1) (T0 a2) = compare a1 a2
    go (T1 a1) (T1 a2) = compare a1 a2
    go (T2 a1) (T2 a2) = compare a1 a2
    go (T3 a1) (T3 a2) = compare a1 a2
    go (T4 a1) (T4 a2) = compare a1 a2
    go (T5 a1) (T5 a2) = compare a1 a2

bar speculates dataToTag# on the second argument, so that its result is shared among all case branches. baz goes a step further: It simply compares tags and only compares fields when the tags match.

Here's the generated Core for the T3 case:

foo a b_aLh = case a of {
  T3 a1_aLp ->
    case dataToTag# @T b_aLh of b#_aLq { __DEFAULT ->
    case <# b#_aLq 3# of {
      __DEFAULT ->
        case b_aLh of {
          __DEFAULT -> GHC.Types.LT;
          T3 b1_aLr -> Lib.foo_$ccompare a1_aLp b1_aLr
        };
      1# -> GHC.Types.GT
    }
  }
}

bar a b_aLh = 
  case dataToTag# @T b_aLh of ds1_dU3 { __DEFAULT ->
  case a of {
    T3 a1_aJb ->
      case b_aLh of {
        __DEFAULT ->
          case <# 3# ds1_dU3 of {
            __DEFAULT -> GHC.Types.GT;
            1# -> GHC.Types.LT
          };
        T3 a2_aJc -> Lib.foo_$ccompare a1_aJb a2_aJc
      };
    }
  }

baz l_aJj r_aJk =
      case dataToTag# @T r_aJk of wild_X1 { __DEFAULT ->
      case dataToTag# @T l_aJj of wild1_X2 { __DEFAULT ->
      case <# wild1_X2 wild_X1 of {
        1# -> GHC.Types.LT;
        __DEFAULT ->
          case ==# wild1_X2 wild_X1 of {
            __DEFAULT -> GHC.Types.GT;
            1# ->
              case l_aJj of {
                T3 a1_aJs ->
                  case r_aJk of {
                    __DEFAULT -> Lib.baz1; -- pattern match error thunk
                    T3 a2_aJt -> Lib.foo_$ccompare a1_aJs a2_aJt
                  };
              }
        }
      }

That is I think about the Core I'd expect, with one exception: I'd hoped that GHC would detect that the pattern match error in baz is impossible, which it doesn't.

Although baz looks rather big, I expected most of it to lower to very simple instructions. Well, nm --print-size --size-sort test.o reveals:

0000000000001180 000000000000032f T Lib_bar_info
0000000000000b00 0000000000000373 T Lib_foo_info
00000000000014c8 000000000000038f T Lib_bazz_info

So bar is a bit smaller than foo (the stock derived function), while baz is larger. All of them are quite large: They range between 0x300 (768) and 0x400 (1024) bytes!

I had a cursory look at the Cmm, and I was lost almost instantly. That was due to #19444, I think. It's surely possible to generate more compact code for derived Ord instances. It's so common, yet so much more inefficient to do the equivalent in C!

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information