Musings on manual type class desugaring

Opened Mar 21, 2018 by Matthew Pickering (@mpickering)

I recently wrote a short post explaining why manually desugaring a type class is different from actually writing the type class, because of how the two are optimised: http://mpickering.github.io/posts/2018-03-20-recordsvstypeclasses.html

The post implements four equivalent programs, all of which end up being optimised differently. I paste the whole file below as it is not very big.

Implementation 1 is written in terms of a type class. Implementation 2 uses explicit dictionary passing. Implementation 3 wraps the dictionary in a type class. Implementation 4 wraps the dictionary in a type class with an additional dummy method.

Naively, a user would expect all four implementations to be equally fast.

{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE AllowAmbiguousTypes #-}
module Prop where

import Prelude (Bool(..), (||), (&&))

-- Implementation 1

class Prop r where
  or :: r -> r -> r
  and :: r -> r -> r
  true :: r
  false :: r

instance Prop Bool where
  or = (||)
  and = (&&)
  true = True
  false = False

-- Implementation 2

data PropDict r = PropDict {
  dor :: r -> r -> r
  , dand :: r -> r -> r
  , dtrue :: r
  , dfalse :: r
  }

boolDict = PropDict {
  dor = (||)
  , dand = (&&)
  , dtrue = True
  , dfalse = False }

-- Implementation 3

class PropProxy r where
  propDict :: PropDict r

instance PropProxy Bool where
  propDict = boolDict

-- Implementation 4

class PropProxy2 r where
  propDict2 :: PropDict r
  dummy :: ()

instance PropProxy2 Bool where
  propDict2 = boolDict
  dummy = ()


ors :: Prop r => [r] -> r
ors [] = true
ors (o:os) = o `or` ors os
{-# INLINABLE ors #-}

dors :: PropDict r -> [r] -> r
dors pd [] = dtrue pd
dors pd (o:os) = dor pd o (dors pd os)

pors :: PropProxy r => [r] -> r
pors [] = dtrue propDict
pors (o:os) = dor propDict o (pors os)
{-# INLINABLE pors #-}

porsProxy :: PropProxy2 r => [r] -> r
porsProxy [] = dtrue propDict2
porsProxy (o:os) = dor propDict2 o (porsProxy os)
{-# INLINABLE porsProxy #-}

Using the four different implementations of ors from another module, implementations 1 and 4 are fast whilst 2 and 3 are slow.

https://github.com/mpickering/rtcwrao-benchmarks/blob/master/Prop2.hs
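For reference, the driver is a criterion harness roughly along the following lines. This is my own sketch: the module name, input size and list contents are guesses rather than the actual contents of Prop2.hs, and the effect only shows up when this module is also compiled with -O2.

module Main (main) where

import Criterion.Main (bench, bgroup, defaultMain, nf)
import Prop

main :: IO ()
main = defaultMain
  [ bgroup "tc"
      [ bench "Implementation 1" (nf (ors       :: [Bool] -> Bool) input)
      , bench "Implementation 2" (nf (dors boolDict)               input)
      , bench "Implementation 3" (nf (pors      :: [Bool] -> Bool) input)
      , bench "Implementation 4" (nf (porsProxy :: [Bool] -> Bool) input)
      ]
  ]
  where
    -- An all-False input forces the full recursion, since (||) only
    -- short-circuits on a True left argument. The size is a guess.
    input = replicate 100000 False :: [Bool]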

benchmarking tc/Implementation 1
time                 3.510 ms   (3.509 ms .. 3.512 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.976 ms   (2.886 ms .. 3.060 ms)
std dev              241.1 μs   (195.4 μs .. 293.1 μs)
variance introduced by outliers: 51% (severely inflated)

benchmarking tc/Implementation 2
time                 25.05 ms   (21.16 ms .. 30.43 ms)
                     0.912 R²   (0.849 R² .. 0.984 R²)
mean                 19.18 ms   (16.20 ms .. 21.45 ms)
std dev              5.627 ms   (4.710 ms .. 6.618 ms)
variance introduced by outliers: 89% (severely inflated)

benchmarking tc/Implementation 3
time                 20.06 ms   (15.33 ms .. 23.57 ms)
                     0.856 R²   (0.755 R² .. 0.934 R²)
mean                 18.43 ms   (16.92 ms .. 19.85 ms)
std dev              3.490 ms   (3.003 ms .. 4.076 ms)
variance introduced by outliers: 74% (severely inflated)

benchmarking tc/Implementation 4
time                 3.498 ms   (3.484 ms .. 3.513 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 3.016 ms   (2.935 ms .. 3.083 ms)
std dev              205.7 μs   (162.6 μs .. 261.8 μs)
variance introduced by outliers: 42% (moderately inflated)

I compiled the module with -O2. If I additionally disable worker/wrapper and SpecConstr (with -fno-worker-wrapper and -fno-spec-constr) then implementation 3 is also fast. Implementation 2 is always slow.
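To make the difference concrete, here is a hand-written approximation of what the fast cases look like after optimisation (this is not actual -ddump-simpl output). For implementations 1 and 4 the specialiser can produce a Bool-specific loop with no dictionary left in it:

-- Approximation of the specialised loop for implementations 1 and 4
orsBool :: [Bool] -> Bool
orsBool []     = True
orsBool (o:os) = o || orsBool os

For implementations 2 and 3, by contrast, the loop that survives still takes the PropDict record and selects dor and dtrue out of it on every iteration, essentially the dors definition above.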

This ticket asks what could be done to make the performance of these different refactorings more robust.

Trac metadata

Trac field     Value
Version        8.2.2
Type           Bug
TypeOfFailure  OtherFailure
Priority       normal
Resolution     Unresolved
Component      Compiler
Reference: ghc/ghc#14955