Musings on manual type class desugaring

Opened Mar 21, 2018 by Matthew Pickering (@mpickering)

I recently wrote a short post explaining why manually desugaring a type class is different from actually writing the type class, because of how the two are optimised: http://mpickering.github.io/posts/2018-03-20-recordsvstypeclasses.html

The post implements four equivalent programs, all of which end up being optimised differently. I paste the whole file below as it is not very big.

Implementation 1 is written in terms of a type class. Implementation 2 uses explicit dictionary passing. Implementation 3 wraps the dictionary in a type class. Implementation 4 wraps the dictionary in a type class with an additional dummy method.

Naively, a user would expect all four implementations to be equally fast.

{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE AllowAmbiguousTypes #-}
module Prop where

import Prelude (Bool(..), (||), (&&))

-- Implementation 1

class Prop r where
  or :: r -> r -> r
  and :: r -> r -> r
  true :: r
  false :: r

instance Prop Bool where
  or = (||)
  and = (&&)
  true = True
  false = False

-- Implementation 2

data PropDict r = PropDict {
  dor :: r -> r -> r
  , dand :: r -> r -> r
  , dtrue :: r
  , dfalse :: r
  }

boolDict = PropDict {
  dor = (||)
  , dand = (&&)
  , dtrue = True
  , dfalse = False }

-- Implementation 3

class PropProxy r where
  propDict :: PropDict r

instance PropProxy Bool where
  propDict = boolDict

-- Implementation 4

class PropProxy2 r where
  propDict2 :: PropDict r
  dummy :: ()

instance PropProxy2 Bool where
  propDict2 = boolDict
  dummy = ()


ors :: Prop r => [r] -> r
ors [] = true
ors (o:os) = o `or` ors os
{-# INLINABLE ors #-}

dors :: PropDict r -> [r] -> r
dors pd [] = dtrue pd
dors pd (o:os) = dor pd o (dors pd os)

pors :: PropProxy r => [r] -> r
pors [] = dtrue propDict
pors (o:os) = dor propDict o (pors os)
{-# INLINABLE pors #-}

porsProxy :: PropProxy2 r => [r] -> r
porsProxy [] = dtrue propDict2
porsProxy (o:os) = dor propDict2 o (porsProxy os)
{-# INLINABLE porsProxy #-}

Using the four different implementations of ors from another module, implementations 1 and 4 are fast whilst 2 and 3 are slow.

https://github.com/mpickering/rtcwrao-benchmarks/blob/master/Prop2.hs
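For reference, the driver is a criterion harness roughly along the following lines. This is my own sketch: the module name, input size and list contents are guesses rather than the actual contents of Prop2.hs, and the effect only shows up when this module is also compiled with -O2.

module Main (main) where

import Criterion.Main (bench, bgroup, defaultMain, nf)
import Prop

main :: IO ()
main = defaultMain
  [ bgroup "tc"
      [ bench "Implementation 1" (nf (ors       :: [Bool] -> Bool) input)
      , bench "Implementation 2" (nf (dors boolDict)               input)
      , bench "Implementation 3" (nf (pors      :: [Bool] -> Bool) input)
      , bench "Implementation 4" (nf (porsProxy :: [Bool] -> Bool) input)
      ]
  ]
  where
    -- An all-False input forces the full recursion, since (||) only
    -- short-circuits on a True left argument. The size is a guess.
    input = replicate 100000 False :: [Bool]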

benchmarking tc/Implementation 1
time                 3.510 ms   (3.509 ms .. 3.512 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.976 ms   (2.886 ms .. 3.060 ms)
std dev              241.1 μs   (195.4 μs .. 293.1 μs)
variance introduced by outliers: 51% (severely inflated)

benchmarking tc/Implementation 2
time                 25.05 ms   (21.16 ms .. 30.43 ms)
                     0.912 R²   (0.849 R² .. 0.984 R²)
mean                 19.18 ms   (16.20 ms .. 21.45 ms)
std dev              5.627 ms   (4.710 ms .. 6.618 ms)
variance introduced by outliers: 89% (severely inflated)

benchmarking tc/Implementation 3
time                 20.06 ms   (15.33 ms .. 23.57 ms)
                     0.856 R²   (0.755 R² .. 0.934 R²)
mean                 18.43 ms   (16.92 ms .. 19.85 ms)
std dev              3.490 ms   (3.003 ms .. 4.076 ms)
variance introduced by outliers: 74% (severely inflated)

benchmarking tc/Implementation 4
time                 3.498 ms   (3.484 ms .. 3.513 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 3.016 ms   (2.935 ms .. 3.083 ms)
std dev              205.7 μs   (162.6 μs .. 261.8 μs)
variance introduced by outliers: 42% (moderately inflated)

I compiled the module with -O2. If I additionally disable worker/wrapper and SpecConstr (with -fno-worker-wrapper and -fno-spec-constr) then implementation 3 is also fast. Implementation 2 is always slow.
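To make the difference concrete, here is a hand-written approximation of what the fast cases look like after optimisation (this is not actual -ddump-simpl output). For implementations 1 and 4 the specialiser can produce a Bool-specific loop with no dictionary left in it:

-- Approximation of the specialised loop for implementations 1 and 4
orsBool :: [Bool] -> Bool
orsBool []     = True
orsBool (o:os) = o || orsBool os

For implementations 2 and 3, by contrast, the loop that survives still takes the PropDict record and selects dor and dtrue out of it on every iteration, essentially the dors definition above.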

This ticket asks what could be done to make the performance of these different refactorings more robust.

Trac metadata

Trac field     Value
Version        8.2.2
Type           Bug
TypeOfFailure  OtherFailure
Priority       normal
Resolution     Unresolved
Component      Compiler
Reference: ghc/ghc#14955