Skip to content

`round :: Double -> Int64` much slower than `fromIntegral @Int @Int64 . round`

Consider this simple code:

{-# LANGUAGE TypeApplications #-}
{-# OPTIONS_GHC -ddump-simpl -ddump-to-file -ddump-rule-firings #-}
module C (f, g) where

import Data.Int (Int64)

f :: Double -> Int64
f = round

g :: Double -> Int64
g = fromIntegral @Int @Int64 . round

There is a rule in GHC.Int that should translate f into g

"round/Double->Int64"
    round    = (fromIntegral :: Int -> Int64) . (round  :: Double -> Int)

If I compile the above module, I see

Rule fired: round/Double->Int64 (GHC.Int)
Rule fired: fromIntegral/a->Int64 (GHC.Int)
Rule fired: fromIntegral/Int->Int (GHC.Real)
Rule fired: fromIntegral/a->Int64 (GHC.Int)
Rule fired: fromIntegral/Int->Int (GHC.Real)
Rule fired: round/Double->Int (GHC.Float)

however in Core the functions end up different: f has an extra dictionary passed around:

-- RHS size: {terms: 11, types: 6, coercions: 0, joins: 0/0}
f :: Double -> Int64
[GblId,
 Arity=1,
 Str=<S(S),1*U(U)>m,
 Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
         Tmpl= \ (x_a1sG [Occ=Once!] :: Double) ->
                 case x_a1sG of { GHC.Types.D# ww1_a28u [Occ=Once] ->
                 case GHC.Float.$w$cround @ Int GHC.Real.$fIntegralInt ww1_a28u of
                 { GHC.Types.I# x#_a1Rr [Occ=Once] ->
                 GHC.Int.I64# x#_a1Rr
                 }
                 }}]
f = \ (x_a1sG :: Double) ->
      case x_a1sG of { GHC.Types.D# ww1_a28u ->
      case GHC.Float.$w$cround @ Int GHC.Real.$fIntegralInt ww1_a28u of
      { GHC.Types.I# x#_a1Rr ->
      GHC.Int.I64# x#_a1Rr
      }
      }

-- RHS size: {terms: 12, types: 14, coercions: 0, joins: 0/0}
g :: Double -> Int64
[GblId,
 Arity=1,
 Caf=NoCafRefs,
 Str=<S(S),1*U(U)>m,
 Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
         Tmpl= \ (x_X1sX [Occ=Once!] :: Double) ->
                 case x_X1sX of { GHC.Types.D# ds1_a28C [Occ=Once] ->
                 case {__pkg_ccall base-4.12.0.0 Double#
                           -> State# RealWorld -> (# State# RealWorld, Double# #)}_a28B
                        ds1_a28C GHC.Prim.realWorld#
                 of
                 { (# _ [Occ=Dead], ds3_a28H [Occ=Once] #) ->
                 GHC.Int.I64# (GHC.Prim.double2Int# ds3_a28H)
                 }
                 }}]
g = \ (x_X1sX :: Double) ->
      case x_X1sX of { GHC.Types.D# ds1_a28C ->
      case {__pkg_ccall base-4.12.0.0 Double#
                           -> State# RealWorld -> (# State# RealWorld, Double# #)}_a28B
             ds1_a28C GHC.Prim.realWorld#
      of
      { (# ds2_a28G, ds3_a28H #) ->
      GHC.Int.I64# (GHC.Prim.double2Int# ds3_a28H)
      }
      }

This makes it over an order of magnitude slower than what it should be!

benchmarked f
time                 111.1 ns   (109.1 ns .. 113.9 ns)
                     0.997 R²   (0.996 R² .. 0.998 R²)
mean                 111.1 ns   (110.4 ns .. 112.0 ns)
std dev              2.724 ns   (2.195 ns .. 3.271 ns)

benchmarked g
time                 7.160 ns   (7.104 ns .. 7.224 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 7.196 ns   (7.171 ns .. 7.231 ns)
std dev              100.5 ps   (71.97 ps .. 153.8 ps)

I don't know why the dictionary is present in f but not in g.

Edited by Mateusz Kowalczyk
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information