Slow 64-bit primops on 32-bit systems
GHC's primops for 64-bit arithmetic are implemented as FFI calls on 32-bit systems. This causes a serious performance penalty for 32-bit code that makes heavy use of 64-bit arithmetic.
I found this while investigating the poor performance of mwc-random on 32-bit systems. The 32-bit build runs 3-4 times slower than the 64-bit build on the same hardware. It is difficult to estimate how much faster an optimal implementation would run, since one does not exist, but the slowdown is probably at least 2x.
Here is a simple program that demonstrates the issue:
sqr64 :: Int32 -> Int64
sqr64 x = y * y where y = fromIntegral x
Here is the optimized Core:
$wsqr64 :: Int# -> Int64
$wsqr64 =
  \ (ww_sGO :: Int#) ->
    case {__pkg_ccall ghc-prim hs_intToInt64 Int#
            -> State# RealWorld -> (# State# RealWorld, Int64# #)}_aFY
           ww_sGO realWorld#
    of _ { (# _, ds2_aG2 #) ->
    case {__pkg_ccall ghc-prim hs_timesInt64 Int64#
            -> Int64# -> State# RealWorld -> (# State# RealWorld, Int64# #)}_aGc
           ds2_aG2 ds2_aG2 realWorld#
    of _ { (# _, ds4_aGi #) ->
    I64# ds4_aGi
    }
    }

sqr64 :: Int32 -> Int64
sqr64 = \ (w_sGM :: Int32) ->
  case w_sGM of _ { I32# ww_sGO -> $wsqr64 ww_sGO }
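To make the cost easier to measure, the function above can be wrapped in a small workload. The following is only a sketch: `sumSquares` and `timeSumSquares` are hypothetical helpers introduced here for illustration, not part of the original report or of mwc-random. It assumes that compiling with `-O2 -ddump-simpl` on a 32-bit target is how the Core above was obtained, and that comparing timings of the same code between a 32-bit and a 64-bit build gives a rough indication of the overhead.

```haskell
import Control.Exception (evaluate)
import Data.Int (Int32, Int64)
import Data.List (foldl')
import System.CPUTime (getCPUTime)

-- The function from the report: widen to 64 bits, then multiply.
sqr64 :: Int32 -> Int64
sqr64 x = y * y where y = fromIntegral x

-- Hypothetical workload: sum the squares of 1..n, so every iteration
-- performs one 64-bit multiplication and one 64-bit addition.
sumSquares :: Int32 -> Int64
sumSquares n = foldl' (\acc i -> acc + sqr64 i) 0 [1 .. n]

-- Hypothetical helper: run the workload once and return the result
-- together with the elapsed CPU time in seconds.
timeSumSquares :: Int32 -> IO (Int64, Double)
timeSumSquares n = do
  start  <- getCPUTime
  result <- evaluate (sumSquares n)   -- force the sum before stopping the clock
  end    <- getCPUTime
  -- getCPUTime reports picoseconds.
  return (result, fromIntegral (end - start) / 1e12)
```

For example, `sumSquares 3` evaluates to 14, and `timeSumSquares 10000000` in GHCi or from a `main` action returns the sum alongside the CPU time. The absolute numbers depend on the machine and GHC version, so only the ratio between the two builds is meaningful.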
Trac metadata
Trac field | Value |
---|---|
Version | 7.2.1 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture | |