LLVM backend generates slow code in some non-strict scenarios
Summary
This issue was originally reported for the vector
package at https://github.com/haskell/vector/issues/409. The people there encouraged me to open a GHC issue for it.
When using the vector
package with GHC's LLVM backend, the function freeze
(O(n), copies array) is sometimes faster than unsafeFreeze
(O(1), no copying). This seems strange as the latter does strictly less work.
Some people responding to the issue at the vector
package proposed that this could be a strange interaction between GHC's strictness analysis and the LLVM backend (although it also manifests to some extent with NCG).
Steps to reproduce
{-# LANGUAGE BangPatterns #-}
import Control.Monad.ST
import Criterion.Main
import Data.Vector.Unboxed
import Data.Vector.Unboxed.Mutable
freezeLazy :: Int -> Vector Double
freezeLazy n = runST $ do
pv <- unsafeNew n
let w 0 i = return ()
w n i = unsafeWrite pv i 42.0 >> w (n - 1) (i + 1)
w n 0
freeze pv
unsafeFreezeLazy :: Int -> Vector Double
unsafeFreezeLazy n = runST $ do
pv <- unsafeNew n
let w 0 i = return ()
w n i = unsafeWrite pv i 42.0 >> w (n - 1) (i + 1)
w n 0
unsafeFreeze pv
freezeStrict :: Int -> Vector Double
freezeStrict n = runST $ do
pv <- unsafeNew n
let w 0 !i = return ()
w n !i = unsafeWrite pv i 42.0 >> w (n - 1) (i + 1)
w n 0
freeze pv
unsafeFreezeStrict :: Int -> Vector Double
unsafeFreezeStrict n = runST $ do
pv <- unsafeNew n
let w 0 !i = return ()
w n !i = unsafeWrite pv i 42.0 >> w (n - 1) (i + 1)
w n 0
unsafeFreeze pv
main :: IO ()
main = defaultMain [go "freezeLazy" freezeLazy, go "unsafeFreezeLazy" unsafeFreezeLazy, go "freezeStrict" freezeStrict, go "unsafeFreezeStrict" unsafeFreezeStrict] where
go n f = bench n $ whnf f 1000000
Compiling with -O2
and -O2 -fllvm
, respectively, I get the following results:
NCG LLVM
freezeLazy 677.5 μs 668.7 μs
unsafeFreezeLazy 708.0 μs 987.4 μs
freezeStrict 682.5 μs 654.7 μs
unsafeFreezeStrict 312.5 μs 300.1 μs
Here, LLVM (and to some extent also NCG) generates slower code for unsafeFreezeLazy
than for freezeLazy
. However, with a strictness annotation, unsafeFreeze
is always faster.
Someone responding to the issue at the vector
package was able to reproduce the same behavior with PrimArray
from the primitive
package.
Expected behavior
I would have expected unsafeFreeze
to always be faster than freeze
.
Environment
- GHC version used: 8.10.5
- LLVM version used: 12.0.1
Optional:
- Operating System: Arch Linux
- System Architecture: x64