LLVM backend generates slow code in some non-strict scenarios
Summary
This issue was originally reported for the vector package at https://github.com/haskell/vector/issues/409. The people there encouraged me to open a GHC issue for it.
When using the vector package with GHC's LLVM backend, the function freeze (O(n), copies array) is sometimes faster than unsafeFreeze (O(1), no copying). This seems strange as the latter does strictly less work.
Some people responding to the issue at the vector package proposed that this could be a strange interaction between GHC's strictness analysis and the LLVM backend (although it also manifests to some extent with NCG).
Steps to reproduce
{-# LANGUAGE BangPatterns #-}
import Control.Monad.ST
import Criterion.Main
import Data.Vector.Unboxed
import Data.Vector.Unboxed.Mutable
freezeLazy :: Int -> Vector Double
freezeLazy n = runST $ do
pv <- unsafeNew n
let w 0 i = return ()
w n i = unsafeWrite pv i 42.0 >> w (n - 1) (i + 1)
w n 0
freeze pv
unsafeFreezeLazy :: Int -> Vector Double
unsafeFreezeLazy n = runST $ do
pv <- unsafeNew n
let w 0 i = return ()
w n i = unsafeWrite pv i 42.0 >> w (n - 1) (i + 1)
w n 0
unsafeFreeze pv
freezeStrict :: Int -> Vector Double
freezeStrict n = runST $ do
pv <- unsafeNew n
let w 0 !i = return ()
w n !i = unsafeWrite pv i 42.0 >> w (n - 1) (i + 1)
w n 0
freeze pv
unsafeFreezeStrict :: Int -> Vector Double
unsafeFreezeStrict n = runST $ do
pv <- unsafeNew n
let w 0 !i = return ()
w n !i = unsafeWrite pv i 42.0 >> w (n - 1) (i + 1)
w n 0
unsafeFreeze pv
main :: IO ()
main = defaultMain [go "freezeLazy" freezeLazy, go "unsafeFreezeLazy" unsafeFreezeLazy, go "freezeStrict" freezeStrict, go "unsafeFreezeStrict" unsafeFreezeStrict] where
go n f = bench n $ whnf f 1000000
Compiling with -O2 and -O2 -fllvm, respectively, I get the following results:
NCG LLVM
freezeLazy 677.5 μs 668.7 μs
unsafeFreezeLazy 708.0 μs 987.4 μs
freezeStrict 682.5 μs 654.7 μs
unsafeFreezeStrict 312.5 μs 300.1 μs
Here, LLVM (and to some extent also NCG) generates slower code for unsafeFreezeLazy than for freezeLazy. However, with a strictness annotation, unsafeFreeze is always faster.
Someone responding to the issue at the vector package was able to reproduce the same behavior with PrimArray from the primitive package.
Expected behavior
I would have expected unsafeFreeze to always be faster than freeze.
Environment
- GHC version used: 8.10.5
- LLVM version used: 12.0.1
Optional:
- Operating System: Arch Linux
- System Architecture: x64