Optimization: Shift dropped list heads by coeffecient to prevent thunk generation
Consider the following snippet(s) equivalent to ([a..b] !! n), the source of (!!) and the source of drop:
normal_list :: Int -> Int
normal_list n = head $ drop n [a..b]
shifted_list :: Int -> Int
shifted_list n = head $ drop (n-n) [(a+n)..b]
xs !! n | n < 0 = undefined
[] !! _ = undefined
(x:_) !! 0 = x
(_:xs) !! n = xs !! (n-1)
drop n xs | n <= 0 = xs
drop _ [] = []
drop n (_:xs) = drop (n-1) xs
Notice the (_:xs) matching in these functions as a result of WHNF.
In the first case, normal_list, thunks are generated for x in (x_:xs) of the target list and overhead is seen in the pattern matching/guard of n in drop.
In the second case, shifted_list, this overhead can be completely removed by adding a coefficient such that the list starts at the programmatically defined lower bound, a, plus the known fact that the head is dropped n times.
Hence, given the example above, consider:
[x * x + 3 | x <- [1..]] !! n
-- versus
[x * x + 3 | x <- [(1+n)..]] !! (n-n)
-- which is optimized into
[x * x + 3 | x <- [(n+1)..]] !! 0
-- which is effectively
head [x * x + 3 | x <- [(n+1)..]]
The operation is turned from O(n) into O(1).
Consider benchmark proving GHC 7.4.2 does not make this optimization under -O2:
import Criterion.Main
normal_list :: Int -> Int
normal_list n = head $ drop n [1..]
shifted_list :: Int -> Int
shifted_list n = head $ drop (n-n) [(1+n)..]
main = defaultMain
[ bench "normal_list 1000" $ whnf normal_list 1000
, bench "shifted_list 1000" $ whnf shifted_list 1000
]
C:\Users\Kyle\Desktop>ghc -O2 listco.hs
[1 of 1] Compiling Main ( listco.hs, listco.o )
Linking listco.exe ...
C:\Users\Kyle\Desktop>listco.exe
warming up
estimating clock resolution...
mean is 4.644044 us (160001 iterations)
found 319255 outliers among 159999 samples (199.5%)
159256 (99.5%) low severe
159999 (100.0%) high severe
estimating cost of a clock call...
mean is 310.3118 ns (34 iterations)
benchmarking normal_list 1000
Warning: Couldn't open /dev/urandom
Warning: using system clock for seed instead (quality will be lower)
mean: 7.352463 us, lb 7.058339 us, ub 7.646574 us, ci 0.950
std dev: 1.478087 us, lb 1.478066 us, ub 1.478200 us, ci 0.950
variance introduced by outliers: 94.651%
variance is severely inflated by outliers
benchmarking shifted_list 1000
mean: 46.42819 ns, lb 45.44244 ns, ub 47.21689 ns, ci 0.950
std dev: 4.495757 ns, lb 4.035428 ns, ub 4.832396 ns, ci 0.950
variance introduced by outliers: 77.960%
variance is severely inflated by outliers
Trac metadata
Trac field | Value |
---|---|
Version | 7.4.2 |
Type | FeatureRequest |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture |