Skip to content

Using error in an impossible case makes code slower

Summary

I have spine-strict list and I define this (silly) operation on it:

data List a = Nil | Cons a !(List a) deriving Show

data Tup2 a b = Tup2 !a !b

fooA :: List a -> Tup2 (List a) (List a)
fooA xs = go xs xs
  where
    go Nil ys = Tup2 ys Nil
    go (Cons _ xs) ys = case ys of
      Nil -> error "impossible"
      Cons y ys' -> case go xs ys' of
        Tup2 s zs -> Tup2 s (Cons y zs)

Note the impossible case. I know one list cannot be exhausted before the other because they are the same list, so I mark the case as impossible.
I find that fooA takes ~3.7ms on a list of length 100000.

I then make a small change where I replace the error case with a value.

fooB :: List a -> Tup2 (List a) (List a)
fooB xs = go xs xs
  where
    go Nil ys = Tup2 ys Nil
    go (Cons _ xs) ys = case ys of
      Nil -> Tup2 Nil Nil -- impossible
      Cons y ys' -> case go xs ys' of
        Tup2 s zs -> Tup2 s (Cons y zs)

Now fooB takes ~1ms on the same input list!

Benchmark code
{-# LANGUAGE BangPatterns #-}
import Test.Tasty.Bench
import Control.DeepSeq

main :: IO ()
main = defaultMain
  [ bench "fooA" $ whnf fooA xs
  , bench "fooB" $ whnf fooB xs
  ]
  where
    !xs = replicateL 100000 ()

data List a = Nil | Cons a !(List a) deriving Show

data Tup2 a b = Tup2 !a !b

fooA :: List a -> Tup2 (List a) (List a)
fooA xs = go xs xs
  where
    go Nil ys = Tup2 ys Nil
    go (Cons _ xs) ys = case ys of
      Nil -> error "impossible"
      Cons y ys' -> case go xs ys' of
        Tup2 s zs -> Tup2 s (Cons y zs)
{-# NOINLINE fooA #-}

fooB :: List a -> Tup2 (List a) (List a)
fooB xs = go xs xs
  where
    go Nil ys = Tup2 ys Nil
    go (Cons _ xs) ys = case ys of
      Nil -> Tup2 Nil Nil -- impossible
      Cons y ys' -> case go xs ys' of
        Tup2 s zs -> Tup2 s (Cons y zs)
{-# NOINLINE fooB #-}

replicateL :: Int -> a -> List a
replicateL n x
  | n <= 0 = Nil
  | otherwise = Cons x (replicateL (n-1) x)

Result:

All
  fooA: OK
    3.67 ms ± 359 μs, 9.6 MB allocated, 3.6 MB copied,  23 MB peak memory
  fooB: OK
    990  μs ±  52 μs, 3.8 MB allocated, 692 KB copied,  23 MB peak memory

What causes this difference?

To add some context, I encountered this when attempting to define tails for a strict structure using something along the lines of \xs -> evalState (traverse (state unconsSure) xs) xs, where we know that unconsSure never gets an empty structure.

Steps to reproduce

Run the above code.

Expected behavior

I hope fooA can be as fast as fooB since the impossible case is better annotated in fooA.

Environment

  • GHC version used: 9.8.2
Edited by meooow
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information