Kill off seqRule; unboxing through seq#/evaluate

Background

In today's GHC we have a somewhat bogus built-in rewrite rule, that transforms seq# x s to (# s, x #) whenever x is evaluated in the sense of exprIsHNF. seq# is the primitive used to implement Control.Exception.evaluate: The idea behind this rule is that if at runtime x really is evaluated, then seq# x s will turn out to be a no-op.

Why do I call this rule "somewhat bogus?" seq# is currently meant to provide stronger re-ordering guarantees than other methods of evaluation, so forgetting that seq# is used can cause trouble. Consider this toy example:

strictPrint :: Show a => a -> IO ()
{-# OPAQUE strictPrint #-}
strictPrint x = print $! x

f :: Show a => a -> IORef a -> IO ()
{-# OPAQUE f #-}
f x r = do
  x' <- evaluate $! x
  writeIORef r x'
  strictPrint x'

The user almost certainly expects that if evaluation of x throws an exception, nothing will be written to r by f x r. But compiling this with ghc-9.8.1 -O yields the following simplified Core for f:

f = (\ (@a_aSz)
       ($dShow_aSA :: Show a_aSz)
       (x_aJV :: a_aSz)
       (r_aJW :: IORef a_aSz)
       (s_a12V :: GHC.Prim.State# GHC.Prim.RealWorld) ->
       case r_aJW `cast` <Co:2> :: ... of { GHC.STRef.STRef var#_a14A ->
       case GHC.Prim.writeMutVar#
              @GHC.Types.Lifted @GHC.Prim.RealWorld @a_aSz var#_a14A x_aJV s_a12V
       of s2#_a14C
       { __DEFAULT ->
       ((strictPrint @a_aSz $dShow_aSA x_aJV) `cast` <Co:2> :: ...)
         s2#_a14C
       }
       })
    `cast` <Co:16> :: ...

Notably, the simplified version of f does not directly evaluate x at all; the eval is dropped because the call to strictPrint will evaluate it later anyway. (In an alternate world where side effects like writeMutVar# hide strictness in their continuations, this problem is harder to observe but can easily come up inside of unsafePerformIO.)

Obstacles

So, I want to get rid of this rule! But doing so causes a regression in GHC's test suite for the following program:

-- T15226
import Control.Exception (evaluate)

-- Just in case Prelude.repeat changes for some reason.
import Prelude hiding (repeat)

-- We want to be sure that the compiler *doesn't* know that
-- all the elements of the list are in WHNF, because if it
-- does, GHC.Core.Opt.ConstantFold may erase the seq#'s altogether.
repeat :: a -> [a]
repeat a = res
  where res = a : res
{-# NOINLINE repeat #-}  -- Belt *and* suspenders

silly :: [Int] -> IO ()
silly = foldr go (pure ())
  where
    go x r = do
      x' <- evaluate x
      evaluate (x' + 3)  -- GHC should know that x' has been evaluated,
                         -- so this calculation will be erased entirely.
                         -- Otherwise, we'll create a thunk to pass to
                         -- evaluate.
      r

main :: IO ()
-- 10,000,000 repetitions take only a twentieth of a second,
-- but allocations go up dramatically if the result is not
-- known evaluated.
main = silly $ take 10000000 $ repeat 1

Since the simplifier knows that x' is evaluated, it has no trouble speculating the addition in go. But if we do not somehow get rid of the second call to evaluate/seq#, we are forced to allocate an I# box to give to seq#. And since the inner loop of this test case otherwise performs no allocation, that causes a big regression. (We could imagine cleaning this up in some later stage of compilation since the result of this particular call to seq# is entirely unused, but that needs not generally be the case.)

Proposal

So we would really like to re-write seq# (I# var) s to something that provides the same evaluation-order guarantees but does not require the potentially-pointless boxing-up with I#. One way I can imagine doing so is by introducing a new primop, perhaps named sequencedNop# or opaqueIdentity#, with type a_reppoly -> State# s -> (# State# s, a_reppoly #), which compiles to a no-op but can be used to ensure that

Construction of its argument is not re-ordered to a later point in the State# token thread or dropped altogether, and
Uses of its result are not re-ordered to an earlier point in the State# token thread.

Then we can transfrom seq# (I# var) s to case sequencedNop# var s of (# s', var' #) -> (# s', I# var' #), after which the I# box can be easily dropped if it is un-necessary. Conditions 1 and 2 above ensure that the re-ordering guarantees provided by the re-written form are as strong as those provided by the seq# form.

(bytestring also has a use for this primop in its deferForeignPtrAvailability function; it specifically needs to provide condition 2 for a value of type Addr#. If you look at its source code, you will see that bytestring hacks around the absence of such a primop with a very ugly lazy runRW# incantation.)

Tricky point: When performing the seq#/sequencedNop# rewrite for a constructor worker with strict fields, we must create seq# calls for those strict fields because the constructor worker needs to evaluate them.

Edited Jan 14, 2024 by Matthew Craven

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information