Kill off seqRule; unboxing through seq#/evaluate
Background
In today's GHC we have a somewhat bogus built-in rewrite rule, that transforms seq# x s
to (# s, x #)
whenever x
is evaluated in the sense of exprIsHNF
. seq#
is the primitive used to implement Control.Exception.evaluate
: The idea behind this rule is that if at runtime x
really is evaluated, then seq# x s
will turn out to be a no-op.
Why do I call this rule "somewhat bogus?" seq#
is currently meant to provide stronger re-ordering guarantees than other methods of evaluation, so forgetting that seq#
is used can cause trouble. Consider this toy example:
strictPrint :: Show a => a -> IO ()
{-# OPAQUE strictPrint #-}
strictPrint x = print $! x
f :: Show a => a -> IORef a -> IO ()
{-# OPAQUE f #-}
f x r = do
x' <- evaluate $! x
writeIORef r x'
strictPrint x'
The user almost certainly expects that if evaluation of x
throws an exception, nothing will be written to r
by f x r
. But compiling this with ghc-9.8.1 -O
yields the following simplified Core for f
:
f = (\ (@a_aSz)
($dShow_aSA :: Show a_aSz)
(x_aJV :: a_aSz)
(r_aJW :: IORef a_aSz)
(s_a12V :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case r_aJW `cast` <Co:2> :: ... of { GHC.STRef.STRef var#_a14A ->
case GHC.Prim.writeMutVar#
@GHC.Types.Lifted @GHC.Prim.RealWorld @a_aSz var#_a14A x_aJV s_a12V
of s2#_a14C
{ __DEFAULT ->
((strictPrint @a_aSz $dShow_aSA x_aJV) `cast` <Co:2> :: ...)
s2#_a14C
}
})
`cast` <Co:16> :: ...
Notably, the simplified version of f
does not directly evaluate x
at all; the eval is dropped because the call to strictPrint
will evaluate it later anyway. (In an alternate world where side effects like writeMutVar#
hide strictness in their continuations, this problem is harder to observe but can easily come up inside of unsafePerformIO
.)
Note: this particular example isn't directly relevant any more, because we no longer drop evals on a variable that is subsequently evaluated. But this comment below gives a more convincing example.
Obstacles
So, I want to get rid of this rule! But doing so causes a regression in GHC's test suite for the following program:
-- T15226
import Control.Exception (evaluate)
-- Just in case Prelude.repeat changes for some reason.
import Prelude hiding (repeat)
-- We want to be sure that the compiler *doesn't* know that
-- all the elements of the list are in WHNF, because if it
-- does, GHC.Core.Opt.ConstantFold may erase the seq#'s altogether.
repeat :: a -> [a]
repeat a = res
where res = a : res
{-# NOINLINE repeat #-} -- Belt *and* suspenders
silly :: [Int] -> IO ()
silly = foldr go (pure ())
where
go x r = do
x' <- evaluate x
evaluate (x' + 3) -- GHC should know that x' has been evaluated,
-- so this calculation will be erased entirely.
-- Otherwise, we'll create a thunk to pass to
-- evaluate.
r
main :: IO ()
-- 10,000,000 repetitions take only a twentieth of a second,
-- but allocations go up dramatically if the result is not
-- known evaluated.
main = silly $ take 10000000 $ repeat 1
Since the simplifier knows that x'
is evaluated, it has no trouble speculating the addition in go
. But if we do not somehow get rid of the second call to evaluate
/seq#
, we are forced to allocate an I#
box to give to seq#
. And since the inner loop of this test case otherwise performs no allocation, that causes a big regression. (We could imagine cleaning this up in some later stage of compilation since the result of this particular call to seq#
is entirely unused, but that needs not generally be the case.)
Proposal
So we would really like to re-write seq# (I# var) s
to something that provides the same evaluation-order guarantees but does not require the potentially-pointless boxing-up with I#
. One way I can imagine doing so is by introducing a new primop, perhaps named sequencedNop#
or opaqueIdentity#
, with type a_reppoly -> State# s -> (# State# s, a_reppoly #)
, which compiles to a no-op but can be used to ensure that
- Construction of its argument is not re-ordered to a later point in the
State#
token thread or dropped altogether, and - Uses of its result are not re-ordered to an earlier point in the
State#
token thread.
Then we can transfrom seq# (I# var) s
to case sequencedNop# var s of (# s', var' #) -> (# s', I# var' #)
, after which the I#
box can be easily dropped if it is un-necessary. Conditions 1 and 2 above ensure that the re-ordering guarantees provided by the re-written form are as strong as those provided by the seq#
form.
(bytestring
also has a use for this primop in its deferForeignPtrAvailability
function; it specifically needs to provide condition 2 for a value of type Addr#
. If you look at its source code, you will see that bytestring
hacks around the absence of such a primop with a very ugly lazy runRW#
incantation.)
Tricky point: When performing the seq#
/sequencedNop#
rewrite for a constructor worker with strict fields, we must create seq#
calls for those strict fields because the constructor worker needs to evaluate them.