Skip to content

GitLab

  • Projects
  • Groups
  • Snippets
  • Help
    • Loading...
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
  • Sign in / Register
GHC
GHC
  • Project overview
    • Project overview
    • Details
    • Activity
    • Releases
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
    • Locked Files
  • Issues 4,262
    • Issues 4,262
    • List
    • Boards
    • Labels
    • Service Desk
    • Milestones
    • Iterations
  • Merge Requests 405
    • Merge Requests 405
  • Requirements
    • Requirements
    • List
  • CI / CD
    • CI / CD
    • Pipelines
    • Jobs
    • Schedules
  • Security & Compliance
    • Security & Compliance
    • Dependency List
    • License Compliance
  • Operations
    • Operations
    • Incidents
    • Environments
  • Analytics
    • Analytics
    • CI / CD
    • Code Review
    • Insights
    • Issue
    • Repository
    • Value Stream
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Members
    • Members
  • Collapse sidebar
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
  • Glasgow Haskell Compiler
  • GHCGHC
  • Wiki
    • Proposal
  • with combinator

Last edited by Ben Gamari Oct 20, 2020
Page history New page

with combinator

Replacing touch# with the keepAlive# combinator

There are a collection of related tickets;

  • #17760 is the one about keepAlive#
  • !2961 (closed) (a first attempt) has useful commentary
  • !3131 (closed) is a second attempt, dependent on !3113 (closed) (runRW)
  • #17746, #18061 (closed): withForeignPtr is unsafe
  • #18040 is a bit similar: swapping unsafe operations with touch#
  • #13104 is about runRW# and join points
  • #15127 (closed) is similar, also runRW#

Background

Today GHC offers touch# to ensure object liveness in the presence of references living outside of the heap:

touch# :: a -> State# RealWorld -> State# RealWorld

This is often useful when passing GC-managed objects (typically pinned byte arrays) to foreign functions through foreign data structures. For instance, the POSIX vectored I/O interface exposes this interface:

struct iovec {
    void  *iov_base;    /* Starting address */
    size_t iov_len;     /* Number of bytes to transfer */
};

ssize_t readv(int fd, struct iovec iovs[iov_count], int iov_count);

We may, for instance, wish to populate iov_base with a pinned ByteArray:

data IoVector = IoVector { iov_base :: Addr, iov_len :: CSize }
instance Storable IoVector

foreign import ccall "readv" c_readv
    :: CInt -> Ptr IoVector -> CInt -> IO CSize

doRead :: Fd -> [IoVector] -> IO CSize
doRead fd vecs = withArray vecs $ \vecsPtr -> readv fd vecsPtr (length vecs)

read :: Fd    -- ^ file descriptor from which to read
     -> CSize -- ^ length to read
     -> IO ByteArray
read fd len = do
  arr <- newPinnedByteArray 42
  result <- doRead fd [IoVector (byteArrayContents arr) len]
  touch# arr

However, this is dangerously susceptible to being dropped by the simplifier. For instance, if doRead were change such that the simplifier concluded that it will fail to return (e.g. because it is of the form forever ...), then it is tempted to drop the continuation containing touch#. This results in the garbage collector inappropriately freeing arr, resulting in catastrophe.

It caused #14346 (closed) (allocaBytes and allocaBytesAligned) and #17746 (withForeignPtr).

Example without FFI nor ForeignPtr

foo :: ByteArray# -> IO ()
foo ba = do
  assert (isPinnedByteArray# ba)
  addr <- byteArrayContents# ba
  forever $ do
    bar addr
  touch# ba

Mitigation

A way to mitigate the issue is to ensure that functions using touch# can't be simplified by using NOINLINE pragmas.

  • #14346 (closed) has been fixed this way in 8.2 and 8.6 by adding a NOINLINE pragma to allocaBytes[Aligned] functions (cf 56590db0).

  • #17746: adding a NOINLINE pragma to withForeignPtr fixes the issue but the price in performance is very high. An alternative is to rewrite as follows so that only the second field of the ForeignPtr is allocated and kept alive (instead of the whole ForeignPtr). Sadly it still has a huge impact on performance metrics (withForeignPtr is used a lot, especially in ByteString implementation).

withForeignPtr fo io
  = let !(ForeignPtr addr r) = fo
        IO fio               = io (Ptr addr)
    in IO $ \s -> keepAlive# r fio s

{-# NOINLINE keepAlive# #-}
keepAlive# :: a -> (State# RealWorld -> (# State# RealWorld, b #)) -> State# RealWorld -> (# State# RealWorld, b #)
keepAlive# a m s =
  case m s of
    (# s', r #) -> (# touch# a s', r #)

Fixing the issue properly with a new keepAlive# primop?

To fix the issue it has been proposed (#14375, !2566 (closed)) to make the keepAlive# function above a primop of this form:

keepAlive# :: forall (r :: RuntimeRep) (a :: TYPE r) b.
         b
         -- ^ the value to preserve
      -> (State# s -> (# State s, a #))
         -- ^ the continuation in which the value will be preserved
      -> State# s -> (# State# s, a #)

keepAlive# a k evaluates k a, ensuring that a remains alive throughout the evaluation.

If we rewrite the test example above with keepAlive#, we get:

test :: IO ()
test = do
    arr <- newPinnedByteArray 42
    keepAlive# arr (unIO (doSomething (byteArrayContents arr)))

This construction the compiler can't mangle as there is no continuation to drop.

Avoiding allocations with keepAlive#

It turns out that touch# is used quite widely, especially as part of withForeignPtr. Moreover, many of these uses are extremely computationally light. For instance, consider GHC.IO.Buffer.readWord8Buf:

readWord8Buf :: RawBuffer Word8 -> Int -> IO Word8
readWord8Buf arr ix = withForeignPtr arr $ \p -> peekByteOff p ix

This is small enough to be inlined and will therefore be typically compiled to a single memory read instruction when used in a strict context. While it is trivial to implement withForeignPtr in terms of keepAlive#, it is very important that we do not add overhead to this sort of usage.

Unfortunately, the naive implementation of withForeignPtr in terms of keepAlive# will prevent GHC from eliminating the allocation of an W8# in a strict usage of readWord8Buf. For instance, consider a program like this:

strlen :: RawBuffer Word8 -> IO Int
strlen = go 0
  where
    go !n !buf = do
      c <- readWord8Buf buf n
      if c == 0 then return n else go (n+1) buf

With today's withForeignPtr (implemented in terms of the fragile touch#) we get a very tight loop for go's worker with no allocations in the hot part of the loop. However, with an implementation based upon keepAlive# we will end up with something like the following:

$wgo :: Int# -> ForeignPtr Word8 -> IO Int
$wgo n# buf = IO $ \s0 -> 
    case keepAlive# buf (\s1 -> 
           case readWord8OffAddr# buf 0# of { Unit# n# -> W8# n# }
         ) of
     W8# n# ->
       case n# of
         0# -> ...
         _  -> ...

That is, while previously case-of-known constructor was able to eliminate the allocation of the W8# intermediate, with keepAlive# we can no longer do so. To fix this we need to somehow push the outer case (scrutinizing the Word8) into the continuation of keepAlive#. We recently implemented precisely this optimisation for runRW# in #15127 (closed). We will need to do similarly for keepAlive#. We call this optimisation strict-context float-in.

Note that for this optimisation to be possible keepAlive# must be polymorphic in the levity of the result.

$wgo :: Int# -> ForeignPtr Word8 -> IO Int
$wgo n# buf = IO $ \s0 -> 
    keepAlive# buf s0 (\s1 -> 
       case (
          case readWord8OffAddr# buf 0# of { Unit# n# -> W8# n# }
       ) of 
         W8# n# ->
           case n# of
             0# -> ...
             _  -> ...
    )

After inlining keepAlive#:

$wgo :: Int# -> ForeignPtr Word8 -> IO Int
$wgo n# buf = IO $ \s0 ->
     case (
       case readWord8OffAddr# buf 0# of { Unit# n# ->
         case n# of
           0# -> (# s0, ... #)
           _  -> (# s0, ... #)
     ) of 
       (# s1, r #) -> 
         case touch# buf s1 of
           s2 -> (# s2, r #)

Wrinkles

Unfortunately, there are a few wrinkles... consider this program, derived from T4978:

writeN =
  ...
    case keepAlive# x s0 (\s1 -> something s1) of
      (# s2, x #) ->
        writeN ...

Note how the recursive writeN call here is in tail-call position. If we push the case analysis into the keepAlive# here we end up with:

writeN = 
  ...
    keepAlive# x s0 (\s1 ->
      case something s1 of (# s2, x #) ->
        writeN ...

Which, when we inline the occurrence of keepAlive# will turn into:

writeN = 
  ...
    case
      case something s0 of (# s2, x #) ->
        writeN ...
    of
      (# s3, r #) ->
        case touch# s3 of s3 -> (# s3, r #)

In performing the strict-context float-in transformation we have turned a tail-call into a non-tail-call. This is quite unfortunate as touch# won't even produce any code.

Avoiding this while still retaining the ability to eliminate construction/destruction around keepAlive# is rather tricky. One approach is to split each context into two pieces: one which can be floated into the keepAlive# and the other which will remain outside. For former can safely contain case analyses of trivial scrutinees, let expressions whereas the all applications must remain outside.

Unfortunately, implementing this proposal is quite non-trivial, requiring that we perform rather significant reorganization of the source program.

Option A: Naive code generation of keepAlive#

The next question is what code should keepAlive# produce.

One simple strategy would be to lower keepAlive# as an out-of-line primop of the form:

// the out-of-line entry code for keepAlive#:
keepAlive#_entry(closure, cont) {
    W_[Sp-0] = closure;
    W_[Sp-8] = keepAlive#_ret;
    Sp -= 16;
    ENTER(cont);
}

// the return code for the stack frame pushed by keepAlive#:
keepAlive#_ret() {
    Sp += 16;
}

However, this is suboptimal as it requires that the continuation be stack allocated.

Option B: A slightly better code generation strategy

One way to eliminate the cost added by keepAlive# is to rewrite it to the existing touch# primitive (which has no runtime cost) late enough in the compilation process that the simplifier can't drop it (avoiding #14375). For instance, we might give keepAlive# a similar treatment to that of runRW#, which is lowered in CorePrep to an application of realWorld#.

Under such a scheme, CorePrep would look for applications of the form keepAlive# x k s and rewrite them to,

case k s of (# s', y #) ->
case touch# x s' of s'' ->
(# s'', y #)

With out current STG-to-STG pipeline the touch# here should have the desired effect of keeping x alive until k has finished.

However, there is a problem here: we have introduced a levity-polymorphic binder, y. This is disallowed by Core. That being said, operationally everything is fine as touch# is a no-op during code generation.

Also we'd want

K[ keepAlive x (\s.e) ]]  -->   keepAlive x (\x. K[[e]])

rather like runRW.

Option C: Improved code generation for continuation primops

It turns out that several existing primops (e.g. catch#, atomically#) have a continuation-passing structure similar to that of keepAlive# and suffer from this same suboptimal code generation. Consequently, there have been a few proposals (largely orthgonal to the present keepAlive# proposal) on improving the state of code generation for these operations.

Issue #16098 proposes an strategy (with an implementation in !2567) for improving code generation for the above-mentioned continuation-passing primops. Specifically, we extend STG to allow the continuation arguments of such primops to non-ANF form. For instance, keepAlive# x (\s -> expr) would be a valid application.

The STG-to-Cmm pass could then lower this application by emitting code to push a catch stack-frame, then proceed immediately to emit the code for the continuation.

Producing this lowering in STG-to-Cmm is a bit tricky since we work exclusively with abstract stack areas. We essentially need to treat the keepAlive# frame as an update frame, pushing it above everything else in the Old stack region. However it appears that this may break invariants within the code generator.

Option D: A better improved code generation approach

While discussing the proposal in #16098 on IRC, @AndreasK raised the concern that dropping ANF will inevitably complicate passes that work with STG. He suggested this alternative:

We introduce a magic primop, used only in STG:

pushKeepAlive# :: forall (r :: RuntimeRep) (a :: TYPE r).
             a -> State# s -> State# s

In Core-to-STG we rather lower keepAlive# x cont as:

case pushKeepAlive# x of () { _ ->
  let join j = cont
  in jump j
}

Finally, the STG-to-C-- pass will have special logic for lowering case analyses on pushKeepAlive#, essentially unfolding its C-- definition into the use-site. Such a case analysis will result in C-- like:

// Push a pushWith_ret stack frame carrying a reference to `x`
W_[Sp] = stg_pushWith_ret;
W_[Sp+WDS(1)] = R1;

// Jump to the continuation `cont`
jump j;

The return code for the stg_pushWith_ret frame would be defined in the RTS thusly:

stg_pushWith_ret() {
    Sp += WDS(2);  // Pop the pushWith frame
    return;        // Return to the next frame
}

pushing the same keepAlive#_ret stack frame seen above.

This gives us the same efficient code generation without losing the advantages offered by ANF (albeit with the disadvantage of weakening the separation of concerns provided by the STG abstraction).

Option E: tag Core case-expression with kept-alive variables

Suppose we have this Haskell primitive:

keepAlive# :: a -> b -> b

And that it gets desugared into the following Core:

keepAlive# (Var k) b ===> case {k} b of b' -> { DEFAULT -> b' }
keepAlive# a       b ===> compile time error

Where {k} in case {k} scrut of ... is a set of variable names to keep alive while evaluating scrut. Let's call it the keep-alive set.

The semantics is clear: variables in the keep-alive set must be kept alive until the scrutinee is evaluated.

Compilation is straightforward:

  • STG: keep-alive sets are carried into STG's case-expressions exactly like in Core
  • Cmm: keep-alive sets are desugared into touch# pseudo-instructions placed just after case scrutinee evaluation code
  • CmmToAsm: we drop touch# pseudo-ops as we currently do

Advantages of this approach:

  1. most optimizations (case-of-case, case-of-known constructor, w/w...) can work as before. BUT we need to tweak them a little to take keep-alive sets into account (see below)
  2. we don't have to deal with a continuation as would be the case if we kept keepAlive# as a magic function in Core
  3. performance is as good as it is today (no stack frame, most optimizations still work, cf (1)
  4. most changes to the compiler are located into GHC.Core.Opt.*.

A drawback of this approach is that Core transformations have to be reviewed to check that they are still valid when we take keep-alive sets into account. BUT we would probably have to do this anyway for other approaches because we really really want to perform some Core transformations over keepAlive#.

Let's review some analyzes and transformations:

  1. freevars: freeVars(case {k} scrut of alts) = k + freeVars(case scrut of alts)

  2. case-of-case

case {k} (case {k2} x of .. -> a; ... -> b) of
  C0 .. -> e1
  C1 .. -> e2

===>

case {k,k2} x of
   ... -> case {k} a of { C0 .. -> e1; C1 ... -> e2 }
   ... -> case {k} b of { C0 .. -> e1; C1 ... -> e2 }
  1. case-of-known-constructor (or anything in WHNF)
case {k} whnf of ... ===> case whnf of ...
  1. Note that we can't discard case-expressions with non-empty keep-alive sets. E.g.
case {k} x of e { DEFAULT -> alt }  ====/===> let e = x in alt

Option F: noDiv

Another way to avoid #17760 (and implement the keepAlive interface described above) is to introduce a primitive to hide the fact that an expression may diverge from the simplifier.

Specifically, we introduce a magic operation:

noDiv :: forall (r :: RuntimeRep) (a :: TYPE r).
         (State# RealWorld -> a) -> State# RealWorld -> a
noDiv k s0 =
    case {- forgetting divergence -} k s0 of r -> r

Semantically this is an identity. However, it carries the caveat that noDiv f s will evaluate the result of f s ignoring the fact that it may diverge. The State token here ensures that no effects can be floated outside of the noDiv scope.

This primitive allows one to implement keepAlive# as follows:

keepAlive# :: forall a r.
              a
           -> (State# RealWorld -> (# State# RealWorld, r #))
           -> State# RealWorld
           -> (# State# RealWorld, r #)
keepAlive# x f s0 =
  case noDiv f s0 of
    (# s1, r #) ->
      case touch# x s1 of
        s2 -> (# s2, r #)
{-# INLINE keepAlive #-}

Note the INLINE pragma here; this is perfectly safe since we rely entirely on noDiv for the safety of this operation.

However, there is a wrinkle: the simplification described in the "Avoiding allocations with keepAlive#" section above poses a problem here. Let's again consider the case of strlen's worker written in terms of keepAlive#:

$wgo :: Int# -> ForeignPtr Word8 -> IO Int
$wgo n# buf = IO $ \s0 -> 
    case keepAlive# buf (\s1 -> 
           case readWord8OffAddr# buf 0# of { Unit# n# -> W8# n# }
         ) of
     W8# n# ->
       case n# of
         0# -> ...
         _  -> ...

Inlining keepAlive we get:

$wgo :: Int# -> ForeignPtr Word8 -> IO Int
$wgo n# buf = IO $ \s0 -> 
    case
      (case
         noDiv s0 (\s1 -> case readWord8OffAddr# buf 0# of { Unit# n# -> W8# n# })
       of
         (# s1, r #) -> case touch# x s1 of s2 -> (# s2, r #)
      )
    of
      W8# n# ->
        case n# of
          0# -> ...
          _  -> ...

Here we see a problem: the strict-context float-in transform will push the context enclosing the noDiv,

case 
  _
of
  (# s1, r #) -> case touch# x s1 of s2 -> (# s2, r #)

into the noDiv. Consequently the touch# will end up inside the noDiv. At this point the demand analyser may conclude that the touch# is unreachable and we end up back in the situation we are trying to avoid.

Option G: noDivFinal

Here we try to fix up the issues of Option F by moving the "final" continuation (e.g. the touch#) into a separate argument of noDiv. We call this operation noDivFinal:

noDivFinal :: forall (r :: RuntimeRep) (a :: TYPE r).
              (State# RealWorld -> State# RealWorld)
           -> (State# RealWorld -> (# State# RealWorld, a #))
           -> State# RealWorld
           -> (# State# RealWorld, a #)
noDivFinal k final s0 =
    case {- forgetting divergence -} k s0 of (# s1, r #) ->
    case final s1 of s2 ->
    (# s2, r #)

The definition here will be inlined during CorePrep.

Errata: Skip everything below. This doesn't work for the same reason that Option B doesn't work: it requires a levity polymorphic binder, r.

With this we can implement keepAlive# as:

keepAlive# :: forall a r.
              a
           -> (State# RealWorld -> (# State# RealWorld, r #))
           -> State# RealWorld
           -> (# State# RealWorld, r #)
keepAlive# x f s0 =
    noDivFinal (touch# x) f s0

Here we can freely float strict-contexts into the first

Option H: Disabling some dead-code elimination

Given the issues seen in the above, perhaps another option would be to rather give keepAlive# a normal Haskell definition (written in terms of touch#) and allow it to inline only late in simplification. We would then disable dead code elimination (in rebuildCall) during this pass.

Specifically, write:

keepAliveLifted#                                                     
    :: forall (arg :: TYPE 'LiftedRep) r.                            
       arg -> (State# RealWorld -> (# State# RealWorld, r #))        
    -> State# RealWorld                                              
    -> (# State# RealWorld, r #)                                     
keepAliveLifted# x k s0 =                                            
    case k s0 of                                                     
      (# s1, r #) ->                                                 
        case touch# x s1 of                                          
          s2 -> (# s2, r #)                                          
{-# INLINE [0] keepAliveLifted# #-}                                  

along with an analogous unlifted version (this specialisation is necessary to allow the function to be written in pure Haskell; ofterwise we would need to bind a levity-polymorphic binder, r).

We can then disable the case-of-bottom transform in phase 0 of the simplifier.

Clone repository

GHC Home
GHC User's Guide

Joining In

Newcomers info
Mailing Lists & IRC
The GHC Team

Documentation

GHC Status Info
Working conventions
Building Guide
Debugging
Commentary

Wiki

Title Index
Recent Changes