Add a broader set of (C/CMM-based) atomic memory operations

This task is a precursor to ticket #7883 (closed). The goal is to standardize a broader set of atomic ops -- mainly CAS and fetch-and-add -- and then in #7883 (closed) we will optimize those primops to have native support in the LLVM backend and thus avoid the extra C calls.

Currently, the goal is only to support word-sized (4 or 8 byte) compare-and-swap and fetch-and-add operations. In the future, this goal could be expanded to support 16-byte CAS, implemented by CMPXCHG16B on x86.

Even supporting word sized CAS requires three different primops for operating on MutVar, MutableArray, and MutableByteArray, respectively. Here are signatures for those primOps:

gcptr stg_casMutVarzh ( gcptr mv, gcptr old, gcptr new )
 /* MutVar# s a -> a -> a -> State# s -> (# State#, Int#, Any a #) */

gcptr stg_casArrayzh ( gcptr arr, W_ ind, gcptr old, gcptr new );
/* MutableArray# s a -> Int# -> a -> a -> State# s -> (# State# s, Int#, Any a #) */

W_ stg_casIntArrayzh( gcptr arr, W_ ind, W_ old, W_ new );
/* MutableByteArray# s -> Int# -> Int# -> Int# -> State# s -> (# State# s, Int# #) */

Fetch-and-add, on the other hand, only makes sense on a MutableByteArray:

W_ stg_fetchAddIntArrayzh( gcptr arr, W_ ind, W_ incr );
/* MutableByteArray# s -> Int# -> Int# -> State# s -> (# State# s, Int# #) */

Some systems, such as gcc and LLVM, provide other variants, such as "fetch-and-multiply". However, these do not have direct support on x86 and are usually just simulated with CAS. For example, see the LLVM documentation here. User-exposed Haskell libraries for atomic operations may employ the same approach using the CAS primOps above to simulate other fetch-and-X operations.

Finally, a design question which bears thinking about is how to implement opaque "ticket" approach to CAS. The basic idea is that a user does NOT present a regular pointer-based data value (e.g. Int) to CAS as the "old" value. Rather, observations of prior values are kept in an opaque form (Any type) to prevent the compiler changing the pointer identity by, for example, unboxing and reboxing the value.

The way this affects the **primOp** design is as follows. The current implementation of casMutVar# and casArray# returns the NEW value rather than the old one if the CAS is successful. The Haskell code that imports the primOps treats this new value as the opaque ticket.

The user can in fact safely retrieve the real value from the Any-based "ticket" with a peekTicket :: Ticket a -> a operation. But peekTicket MUST be marked NOINLINE to hide the unsafeCoerce operation inside it from compiler optimizations. Otherwise, the code path that carries the ticket from one CAS operation to the next is subject to optimization [1]. In particular, unsafeCoerce simply becomes a type-level operation, not a function call with the usual constraints on control and dataflow.

Returning to the issue of primOp design, if we wished to avoid the somewhat non-standard design of returning the new value from CAS, we //could// allow users to create Ticket values themselves, rather retrieving them as the result of previous read and CAS IO operations. Yet this would entail an additional NOINLINE (unsafeCoerce) function call, and I believe it would be prone to error.

Further, if you look at the primOp implementation for stg_casMutVarzh, you will see that there is //already// a branch, which is necessary for updating GC-related meta-data. Thus there is no additional overhead to return the new rather than the old value from CAS operations in the successful case. Haskell-level CAS is already a compound operation, not merely a single machine code instruction.

By contrast, the C intrinsic __sync_val_compare_and_swap can become only a single machine operation, and it returns the "old" value irrespective of whether the CAS succeeded or failed. In fact, an equality comparison on that old value is necessary to determine if the operation succeeded. In contrast, the Haskell CAS primOps above return an unboxed tuple with an explicit success bit. Thus returning the old value in the success case is completely redundant, and returning the new value instead allows us to acheive two goals (the CAS and the opacifying coercion) with one operation at no additional cost.

[1] P.S. As a concrete example of what happens when unsafeCoerce is not hidden, consider the following code as a starting point:

x = unsafeCoerce y 
z = f y

x would seem to have nothing to do with f, and yet later in the compiler, one can end up with Core like this:

z = f x

This exposes the type information and activates optimizations, which was the source of one bug with the lockfree-queue package that was fixed by adding NOINLINE to peekTicket.

Trac metadata

Trac field	Value
Version	7.6.3
Type	Bug
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information