Change the representation of the MVar blocked queue

The list of threads blocked on an MVar is now represented as a list of
separately allocated objects rather than being linked through the TSOs
themselves. This lets us remove a TSO from the list in O(1) time rather
than O(n) time, by marking its list object. Removing this linear
component fixes some pathological performance cases where many threads
were blocked on an MVar and became unreachable simultaneously
(nofib/smp/threads007), or when sending an asynchronous exception to a
TSO in a long list of threads blocked on an MVar.

MVar performance has actually improved by a few percent as a result of
this change, slightly to my surprise.

This is the final cleanup in the sequence, which let me remove the old
way of waking up threads (unblockOne(), MSG_WAKEUP) in favour of the new
way (tryWakeupThread and MSG_TRY_WAKEUP, which is idempotent). It is now
the case that only the Capability that owns a TSO may modify its state
(well, almost), and this simplifies various things. More of the RTS is
based on message-passing between Capabilities now.
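
For readers unfamiliar with the technique, here is a minimal sketch of a blocked queue built from separately allocated nodes, where removal only marks the node and the waker skips marked nodes later. It is an illustration only, not the RTS code: Thread, QueueNode, BlockedQueue, queue_block, queue_remove and queue_wake_one are all hypothetical names, and malloc/free stand in for heap-allocated queue objects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct QueueNode;

/* Hypothetical stand-in for a TSO; not the RTS definition. */
typedef struct Thread {
    int id;
    struct QueueNode *blocked_on;  /* node this thread waits in, or NULL */
} Thread;

/* Separately allocated queue entry (assumed layout). */
typedef struct QueueNode {
    struct QueueNode *next;
    Thread *thread;                /* NULL marks the entry as removed */
} QueueNode;

typedef struct BlockedQueue {
    QueueNode *head;
    QueueNode *tail;
} BlockedQueue;

/* Append a thread to the queue in O(1). Returns false if allocation fails. */
bool queue_block(BlockedQueue *q, Thread *t)
{
    assert(q != NULL && t != NULL);
    QueueNode *n = malloc(sizeof *n);
    if (n == NULL) {
        return false;
    }
    n->next = NULL;
    n->thread = t;
    if (q->tail != NULL) {
        q->tail->next = n;
    } else {
        q->head = n;
    }
    q->tail = n;
    t->blocked_on = n;
    return true;
}

/* Remove a thread from its queue in O(1): mark the node, do not traverse. */
void queue_remove(Thread *t)
{
    assert(t != NULL);
    if (t->blocked_on != NULL) {
        t->blocked_on->thread = NULL;  /* the node stays linked as a tombstone */
        t->blocked_on = NULL;
    }
}

/* Pop the next live thread, discarding any marked nodes on the way. */
Thread *queue_wake_one(BlockedQueue *q)
{
    assert(q != NULL);
    while (q->head != NULL) {
        QueueNode *n = q->head;
        Thread *t = n->thread;
        q->head = n->next;
        if (q->head == NULL) {
            q->tail = NULL;
        }
        free(n);
        if (t != NULL) {
            t->blocked_on = NULL;
            return t;
        }
    }
    return NULL;
}
```

In this sketch the cost of a removed entry is paid once, when the waker walks past it, so removing many threads at once no longer costs a list traversal per thread.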