Fix a bug that can sometimes cause noDuplicate# to fail.
The symptom is that, under some rare conditions when running in parallel, an unsafePerformIO or unsafeInterleaveIO computation might be duplicated, so e.g. lazy I/O might give the wrong answer (the stream might appear to have duplicate parts or parts missing). I have a program that demonstrates it with -N3 or more, some lazy I/O, and a lot of shared mutable state.

See the comment on stg_noDuplicatezh in PrimOps.cmm, which explains the problem and the fix. This took me about a day to find :-(