Eager AP_STACK blackholing causes incorrect size info for sanity checks

While debugging #15508 (closed) I found a case where eager blackholing of an AP_STACK causes closure_sizeW() to return an incorrect size, which in turn causes incorrect slop zeroing by OVERWRITING_CLOSURE(), which breaks sanity checks.

To reproduce, cd into testsuite/tests/concurrent/prog001, then:

$ ghc-stage2 Mult.hs -fforce-recomp -debug -rtsopts
$ ./Mult +RTS -DS
Mult: internal error: checkClosure: stack frame
    (GHC version 8.7.20180825 for x86_64_unknown_linux)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
zsh: abort (core dumped)  ./Mult +RTS -DS

Here's how the problem occurs:

1. Allocate an AP_STACK in a generation during a GC.

2. Evaluate the AP_STACK. The entry code first WHITEHOLEs and then eagerly BLACKHOLEs it. At this point the size of the AP_STACK closure becomes 2, because that's the size of an (eager or not) BLACKHOLE.

3. To start a GC the thread does threadPaused, which in line 342 actually BLACKHOLEs the eager blackhole (is this part really correct?) and zeros the slop, but because the eager blackhole has the same size as BLACKHOLE it doesn't actually zero the stack frames in the original AP_STACK's payload.

4. In the next GC, in the pre-GC sanity check we check the whole heap. When checking the generation that the BLACKHOLE (the AP_STACK that became a BLACKHOLE in step (2)) resides in, we check the closure, and then check closure + 2 (2 is the size of BLACKHOLE) instead of closure + <size of the stack>, and end up checking a stack frame of the original AP_STACK.

This causes the sanity check to fail because we don't expect to see a stack frame outside of a stack.

In summary, normally when we blackhole an object we zero the space after the blackhole (i.e. some part of the original object's payload) so that sanity checks can skip over that space. We can't do this when eagerly blackholing (because the payload of the original object will still be used), which causes sanity check failures.
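The mismatch can be illustrated with a small toy model in C. This is only a sketch, not the RTS code: the tags, the two-word toy AP_STACK header, and every `toy_*` name are hypothetical stand-ins for the real closure types, closure_sizeW(), OVERWRITING_CLOSURE() and the sanity checker's heap walk. It models only one point: the slop to zero is computed from the header that is currently in place, so once the header has been eagerly replaced, nothing is zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical tags and layouts, not the real RTS ones. */
typedef uintptr_t word;

enum { TAG_ZERO_SLOP = 0, TAG_AP_STACK = 1, TAG_BLACKHOLE = 2, TAG_STACK_FRAME = 3 };

#define BLACKHOLE_SIZE_W 2  /* header + indirectee, as in the report */
#define AP_STACK_HDR_W   2  /* toy layout: tag word + payload-size word */
#define PAYLOAD_W        3  /* number of stack-frame words in the toy payload */
#define HEAP_W (AP_STACK_HDR_W + PAYLOAD_W)

/* Size in words, derived only from the header currently stored at p. */
static size_t toy_closure_sizeW(const word *p)
{
    switch (p[0]) {
    case TAG_AP_STACK:  return AP_STACK_HDR_W + (size_t)p[1];
    case TAG_BLACKHOLE: return BLACKHOLE_SIZE_W;
    default:            return 1;
    }
}

static void toy_make_ap_stack(word *heap)
{
    heap[0] = TAG_AP_STACK;
    heap[1] = PAYLOAD_W;
    for (size_t i = 0; i < PAYLOAD_W; i++)
        heap[AP_STACK_HDR_W + i] = TAG_STACK_FRAME;
}

/* Eager blackholing: only the header changes; the payload is still in use. */
static void toy_eager_blackhole(word *heap)
{
    heap[0] = TAG_BLACKHOLE;
}

/* Blackhole the closure and zero the slop, sized from the current header. */
static void toy_overwrite_closure(word *heap)
{
    size_t old_size = toy_closure_sizeW(heap);
    assert(old_size <= HEAP_W);
    for (size_t i = BLACKHOLE_SIZE_W; i < old_size; i++)
        heap[i] = TAG_ZERO_SLOP;
    heap[0] = TAG_BLACKHOLE;
}

/* Heap walk: returns the index of the first stack-frame word found where a
   closure header is expected, or -1 if the walk reaches the end cleanly. */
static int toy_check_heap(const word *heap, size_t n)
{
    size_t i = 0;
    while (i < n) {
        if (heap[i] == TAG_ZERO_SLOP) { i++; continue; } /* skip zeroed slop */
        if (heap[i] == TAG_STACK_FRAME) return (int)i;   /* frame outside a stack */
        i += toy_closure_sizeW(&heap[i]);
    }
    return -1;
}

/* Blackholed only at pause time: the slop is zeroed and the walk succeeds. */
int toy_lazy_case(void)
{
    word heap[HEAP_W];
    toy_make_ap_stack(heap);
    toy_overwrite_closure(heap);
    return toy_check_heap(heap, HEAP_W);
}

/* Eagerly blackholed first: the size is already 2, so nothing is zeroed. */
int toy_eager_case(void)
{
    word heap[HEAP_W];
    toy_make_ap_stack(heap);
    toy_eager_blackhole(heap);
    toy_overwrite_closure(heap);
    return toy_check_heap(heap, HEAP_W);
}
```

Called from a small driver, `toy_lazy_case()` walks past the zeroed payload and returns -1, while `toy_eager_case()` stops at word 2, the first leftover stack frame, which corresponds to the `checkClosure: stack frame` failure above.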

Trac metadata

Trac field	Value
Version	8.5
Type	Bug
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	Runtime System
Test case
Differential revisions
BlockedBy
Related
Blocking
CC	bgamari, simonmar
Operating system
Architecture
