Asynchronous exception rethrown synchronously inside runStmt
This bug is related to various bugs to do with asynchronous exceptions
http://hackage.haskell.org/trac/ghc/ticket/3997, http://hackage.haskell.org/trac/ghc/ticket/5902, as well as the request for a means to terminate a code snippet started with runStmt
http://hackage.haskell.org/trac/ghc/ticket/1381.
The only means to terminate such a snippet is to send an asynchronous exception to the thread that executed the runStmt
(while, in versions of ghc prior to Simon M's patch listed in 1381, making sure that sandboxing is disabled). The problem with this approach is that this exception might interrupt the *typechecker* instead of the snippet.
This is problematic, because a relatively large part of the typechecker runs inside a call to unsafeInterleaveIO
, through a call to forkM
or forkM_maybe
from
-
loadDecl
incompiler/iface/LoadIFace.lhs
[[BR]] -
tc_iface_decl
,tcIfaceDataCons
,tcIfaceInst
,tcIfaceFamInst
,tcIfaceRule
,tcIfaceVectInfo
,tcUnfolding
,tcIfaceWrapper
,tcPragExpr
,tcUnfolding
,tcIfaceWrapper
andtcPragExpr
incompiler/iface/TcIFace.lhs
[[BR]]
If the asynchronous exception is caught inside unsafeInterleaveIO
and rethrown synchronously (typically implicitly through a call to try
or catch
that only catches exceptions of a certain type -- specifically IOEnvFailure
in this case) then this exception becomes the value of the thunk and any subsequent attempt to poke that thunk will rethrow the exception.
The attached test case illustrates this. After some minimal set up, we repeatedly use runStmt
to execute a snippet which simply waits for 1 second. Before we do this, we fork a thread which waits a random delay between 0 and 0.1 seconds, and then sends an asynchronous exception to the main thread. We catch this exception, make sure it's the exception we're expecting, and repeat. On every iteration we throw a different exception; however, when we run the code we will get
Unexpected exception CountedThreadKilled 4
Expected (CountedThreadKilled 8)
or similar output. In other words, we are catching an *old* exception, that we threw and caught earlier. This must be because this exception has become the value of a thunk, either inside unsafePerformIO
or unsafeInterleaveIO
-- and so far the only candidate I have found is the unsafeInterleaveIO
inside forkM_maybe
.
So this leaves us with two questions:
- Are there other (relevant) places where
unsafePerformIO
orunsafeInterleaveIO
are used? I think the answer to this is is no, but I am not sure. - Where do we catch and rethrow exceptions?
Unfortunately, question 2 is more difficult to answer. The function tryM
(compiler/utils/IOEnv.hs
) wraps around try
, but it in turn gets wrapped in lots of places: recoverM
, recoverTR
, tryTcErrs
, checkNoErrs
(this one is used a lot), mapAndRecoverM
(also used frequently), tryTc
and runPlans
.
There is a call to tryM
directly inside the unsafeInterleaveIO
in forkM_maybe
, so that's an obvious candidate. However, as an experiment, I replaced the tryM
with tryAllM
(not really a solution, as discussed below) and re-ran the tests, and it was still failing in a similar way. This means that this tryM
cannot be the only culprit, because with this change forkM
will "re"throw all exceptions as an GhcException
, never as the original exception (certainly not as the custom exception type that the test is using). Since we were still getting our custom exception, this must have been rethrown elsewhere.
Unfortunately tryM
or one of its variants gets called in lots of places:
-
initDs
incompiler/deSugar/DsMonad.lhs
[[BR]] -
dataConInfoPtrToName
incompiler/ghci/DebuggerUtils.hs
[[BR]] -
addConstraint
,cvObtainTerm
,cvReconstructType
andcongruenceNewtypes
incompiler/ghci/RtClosureInspect.hs
[[BR]] -
tcPolyBinds
,tcPolyInfer
,tcSpecPrags
,tcImpPrags
andtcTySigs
incompiler/typecheck/TcBinds.lhs
[[BR]] -
tcClassDecl2
incompiler/typecheck/TcClassDcl.lhs
[[BR]] -
check_instance
incompiler/typecheck/TcDefaults.lhs
[[BR]] -
tcDeriving
,makeDerivSpecs
andinferInstanceContexts
incompiler/typecheck/TcDeriv.lhs
[[BR]] -
tc_hs_type
incompiler/typecheck/TcHsType.lhs
-
tcInstDecls1
,tcClsInstDecl
andtcInstDecl2
incompiler/typecheck/TcInstDcls.lhs
[[BR]] -
tcRnExtCore
,tc_rn_src_decls
,rnTopSrcDecls
,tcUserStmt
,tcGhciStmts
,lookup_rdr_name
incompiler/typecheck/TcRnDriver.lhs
[[BR]] -
initTc
incompiler/typecheck/TcRnMonad.lhs
[[BR]] -
tcTopSplice
,tcTopSpliceExpr
,tcTopSpliceType
,runMeta
,qRecover
,reifyInstances
incompiler/typecheck/TcSplice.lhs
[[BR]] -
tcTyAndClassDecls
,tcTyClGroup
andcheckValidTyCon
incompiler/typecheck/TcTyClsDecls.lhs
Figuring out which of these may be called inside the forkM
calls is tricky. I tried running the test with a profiled built of ghc
with -fprof-auto
enabled, but unfortunately I was unable to reproduce a test failure with that approach, I'm not sure why.
But even if we did find the culprits, fixing it will not be easy because there is no generic solution. In forkM_maybe
we could return Nothing
when we catch an asynchronous exception, but then we would call pgmError
inside forkM
thereby rethrowing the asynchronous exception as a synchronous GhcException
. That wouldn't solve anything.
Instead, we'd have to rethrow the asynchronous exception using throwTo
, but then we need to decide what to do after the throwTo
returns, and that might be a rather difficult question to answer generically for all of the typechecker. Can we just re-run the thing_inside
? Probably not, there might be all kinds of side effects; and checking for all those side effects would be quite a large task, esp given that there are so many places (list above) where we are caling tryM
(directly through forkM
or deeper down the call stack).
I think a more realistic solution is to provide a means to detect when the snippet starts running. I outlined such a solution at http://www.haskell.org/pipermail/ghc-devs/2013-June/001380.html, but Simon M objected that this proposal might not work because there is no clear point where type checking finishes and the snippet starts executing. So far I think it would probably be easier to make sure that such a point *does* exist than to fix the asynchronous exception problem. (Note that I have adopted the solution outlined in my email and it seems to work for me, but since this bug is non-deterministic that of course doesn't mean it actually works..)
However, I've run out of time to work on this for now. Might be continued.
Trac metadata
Trac field | Value |
---|---|
Version | 7.6.3 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture |