Asynchronous exceptions can put the external interpreter into an unrecoverable state
We run into this often in ghcide, since it uses asynchronous exceptions liberally to cancel previously running compiles when a new edit comes in that invalidates them.
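In outline, the cancellation pattern looks something like this (a minimal sketch using only `base`, not ghcide's actual code; the `started` signal just makes the example deterministic):

```haskell
import Control.Concurrent
import Control.Exception

main :: IO ()
main = do
  started <- newEmptyMVar
  done    <- newEmptyMVar
  -- Start "compiling" the old version of the file.
  tid <- forkIO $ do
    r <- try $ do
           putMVar started ()
           threadDelay 1000000    -- stand-in for a long compile
    putMVar done (r :: Either AsyncException ())
  takeMVar started                -- wait until the compile is running
  -- A new edit arrives that invalidates the compile: cancel it by
  -- throwing an asynchronous exception at the worker thread.
  killThread tid
  r <- takeMVar done
  case r of
    Left ThreadKilled -> putStrLn "compile cancelled"
    _                 -> putStrLn "compile finished"
```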
The culprit is the following excerpt from GHC:

```haskell
-- | Send a 'Message' and receive the response from the iserv process
iservCall :: Binary a => IServInstance -> Message a -> IO a
iservCall iserv msg =
  remoteCall (iservPipe iserv) msg
    `catch` \(e :: SomeException) -> handleIServFailure iserv e

handleIServFailure :: IServInstance -> SomeException -> IO a
handleIServFailure iserv e = do
  let proc = iservProcess iserv
  ex <- getProcessExitCode proc
  case ex of
    Just (ExitFailure n) ->
      throwIO (InstallationError ("ghc-iserv terminated (" ++ show n ++ ")"))
    _ -> do
      terminateProcess proc
      _ <- waitForProcess proc
      throw e
```
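Note that a `catch` on `SomeException` intercepts asynchronous exceptions as well, so ghcide's cancellation lands in `handleIServFailure` too. A self-contained illustration (plain `base` code standing in for the GHC functions above):

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Concurrent
import Control.Exception

main :: IO ()
main = do
  started <- newEmptyMVar
  out     <- newEmptyMVar
  -- Written like iservCall: a blocking call wrapped in a catch-all
  -- handler. The threadDelay stands in for remoteCall on the pipe.
  tid <- forkIO $
    (putMVar started () >> threadDelay 1000000 >> putMVar out "reply")
      `catch` \(e :: SomeException) ->
        -- Like handleIServFailure, this runs even for ThreadKilled,
        -- and would then go on to terminate the iserv process.
        putMVar out ("handler ran: " ++ show e)
  takeMVar started
  killThread tid        -- cancel, as ghcide does on a new edit
  putStrLn =<< takeMVar out
```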
On any exception, `handleIServFailure` terminates the `ghc-iserv` process. This wouldn't be so bad (though not ideal), except for this code elsewhere in GHC:

```haskell
(... restore $ action iserv') `MC.onException` (liftIO $ putMVar mIServState (IServRunning iserv'))
```
The exception handler installed here restores the `IServRunning` state, which doesn't match the actual state of the process. This means that the next time the external interpreter is needed, GHC will happily try to read/write to it, resulting in a *synchronous* exception being thrown this time.
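The effect can be sketched in isolation (stand-in types; the string handle is made up and the `userError` stands in for the pipe failure):

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Concurrent.MVar
import Control.Exception

-- Stand-ins for GHC's iserv state; "iserv-pid-42" is a made-up handle.
data IServState = IServPending | IServRunning String deriving Show

main :: IO ()
main = do
  st <- newMVar (IServRunning "iserv-pid-42")
  r <- try $ do
    IServRunning h <- takeMVar st
    -- Like the onException excerpt: on *any* exception, put the
    -- Running state back -- even though handleIServFailure has
    -- already terminated the process by this point.
    (throwIO (userError "iserv is gone") :: IO ())
      `onException` putMVar st (IServRunning h)
  case (r :: Either SomeException ()) of
    Left _  -> putStrLn "call failed"
    Right _ -> putStrLn "ok"
  print =<< readMVar st   -- still IServRunning: stale state
```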
I tried to work around this in ghcide by setting a `createIservProcessHook` that forks a thread which uses `waitForProcess` to wait for `ghc-iserv` to die, and then writes `IServPending` (as it is called in HEAD) into the `MVar`. However, this turned out not to be enough. GHC would indeed restart the iserv process after this, but evaluating anything on it would lead to segfaults. This turned out to be because the `PersistentLinkerState` in the `DynLinker` didn't match the state of the iserv process: when the iserv process is restarted, it has nothing linked into it.
Finally, I modified the hook to clear the `DynLinker` state also, ending up with:

```haskell
createIservHook cp = do
  traceIO $ "spawning iserv: " ++ show cp
  a@(_,_,_,ph) <- createProcess cp
  _ <- forkIO $ do
    -- When the process is killed by an asynchronous exception:
    e <- waitForProcess ph
    traceIO $ "terminated iserv with: " ++ show e
    -- Grab the lock on the linker
    modifyMbPLS_ dvar $ \_ -> do
      _ <- takeMVar ivar -- Grab the lock on the iserv
      -- Deadlock??? GHC always seems to grab the lock on the PLS before the lock on the iserv
      -- Cleanup the iserv process
      cleanupProcess a
      -- Reset the iserv state and release the lock
      putMVar ivar Nothing
      -- Finally reset the dynlinker state and release the lock on the linker
      pure Nothing
  pure ph
```
This seems to work in my limited experimentation.
The `DynLinker` type and the `MVar` used to carry the state are not exported in GHC <= 8.8, so this workaround will not function there.
At the very least,
GHC should be doing all this cleanup by itself.
However, I'm a bit concerned† by the performance impact of having to restart the `ghc-iserv` process and re-link all dependencies on potentially every keystroke the user makes in their IDE.
Ideally, asynchronous exceptions should not terminate the `ghc-iserv` process at all. Instead, we should somehow tell it to abort processing the current message, clear any transient state, and return to idle.
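That abort-and-return-to-idle behaviour can be sketched with a toy worker (a hypothetical `Abort` interrupt, not the real iserv message protocol):

```haskell
import Control.Concurrent
import Control.Exception

-- A hypothetical per-message interrupt; not part of the real iserv protocol.
data Abort = Abort deriving Show
instance Exception Abort

-- The worker catches Abort around each message, drops its transient
-- state, and goes back to its idle loop instead of terminating.
workerLoop :: MVar String -> MVar String -> MVar () -> IO ()
workerLoop inbox outbox started = loop
  where
    loop = do
      msg <- takeMVar inbox
      r <- try $ do
        putMVar started ()
        threadDelay 100000          -- stand-in for evaluating the message
        evaluate (reverse msg)
      case r of
        Left Abort -> loop          -- aborted: back to idle, still alive
        Right res  -> putMVar outbox res >> loop

main :: IO ()
main = do
  inbox   <- newEmptyMVar
  outbox  <- newEmptyMVar
  started <- newEmptyMVar
  tid <- forkIO $ workerLoop inbox outbox started
  putMVar inbox "first"
  takeMVar started
  throwTo tid Abort                 -- cancel mid-message; the worker survives
  putMVar inbox "second"
  takeMVar started
  putStrLn =<< takeMVar outbox      -- the same worker answers the next message
```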
† Of course, the obvious suggestion is to not use the external interpreter in ghcide. Unfortunately, using `-fexternal-interpreter` was supposed to function as a workaround for #16525, due to which unloading of static objects is currently not possible/advisable. This means that the first time we load an object file, that is going to be the code we are running for the entire session. To work around this we have to either use bytecode (which has bad memory/performance characteristics for us, and means that we can't persist compilation results for use in another session), or dynamically link in objects. For the latter, since we want to distribute static binaries, the only option is to use