killThread and getChanContents appear to interact strangely
I'm not sure if this an error in my program or in GHC, but I think the behaviour I'm seeing is strange enough to merit a place on the tracker.
I'm particularly confused by the fact that my correctGetChanContents appears to be able to return an empty list!
To run the test case, compile the file below and run it. It /should/ fail to terminate, with a lot of output like this:
Here we go
In the thread
Got an element
CHUNK CHUNK CHUNK: length = 1
Here we go
In the thread
Got an element
CHUNK CHUNK CHUNK: length = 1
Here we go
In the thread
Got an element
Sometimes this doesn't happen (it's nondeterministic). If it completes succesfully then try it again.
module Main where
import System.IO.Unsafe
import Control.Exception
import Control.Concurrent
import Data.Maybe
import Prelude hiding (catch)
{-
THE PROBLEM
===========
This Haskell program nondeterministically fails to terminate.
EXPLANATION
===========
This example is extracted from a larger program and hence has a number
of interacting parts that conspire to create the bug. They are:
1) Timed-out evaluation. The timeoutList function takes a lazy list computation
and a number of microseconds and returns as much of the list as could be
evaluated in that number of microseconds.
2) "Improving" IO. To use this you write a sort of IO action that is able to
call yieldImprovement at any time. When this is "run" with runImprovingIO
you get back an action that can be run to actually do the IO computation you
specified in the first place and a lazy list which represents the sequence
of "improvements" that will be output by that computation at some future
point in time.
In this example, the computation part of the Improving IO is spun off to
be evaluated on another thread, and the lazy list is consumed in chunks of
as many items as can be read in 10ms by readWithTimeout.
The test makes an Improving IO action that yields "improvements" that are
just a sequence of numbers, and makes sure that they all come back via
the lazy list of improvements when it is read in this chunky, timed-out
manner.
TENTATIVE CAUSE
===============
The getChanContents function seems to not be exception-safe. If the timed-out
evaluation kills its thread before it runs to completion, the channel seems to be
left in an undefined state, causing it to stop yielding values and hanging the
program.
I have demonstrated a version, correctGetChanContents, that tries to fix this problem
by blocking asynchronous exceptions during a readChan call. However, although the
program is indeed more reliable, the problem still sporadically occurs! Argh!
STRANGE THINGS
==============
If you change the definition of correctGetChanContents to include an initial call
to block and the program subsequently enters the buggy state, we get the error:
> Main: reifyList: list finished before a final value arrived
How is that possible?? This even occurs if the second call to block is removed.
-}
main :: IO ()
main = do
{-
This doesn't terminate, which proves that block and killThread work:
tid <- block $ forkIO $ let loop x = loop (x + 1) in loop 0
killThread tid
-}
-- I found that the test almost always worked when the input list
-- went up to 5000, failed 1/3 of the time when it went up to 20000,
-- and always failed (didn't terminate) with a list of length 100000.
let input = [1..5000] :: [Int]
-- Kick off improving IO and actually run the IO stuff in another thread
(improving, action) <- runImprovingIO (mapM_ yieldImprovement input)
forkIO action
-- We now want to get all of the items from the input that we sent through
-- the improving IO channel by reading from the channel in time chunks 10ms
-- in length.
-- See comments in readWithTimeout to find out about this -1
output <- readWithTimeout (length input - 1) improving
--print output
--print (length output)
-- Assert that the output matches
if input == output
then putStrLn "OK"
else putStrLn "Nope"
where
readWithTimeout n improving
| n <= 0 = return []
| otherwise = do
-- For the sake of argument use a 10ms time step for the timeout
chunk <- timeoutList 10000 (allListHeads improving)
putStrLn $ "CHUNK CHUNK CHUNK: length = " ++ show (length chunk)
-- Now we have a chunk, retrieve the next chunk
rest <- case listToMaybeLast chunk of
Nothing -> do
-- This case only appears if the timeout period was insufficient to return even a single
-- item from the improving value. It never occurs in practice.
readWithTimeout n improving
Just last_improving -> do
-- This is the normal case. We got at least one item, so we should continue reading the
-- improving value in chunks. However, we need to bear in mind that the first item in
-- the chunk we are working with will actually be the one we started with, so we shouldn't
-- include it when working out how many additional items we need to obtain, hence the +1.
readWithTimeout (n + 1 - length chunk) last_improving
-- We need to drop the first improving value returned from the recursive call, as it will just be
-- the one we gave the recursive call initially, hence the drop 1.
return $ (mapMaybe maybeHead chunk) ++ drop 1 rest
allListHeads :: [a] -> [[a]]
allListHeads list@[] = [list]
allListHeads list@(_:rest) = list : allListHeads rest
listToMaybeLast :: [a] -> Maybe a
listToMaybeLast = listToMaybe . reverse
maybeHead :: [a] -> Maybe a
maybeHead (x:_) = Just x
maybeHead [] = Nothing
--
-- Evaluation with timeout
--
-- | Evaluates the given list for the given number of microseconds. After the time limit
-- has been reached, a list is returned consisting of the prefix of the list that was
-- successfully evaluated within the time limit.
--
-- This function does /not/ evaluate the elements of the list: it just ensures that the
-- list spine arrives in good order.
timeoutList :: Int -> [a] -> IO [a]
timeoutList timeout improving = do
-- Create var that will be used to store the known prefix (in reverse order)
putStrLn "Here we go"
known_prefix_var <- newMVar []
-- Go off and get as much of that prefix as we can
thread_id <- forkIO (putStrLn "In the thread" >> go known_prefix_var improving)
-- Wait for it to do its thing, then kill the thread
threadDelay timeout
killThread thread_id
-- Return that prefix
mb_known_prefix <- tryTakeMVar known_prefix_var
case mb_known_prefix of
Nothing -> error "timeoutList: bug in threading logic!"
Just known_prefix -> return (reverse known_prefix)
where
go _ [] = putStrLn "Bottom" >> return ()
go var (x:xs) = do
putStrLn "Got an element"
modifyMVar_ var (\current_prefix -> return (x : current_prefix))
go var xs
--
-- The ImprovingIO monad
--
newtype ImprovingIO i a = IIO { unIIO :: Chan (Maybe i) -> IO a }
instance Monad (ImprovingIO i) where
return x = IIO (const $ return x)
ma >>= f = IIO $ \chan -> do
a <- unIIO ma chan
unIIO (f a) chan
yieldImprovement :: i -> ImprovingIO i ()
yieldImprovement improvement = IIO $ \chan -> writeChan chan (Just improvement)
runImprovingIO :: ImprovingIO i () -> IO ([i], IO ())
runImprovingIO iio = do
chan <- newChan
let action = do
unIIO iio chan
putStrLn "SIGNALLING LIST END - everything is available!"
writeChan chan Nothing -- @Nothing@ signals the end of the list
yielded_improvements <- correctGetChanContents chan
return (reifyList yielded_improvements, action)
correctGetChanContents :: Chan a -> IO [a]
correctGetChanContents ch
= block $ unsafeInterleaveIO (block $ do
x <- readChan ch
xs <- correctGetChanContents ch
return (x:xs)
)
liftIO :: IO a -> ImprovingIO i a
liftIO io = IIO $ const io
reifyList :: [Maybe i] -> [i]
reifyList (Just x:rest) = x : reifyList rest
reifyList (Nothing:_) = []
reifyList [] = error "reifyList: list finished before a final value arrived"
Trac metadata
Trac field | Value |
---|---|
Version | 6.9 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Runtime System |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | MacOS X |
Architecture | x86 |