Race/Async Exception issue in Network.Socket.connect
import Control.Concurrent
import Control.Monad
import Network.Socket
import Control.Exception as C
import System.Timeout
import Network.BSD(hostAddresses, getHostByName)
import System.IO.Error
import Data.Maybe
-- someHostName should be replaced by a real host that gives
-- "connection refused" errors on connection to ports in the range.
-- The latter ip is a junk one that should cause connections to
-- hang indefinitely.
-- More hostnames with either characteristic can be added to taste
-- if that helps to reproduce the bug.
servers = [ "someHostName", "126.255.255.255"]
ports = [9001..9099] :: [Int]
conns = [(h,p) | h <- servers, p <- ports]
connectSock :: Integral a => String -> a -> IO Socket
connectSock host port = do
hn <- maybe (ioError . mkIOError doesNotExistErrorType "No Host Address" Nothing $ Just host) return . listToMaybe . hostAddresses =<< getHostByName host
sk <- socket AF_INET Stream 6
connect sk (SockAddrInet (fromIntegral port) hn) `C.onException` sClose sk
return sk
pMapM f xs = mapM (\x -> forkIO $ f x) xs
mapM' f xs = mapM (\x -> (C.try :: IO a -> IO (Either C.SomeException a)) (f x)) xs
main = do
-- This is the canary thread in the bugmine
forkIO $ forever $ putStrLn "chirp" >> threadDelay 100000
-- This is the bug thread
forever $ pMapM (\(h,p) -> timeout 1000000 (connectSock h p) >> return ()) conns >> threadDelay 2000000
The above code, compiled with the threaded runtime, causes a race condition. After roughly one to two cycles of the bug thread, the canary thread stops running, indicating that the program has become somehow trashed. (The bug thread stops running as well).
In experiments, this race condition is best triggered with at least two servers, one of which yields "connection refused" on connection, and the other of which simply hangs -- the nonsense ip address provided above works for the latter.
if the pMapM is replaced by mapM' (i.e. we switch from parallel to serial connection) then the bug does not appear to be triggered.
wrapping the call to sClose in a mutex didn't seem to help, so it seems the race condition is in the connect call.
Trac metadata
Trac field | Value |
---|---|
Version | 6.10.3 |
Type | Bug |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | libraries/network |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture |