Skip to content

Race/Async Exception issue in Network.Socket.connect

import Control.Concurrent
import Control.Monad
import Network.Socket
import Control.Exception as C
import System.Timeout
import Network.BSD(hostAddresses, getHostByName)
import System.IO.Error
import Data.Maybe

-- someHostName should be replaced by a real host that gives
-- "connection refused" errors on connection to ports in the range.
-- The latter ip is a junk one that should cause connections to
-- hang indefinitely.
-- More hostnames with either characteristic can be added to taste
-- if that helps to reproduce the bug.
servers = [ "someHostName", "126.255.255.255"]
ports = [9001..9099] :: [Int]

conns = [(h,p) | h <- servers, p <- ports]

connectSock :: Integral a => String -> a -> IO Socket
connectSock host port = do
   hn <- maybe (ioError . mkIOError doesNotExistErrorType "No Host Address" Nothing $ Just host) return . listToMaybe . hostAddresses =<< getHostByName host
   sk <- socket AF_INET Stream 6
   connect sk (SockAddrInet (fromIntegral port) hn) `C.onException` sClose sk
   return sk

pMapM f xs = mapM (\x -> forkIO $ f x) xs

mapM' f xs = mapM (\x -> (C.try :: IO a -> IO (Either C.SomeException a)) (f x)) xs

main = do
  -- This is the canary thread in the bugmine
  forkIO $ forever $ putStrLn "chirp" >> threadDelay 100000

  -- This is the bug thread
  forever $ pMapM (\(h,p) -> timeout 1000000 (connectSock h p) >> return ()) conns >> threadDelay 2000000

The above code, compiled with the threaded runtime, causes a race condition. After roughly one to two cycles of the bug thread, the canary thread stops running, indicating that the program has become somehow trashed. (The bug thread stops running as well).

In experiments, this race condition is best triggered with at least two servers, one of which yields "connection refused" on connection, and the other of which simply hangs -- the nonsense ip address provided above works for the latter.

if the pMapM is replaced by mapM' (i.e. we switch from parallel to serial connection) then the bug does not appear to be triggered.

wrapping the call to sClose in a mutex didn't seem to help, so it seems the race condition is in the connect call.

Trac metadata
Trac field Value
Version 6.10.3
Type Bug
TypeOfFailure OtherFailure
Priority normal
Resolution Unresolved
Component libraries/network
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information