forkProcess leaks file descriptors
This is expected behavior: fork in POSIX copies every open file descriptor into the child, and close-on-exec flags such as O_CLOEXEC only take effect on a later exec, which forkProcess does not perform. But in Haskell it's quite difficult to figure out which FDs need to be manually closed.
For example, if a Handle to a file is opened in the parent process and isn't referenced in the code passed to forkProcess, its FD will leak. In order to safely fork, a user has to know about all Handles and other structures that use file descriptors currently active in the program, as well as which ones will survive by being referenced in the child process.
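The leak described above can be observed with a minimal sketch like the following. It assumes Linux (it reads /proc/self/fd) and the directory and unix packages; nonStd is a hypothetical helper introduced only for this illustration.

```haskell
import System.Directory (listDirectory)
import System.IO (IOMode (ReadMode), hClose, hFlush, openFile, stdout)
import System.Posix.Process (forkProcess, getProcessStatus)

-- | Hypothetical helper: drop the entries for stdin, stdout and stderr.
nonStd :: [String] -> [String]
nonStd = filter (`notElem` ["0", "1", "2"])

main :: IO ()
main = do
  -- The parent opens a Handle; the child below never mentions it.
  h <- openFile "/dev/null" ReadMode
  pid <- forkProcess $ do
    -- Assumption: Linux, where /proc/self/fd lists this process's descriptors.
    -- The listing also contains the descriptor used to read the directory
    -- itself and anything the RTS holds.
    fds <- listDirectory "/proc/self/fd"
    putStrLn ("child has non-std descriptors: " ++ unwords (nonStd fds))
    hFlush stdout
  -- Wait for the child so its output appears before the parent exits.
  _ <- getProcessStatus True False pid
  hClose h
```

The descriptor behind h should show up in the child's listing even though the child code has no way to name it, which is the situation the paragraph above describes.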
A simpler problem is wanting to close most FDs (e.g. everything except std*) when forking. When you don't know where the file descriptors in the current process come from but you want them closed, a common approach is to iterate over all file descriptors and close each one. The process library does this. This doesn't work for forkProcess if a Haskell program is built against the threaded runtime, because the IO event manager holds on to file descriptors it uses for control. Carelessly iterating over all FDs causes the IO manager to die when -threaded is used. As far as I understand, all of these FDs are held by the Control structure associated with an EventManager (https://hackage.haskell.org/package/base-184.108.40.206/docs/src/GHC.Event.Control.html#Control). The base library does not expose these modules, so there is no way to figure out what these FDs are from user code.
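For reference, the close-everything approach described above might look roughly like this in the child. This is only a sketch under stated assumptions: fdsToClose and closeAll are hypothetical names, and the upper bound of 1023 is an arbitrary guess (real code could query the limit with getResourceLimit ResourceOpenFiles from System.Posix.Resource). It is exactly the pattern that is unsafe under -threaded, since it also closes descriptors owned by the IO manager, and other RTS-internal descriptors may be affected as well.

```haskell
import Control.Exception (IOException, try)
import Control.Monad (forM_, void)
import System.IO (hFlush, stdout)
import System.Posix.IO (closeFd)
import System.Posix.Process (forkProcess, getProcessStatus)
import System.Posix.Types (Fd)

-- | Hypothetical helper: every descriptor in [0 .. maxFd] not in the keep list.
fdsToClose :: [Fd] -> Fd -> [Fd]
fdsToClose keep maxFd = [fd | fd <- [0 .. maxFd], fd `notElem` keep]

-- | Hypothetical helper: close each descriptor, ignoring failures such as
-- EBADF for numbers that are not open.
closeAll :: [Fd] -> IO ()
closeAll fds = forM_ fds $ \fd ->
  void (try (closeFd fd) :: IO (Either IOException ()))

main :: IO ()
main = do
  pid <- forkProcess $ do
    -- Assumption: 1023 is high enough to cover every open descriptor.
    -- WARNING: this also closes descriptors the RTS itself relies on.
    closeAll (fdsToClose [0, 1, 2] 1023)
    putStrLn "child: closed everything except std*"
    hFlush stdout
  _ <- getProcessStatus True False pid
  pure ()
```

Being able to pass the IO manager's descriptors in the keep list is what would make this pattern usable, and that information is what user code currently cannot obtain.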
In one's own application, these issues are tricky but ultimately surmountable, as one in principle has the ability to track down every file descriptor being opened. However, when using forkProcess in a library, one might need a sledgehammer. For example, the hdaemonize package notes that the library can leak file descriptors because there is no way to deal with this issue: https://hackage.haskell.org/package/hdaemonize-0.5.4/docs/System-Posix-Daemonize.html#v:daemonize
I am writing a library in the same design space as hdaemonize, and I would like it to handle file descriptors sensibly. In general the problem looks intractable (for example because arbitrary C libraries could initialize their own internal FDs), but if I could know which file descriptors are being used by the IO manager, then I could at least provide for the use case where no FDs should be shared between parent and child.
Would it be sensible to expose more of the guts of the IO manager in base? Are there other parts of the RTS that use file descriptors that need to be preserved?