HISTORY The timerfd facility was originally ported to FreeBSD's Linux compatibility layer by Dmitry Chagin <dchagin@FreeBSD.org> in FreeBSD 12.0. It was revised and adapted to be native by Jake Freeland <jfree@FreeBSD.org> in FreeBSD 14.0.
I have a feeling that this is not a FreeBSD specific bug because it is mentioned in #20132 and the manpage doesn't even have ENOSUPP error code.
I wonder if due to some bug the timerfd variable ends up containing some other fd. Maybe print its numerical value and then compare it with output of lsof or with value where timerfd is created.
The timerfd descriptor is read and closed in the same function rts/posix/ticker.c:itimer_thread_func(). I see no way how could it happen that the descriptor is read after closing. Except maybe if we have several itimer_thread_funcs running in different threads?
I'd be curious to know which close actually closed the timerfd. There is a chance it's not the one we would expect. @bgamari could you find it? Maybe with some breakpoint?
@bgamari I debugged this, and now have a small reproducer. Note that in the reproducer the close does not happen on any FD as shown in the gdb log below. cc @maerwald
module Main whereimport Control.Concurrentimport qualified System.Posix.Process as SPPmain = do print "before SPP.forkProcess" threadDelay (5*1000*1000) SPP.forkProcess $ pure () threadDelay (5*1000*1000) print "after SPP.forkProcess"
$ freebsd-version 14.0-RELEASE$ ghc --versionThe Glorious Glasgow Haskell Compilation System, version 9.6.6$ ghc app/Main.hs[1 of 2] Compiling Main ( app/Main.hs, app/Main.o ) [Control.Concurrent changed][2 of 2] Linking app/Main [Objects changed]$ app/Main"before SPP.forkProcess""after SPP.forkProcess"$ ghc app/Main.hs -threaded[2 of 2] Linking app/Main [Flags changed]$ app/Main"before SPP.forkProcess"Main: internal error: Ticker: read(timerfd) failed with Operation not supported and returned -1 (GHC version 9.6.6 for x86_64_portbld_freebsd) Please report this as a GHC bug: https://www.haskell.org/ghc/reportabug"after SPP.forkProcess"
(gdb) b closeBreakpoint 1 at 0x404fb0(gdb) rStarting program: /home/divam/repos/dfordivam/various-codes/app/Main [New LWP 317623 of process 30417][New LWP 317624 of process 30417][New LWP 317625 of process 30417]"before SPP.forkProcess"[New LWP 317626 of process 30417][Detaching after fork from child process 30418]Main: internal error: Ticker: read(timerfd) failed with Operation not supported and returned -1 (GHC version 9.6.6 for x86_64_portbld_freebsd) Please report this as a GHC bug: https://www.haskell.org/ghc/reportabug"after SPP.forkProcess"[Switching to LWP 317625 of process 30417]Thread 4 "ghc_worker" hit Breakpoint 1, 0x0000000800cde4b4 in close () from /lib/libc.so.7(gdb)
Thank you for the reproducer! I think I understand what's going on.
The timerfd descriptor is set to close-on-exec, so after forking the variable contains invalid value that is then gets occupied by another descriptor (the kqueue one) which does not support reading.
I guess the correct fix is to remove TFD_CLOEXEC, but I'm not sure if this is correct.
I don't think your analysis is correct. After the fork, in the child, we reinit the ticker (initTimer) so it should set the timerfd variable to a proper timerfd descriptor before the ticker thread is created again.
Note that we should probably close timerfd first in the child after the fork because currently we leak it as it remains valid after a fork (at least on Linux):
fork(2) semantics After a fork(2), the child inherits a copy of the file descriptor created by timerfd_create(). The file descriptor refers to the same underlying timer object as the corresponding file descriptor in the parent, and read(2)s in the child will return information about expirations of the timer.
@dfordivam Could you use strace to get a log of the syscalls? We would need to see the fd for the failing read and then scroll up to find the syscall that created it. It could give us a hint about what's going on.
The timerfd is indeed gets created again, but it is then closed. The consecutive kqueue call takes the same fd, because it is now free. I'm lost again.
It looks similar to what this comment describes: #4449 (comment 48169) I.e. the IOManager is cleaning up some FDs that are no longer relevant and that have been reused for something else (here for timerfd).
If it doesn't work, apparently the fd returned by the kqueue isn't shared with the child, so perhaps we shouldn't close it at all after a fork? (I.e. comment out the I.delete in GHC.Event.Manager.{cleanup,newWith} to try)