New backend for event manager based on io_uring
Motivation
Since Linux kernel v5.1, there is a new interface for asynchronous system calls called io_uring. The detailed specification lives at https://kernel.dk/io_uring.pdf, but in summary: it reduces context switch overhead by reserving ring buffers for syscall requests and responses that are shared between the kernel and userspace. Checking whether there are any new responses therefore only requires comparing pointers, without any context switch at all. Currently it supports only a limited number of system calls, but the creator of the io_uring interface has expressed an intention to keep expanding the interface and to provide asynchronous versions of as many system calls as possible. At the moment there is already sufficient functionality available for an event manager backend.
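To make the "comparing pointers" point concrete, here is a minimal sketch in C of a completion ring shared between a producer and a consumer. It is a simplified model, not the real io_uring layout: the names `cq_ring`, `cqe`, `cq_push`, `cq_has_completion` and `cq_pop` are hypothetical, and in the real interface the rings are mapped into the process with `mmap` at offsets reported by the kernel, with the kernel acting as the producer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CQ_ENTRIES 8u                 /* ring size, must be a power of two */
#define CQ_MASK    (CQ_ENTRIES - 1u)
static_assert((CQ_ENTRIES & CQ_MASK) == 0, "ring size must be a power of two");

/* Hypothetical completion entry: which request finished, and its result. */
struct cqe {
    uint64_t user_data;
    int32_t  res;
};

/* Simplified model of a completion ring living in shared memory. */
struct cq_ring {
    _Atomic uint32_t head;            /* advanced by the consumer (userspace) */
    _Atomic uint32_t tail;            /* advanced by the producer (the kernel) */
    struct cqe entries[CQ_ENTRIES];
};

/* Producer side. In real io_uring the kernel performs this step. */
static bool cq_push(struct cq_ring *cq, struct cqe e)
{
    uint32_t tail = atomic_load_explicit(&cq->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&cq->head, memory_order_acquire);
    if (tail - head == CQ_ENTRIES)
        return false;                 /* ring is full */
    cq->entries[tail & CQ_MASK] = e;
    /* Publish the entry before the new tail becomes visible. */
    atomic_store_explicit(&cq->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: "is anything ready?" is an index comparison, no syscall. */
static bool cq_has_completion(struct cq_ring *cq)
{
    uint32_t head = atomic_load_explicit(&cq->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&cq->tail, memory_order_acquire);
    return head != tail;
}

/* Consumer side: copy out the oldest completion and release its slot. */
static bool cq_pop(struct cq_ring *cq, struct cqe *out)
{
    uint32_t head = atomic_load_explicit(&cq->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&cq->tail, memory_order_acquire);
    if (head == tail)
        return false;                 /* nothing pending */
    *out = cq->entries[head & CQ_MASK];
    atomic_store_explicit(&cq->head, head + 1, memory_order_release);
    return true;
}
```

In this model an event loop would call `cq_has_completion` on every iteration and only fall back to a blocking system call when it returns false, which is the saving the summary above describes.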
For applications doing a sufficient amount of I/O, there will "always" be file descriptors ready for reading or writing. In such a case, a backend based on io_uring could save a call to epoll_wait (and therefore a context switch) every time the event manager goes through its main loop. For applications that are not I/O bound, the event manager will make two nonblocking polling calls and finally a blocking call if there is still no fd available. For such an application, a backend based on io_uring would make only a single system call versus three for the epoll backend, since the nonblocking checks reduce to reading the shared completion ring. For both the epoll and io_uring interfaces, submitting a new file descriptor polling request takes a single system call. The epoll backend does have the advantage in the case where "multishot" polling is used, since all io_uring polling is singleshot and the poll needs to be re-armed each time it fires. Since the primary interface to the event manager is through threadWaitRead and threadWaitWrite, and these use oneshot semantics exclusively where available, this is not a big problem in practice. The only multishot polling operations I am aware of are used in the GHC.Event.Control module of the event manager itself (there are only two of them and they only fire during program shutdown). Therefore, a backend based on io_uring should always incur fewer system calls and context switches than the epoll backend in real applications.
A secondary motivation is that currently the event manager is only used for polling of the status of file descriptors (and only non-file file descriptors such as sockets). There are many more system calls that can block though, such as stat() and unlink(). These are often left as "unsafe" foreign calls due to concerns about spawning many OS threads. Implementing io_uring bindings in base would be a first step towards truly asynchronous versions of these system calls.
Proposal
The feature consists of the following parts:
- Port the io_uring bindings by @bgamari from here to base. This is needed because the event manager lives in GHC.Event, which is also in base and can't depend on anything else.
- Create a new event manager backend that uses the bindings to implement the interface for an event manager backend.
- Add a flag HAVE_IO_URING, similar to HAVE_EPOLL and HAVE_KQUEUE, that will enable or disable the loading of the implementation.
- Implement a compiler flag --use-io-uring that will switch on the new event manager backend. In a later release (after any kinks have been worked out) we could decide to make it the default if available.
At the time of writing I have a working implementation of the first two points, though the code needs a lot of cleanup. One of the main unresolved issues is performance testing: it is not trivial to determine whether an event manager backend is "better", or whether any measured differences are due to the code changes or to the test setup.