New backend for event manager based on io_uring
Since Linux kernel v5.1, there is a new interface for asynchronous system calls called io_uring. The detailed specification lives at https://kernel.dk/io_uring.pdf, but in summary: it reduces context switch overhead by reserving ring buffers for syscall requests and responses that are shared between the kernel and userspace. Checking whether there are any new responses therefore only requires comparing pointers, without any context switch at all. Currently it supports only a limited number of syscalls, but the creator of the io_uring interface has expressed an intention to keep expanding it and to provide asynchronous versions of as many system calls as possible. Even now, there is already sufficient functionality available for an event manager backend.
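To make the "comparing pointers" point concrete, here is a toy, pure Haskell model of a completion ring. It is a sketch under stated assumptions, not the real io_uring memory layout or any existing binding: all names (Ring, post, reap, and so on) are hypothetical. The producer (the kernel, for the completion ring) advances the tail as it posts entries, userspace advances the head as it consumes them, and detecting new work is a plain head/tail comparison. The real rings live in memory shared with the kernel via mmap and need atomic accesses with memory barriers, which this model leaves out.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word32)

-- Toy model of a completion ring (hypothetical; not the real io_uring layout).
-- In the real interface the indices and slots live in kernel-shared memory and
-- are accessed atomically; here everything is an immutable value.
data Ring a = Ring
  { ringHead  :: !Word32    -- next position userspace will consume
  , ringTail  :: !Word32    -- next position the producer will fill
  , ringMask  :: !Word32    -- capacity - 1; capacity must be a power of two
  , ringSlots :: [Maybe a]  -- fixed-size slot storage
  }

emptyRing :: Word32 -> Ring a
emptyRing capacity =
  Ring 0 0 (capacity - 1) (replicate (fromIntegral capacity) Nothing)

-- Entries posted but not yet consumed. Unsigned subtraction stays correct
-- when the 32-bit counters wrap around.
pending :: Ring a -> Word32
pending r = ringTail r - ringHead r

-- Producer side: write one slot and advance the tail.
-- Returns Nothing when the ring is full.
post :: a -> Ring a -> Maybe (Ring a)
post x r
  | pending r > ringMask r = Nothing
  | otherwise =
      Just r { ringTail  = ringTail r + 1
             , ringSlots = setAt (slotIndex r (ringTail r)) x (ringSlots r) }

-- Consumer side: checking for new completions is only the head/tail
-- comparison in 'pending'; no system call is involved.
reap :: Ring a -> ([a], Ring a)
reap r
  | pending r == 0 = ([], r)
  | otherwise      = (entries, r { ringHead = ringTail r })
  where
    positions = takeWhile (/= ringTail r) (iterate (+ 1) (ringHead r))
    entries   = [ x | p <- positions, Just x <- [ringSlots r !! slotIndex r p] ]

-- Map a free-running 32-bit position onto a slot index.
slotIndex :: Ring a -> Word32 -> Int
slotIndex r p = fromIntegral (p .&. ringMask r)

-- Replace the contents of slot i.
setAt :: Int -> a -> [Maybe a] -> [Maybe a]
setAt i x slots = [ if j == i then Just x else s | (j, s) <- zip [0 ..] slots ]
```

For example, posting 1, 2 and 3 to emptyRing 4 and then calling reap returns those entries in order and leaves pending at 0; positions keep counting up and are masked onto slots, so the ring wraps without any extra bookkeeping.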
For applications doing a sufficient amount of I/O, there will "always" be file descriptors ready for reading or writing. In such a case, a backend based on io_uring could save a call to epoll_wait (and therefore a context switch) every time the event manager goes through its main loop. For applications that are not I/O bound, the event manager will make two nonblocking polling calls and finally a blocking call if there is still no fd available. For such an application, a backend based on io_uring would make only a single system call versus three for the epoll backend. For both the epoll and io_uring interfaces, submitting a new file descriptor polling request takes a single system call.

The epoll backend does have the advantage when "multishot" polling is used, since all io_uring polling is singleshot and has to be re-armed after each event. Since the primary interface to the event manager is through threadWaitRead/Write, and these use oneshot semantics exclusively whenever it is available, this is not a big problem in practice. The only multishot polling operations I am aware of are used in the GHC.Event.Control module of the event manager itself (there are only two of them, and they only fire during program shutdown). Therefore, a backend based on io_uring should always incur fewer system calls and context switches than the epoll backend in real applications.
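The system-call arithmetic above can be written down as a small model. This is a sketch that counts only the readiness-wait calls per pass through the main loop, under the assumptions stated in the text: the epoll backend polls twice without blocking and then blocks, while an io_uring backend reads completions straight from the shared ring and only enters the kernel (one blocking io_uring_enter) when nothing has completed. The Readiness type and the function names are made up for illustration. Submitting polling requests, including re-arming oneshot ones, is left out because it costs one system call per request on both backends.

```haskell
-- Hypothetical model of one pass through the event manager's main loop,
-- counting only the system calls spent waiting for readiness.
data Readiness
  = ReadyImmediately    -- an fd is ready at the first nonblocking check
  | ReadyOnSecondCheck  -- an fd is ready only at the second nonblocking check
  | NothingReady        -- neither check finds anything; the loop must block
  deriving (Show, Eq, Enum, Bounded)

-- epoll: each check is an epoll_wait call; the last resort is a blocking one.
epollCalls :: Readiness -> Int
epollCalls ReadyImmediately   = 1  -- one nonblocking epoll_wait
epollCalls ReadyOnSecondCheck = 2  -- two nonblocking epoll_wait calls
epollCalls NothingReady       = 3  -- two nonblocking + one blocking epoll_wait

-- io_uring: checks are head/tail comparisons on the shared completion ring,
-- so only the final blocking wait enters the kernel.
ioUringCalls :: Readiness -> Int
ioUringCalls NothingReady = 1  -- one blocking io_uring_enter
ioUringCalls _            = 0  -- completions read directly from the ring

-- Total wait-side system calls over a sequence of loop iterations.
totalCalls :: (Readiness -> Int) -> [Readiness] -> Int
totalCalls perIteration = sum . map perIteration
```

For a trace of two busy iterations followed by an idle one, this model gives 1 + 1 + 3 = 5 calls for epoll against 0 + 0 + 1 = 1 for io_uring.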
A secondary motivation is that the event manager is currently only used for polling the status of file descriptors (and only file descriptors that do not refer to regular files, such as sockets). There are many more system calls that can block, though, such as unlink(). These are often left as "unsafe" foreign calls due to concerns about spawning many OS threads. Implementing io_uring bindings in base would be a first step towards truly asynchronous versions of these system calls.
The feature consists of the following parts:
1. Port the io_uring bindings by @bgamari from here to base. This is needed because the event manager lives in GHC.Event, which is also in base and can't depend on anything outside it.
2. Create a new event manager backend that uses the bindings to implement the interface for an event manager backend.
3. Add a flag (analogous to the existing HAVE_KQUEUE flag for the kqueue backend) that will enable or disable the loading of the io_uring implementation.
4. Implement a compiler flag --use-io-uring that will switch on the new event manager backend. In a later release (after any kinks have been worked out), we could decide to make it the default where available.
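The second part of the feature amounts to implementing the event manager's backend interface (roughly, the record of poll, modify and delete operations in GHC.Event.Internal) on top of the bindings. The sketch below is not that interface: it is a cut-down, hypothetical stand-in (SketchBackend, Evt and newFakeUringBackend are invented names) with a fake "kernel" in place of real bindings. It only illustrates the oneshot behaviour an io_uring backend has: every armed request corresponds to one poll submission (IORING_OP_POLL_ADD in the real interface), and once it completes it is gone until the caller re-arms it.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)
import Data.List (partition)
import System.Posix.Types (Fd (..))

-- Hypothetical, heavily simplified stand-ins; NOT GHC.Event.Internal's types.
data Evt = EvtRead | EvtWrite
  deriving (Show, Eq)

data SketchBackend = SketchBackend
  { -- Arm a oneshot readiness request for one fd (True = accepted).
    sbModifyFdOnce :: Fd -> Evt -> IO Bool
    -- Dispatch completed requests to the callback; return how many fired.
  , sbPoll :: (Fd -> Evt -> IO ()) -> IO Int
  }

-- A fake io_uring-style backend. Each armed request stands for one poll
-- submission; the "kernel" is simulated by an IORef listing the fds that are
-- currently ready. Completed requests are removed (oneshot semantics), so the
-- caller must re-arm an fd before it can fire again.
newFakeUringBackend :: IORef [Fd] -> IO SketchBackend
newFakeUringBackend readyRef = do
  armed <- newIORef []  -- outstanding (fd, event) requests
  let modifyOnce fd ev = do
        modifyIORef' armed ((fd, ev) :)
        pure True
      pollOnce callback = do
        ready <- readIORef readyRef
        reqs <- readIORef armed
        -- Requests whose fd is ready complete; the rest stay armed.
        let (fired, stillArmed) = partition (\(fd, _) -> fd `elem` ready) reqs
        writeIORef armed stillArmed
        mapM_ (uncurry callback) fired
        pure (length fired)
  pure SketchBackend {sbModifyFdOnce = modifyOnce, sbPoll = pollOnce}
```

A caller in the style of threadWaitRead would arm the fd with sbModifyFdOnce, wait for the callback, and arm it again the next time it needs to wait; a second sbPoll without re-arming reports nothing for that fd.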
At the time of writing, I have a working implementation of points 1 and 2, though the code needs a lot of cleanup. One of the main unresolved issues is performance testing: it is not trivial to determine whether one event manager backend is "better" than another, or whether any detected changes are due to the code changes or to the test setup.