[RFC] Rectifying `ByteString`: FFI Data
This RFC aims to rectify an untenable situation in which some areas for improvement have recently become apparent: the ByteString Trilemma.
What is ByteString? Currently, it serves (poorly) the triple job of:
- A blob of bytes: most convenient for network data that should live in un-pinned memory, as this avoids data fragmentation.
- An FFI data blob: for data that should live in pinned memory, or the GC might decide to move it at an inconvenient time. However, this means that your ability to perform compaction is severely limited, which can lead to fragmentation on lots of small allocations.
- An ASCII string with no verification whatsoever of its most "intuitive" usage vector, the IsString instance, leading to some surprising behaviours.
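To make the third point concrete, here is a minimal sketch of the kind of surprise the `IsString` instance can produce. It assumes the `bytestring` package that ships with GHC and the `OverloadedStrings` extension; the names `lambdaLiteral` and `lambdaBytes` are purely illustrative.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- U+03BB (Greek small letter lambda), written as a decimal escape so the
-- example does not depend on the encoding of the source file.
lambdaLiteral :: ByteString
lambdaLiteral = "\955"

-- The literal is accepted without any warning, but each Char is truncated
-- to its low 8 bits rather than being encoded or rejected.
lambdaBytes :: [Word8]
lambdaBytes = BS.unpack lambdaLiteral
```

On the `bytestring` versions this was written against, `lambdaBytes` is the single byte `0xBB` rather than the two-byte UTF-8 encoding of the character, and nothing at compile time or run time flags the loss.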
It is my opinion, forged by many improbable and unfortunate encounters, that the "blob of bytes" role should be held by an unpinned `ByteArray#`, the FFI data role by an `FFIData` type backed by a pinned `MutableByteArray#`, and the ASCII literals should be promptly disposed of.
With this context, I'd like to present the subject at hand:
FFIData, a type specially made for data exchange through FFI.
Several options exist for its implementation:
- A pinned `MutableByteArray#`, for which reading and writing is sound in both safe and unsafe FFI import calls. This suits the case where the foreign code merely needs to borrow the allocated memory and isn't responsible for allocating or deallocating it.
  The foundation for making this possible is `UnliftedFFITypes`, which I encourage you to read about.
- A `ForeignPtr` that allows for foreign-managed memory. This suits the case where the foreign code wants to own the allocated memory and be responsible for its allocation and deallocation.
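The two options can be sketched side by side as follows. This is only an illustration under stated assumptions, not a proposed API: the wrapper types `FFIData` and `ForeignData` and all helper names are hypothetical, and libc's `memset`, `calloc` and `free` stand in for whatever foreign library would actually be on the other side.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples, UnliftedFFITypes #-}

import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.ForeignPtr (ForeignPtr, newForeignPtr, withForeignPtr)
import Foreign.Ptr (FunPtr, Ptr, nullPtr)
import Foreign.Storable (peekByteOff)
import GHC.Exts (Int (I#), MutableByteArray#, RealWorld,
                 newPinnedByteArray#, readWord8Array#)
import GHC.IO (IO (..))
import GHC.Word (Word8 (..))

-- Option 1 (hypothetical wrapper): Haskell owns a pinned buffer and the
-- foreign code only borrows it for the duration of the call.
data FFIData = FFIData (MutableByteArray# RealWorld)

newFFIData :: Int -> IO FFIData
newFFIData (I# n) = IO (\s ->
  case newPinnedByteArray# n s of
    (# s', mba #) -> (# s', FFIData mba #))

-- UnliftedFFITypes lets the array be passed directly; the C side receives
-- a pointer to the payload.
foreign import ccall unsafe "string.h memset"
  c_memset :: MutableByteArray# RealWorld -> CInt -> CSize -> IO (Ptr ())

-- Let the foreign code write into the borrowed buffer.
-- The caller must keep len within the allocated size.
fillBorrowed :: FFIData -> Word8 -> Int -> IO ()
fillBorrowed (FFIData mba) byte len = do
  _ <- c_memset mba (fromIntegral byte) (fromIntegral len)
  pure ()

-- Read one byte back on the Haskell side (no bounds check).
readBorrowed :: FFIData -> Int -> IO Word8
readBorrowed (FFIData mba) (I# i) = IO (\s ->
  case readWord8Array# mba i s of
    (# s', w #) -> (# s', W8# w #))

-- Option 2 (hypothetical wrapper): the foreign side allocates and frees;
-- the ForeignPtr only attaches the foreign deallocator as a finalizer.
newtype ForeignData = ForeignData (ForeignPtr Word8)

foreign import ccall unsafe "stdlib.h calloc"
  c_calloc :: CSize -> CSize -> IO (Ptr Word8)

foreign import ccall unsafe "stdlib.h &free"
  p_free :: FunPtr (Ptr Word8 -> IO ())

newForeignData :: Int -> IO ForeignData
newForeignData n = do
  p <- c_calloc (fromIntegral n) 1
  if p == nullPtr
    then ioError (userError "calloc failed")
    else ForeignData <$> newForeignPtr p_free p

-- Read one byte of the foreign-owned buffer (no bounds check).
readForeign :: ForeignData -> Int -> IO Word8
readForeign (ForeignData fp) i = withForeignPtr fp (\p -> peekByteOff p i)
```

For instance, after `buf <- newFFIData 8` and `fillBorrowed buf 0xAB 8`, calling `readBorrowed buf 3` should observe the byte the C code wrote. The split mirrors the ownership question: in the first half the garbage collector reclaims the buffer, while in the second the memory is released only through the foreign `free` once the `ForeignPtr` becomes unreachable.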
I would like to gather some opinions or suggestions regarding this proposal. I believe untangling the responsibilities of `ByteString` will contribute to a healthier ecosystem where boundaries between two different domains (FFI data and blob of bytes) are respected.