Binary I/O
Binary I/O is a critical need in Haskell for many real-world problems. Haskell 98 treats I/O as character-based, and lacks a well-defined mechanism for binary I/O. However, a number of external libraries exist providing various forms of binary I/O.
Two forms of binary I/O are considered here:
- Word8 based extensions to System.IO, and
- Typeclass-based Binary I/O (referred to as Binary) for serialising arbitrary data types, layered over Word8 extensions
Explanation
- Character-based I/O is needed, at least because systems (e.g. Unix and Windows) have different line-termination conventions that should be hidden from programs. The problem becomes more acute when different environments use different character sets and encodings (see Unicode).
- Binary I/O is needed both to handle binary data and as a base upon which general treatments of character-encoding conversions (see Unicode) may be layered.
- Type-classed binary I/O is needed to support serialisable structures and persistence for arbitrary Haskell data.
Proposal 1 - System.IO
- One proposal is to add a form of I/O over Word8 (i.e. octets, 8-bit binary values). See the "Binary input and output" section of System.IO for a rough design.
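As a rough illustration of what Word8-level I/O over System.IO can look like, the sketch below moves a list of octets through a handle with the existing hPutBuf and hGetBuf primitives. The helper names hPutOctets, hGetOctets and roundTrip are hypothetical and are not part of System.IO or of the proposal; this is one possible shape, not the proposed design.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (allocaArray, peekArray, withArrayLen)
import System.IO

-- | Write a list of octets to a handle (hypothetical helper).
-- The list is marshalled into a temporary buffer and written with hPutBuf.
hPutOctets :: Handle -> [Word8] -> IO ()
hPutOctets h ws = withArrayLen ws $ \n ptr -> hPutBuf h ptr n

-- | Read up to n octets from a handle (hypothetical helper).
-- Fewer than n octets are returned if end of file is reached first.
hGetOctets :: Handle -> Int -> IO [Word8]
hGetOctets h n = allocaArray n $ \ptr -> do
  got <- hGetBuf h ptr n
  peekArray got ptr

-- | Write octets to a file in binary mode, then read them back.
roundTrip :: FilePath -> [Word8] -> IO [Word8]
roundTrip path ws = do
  withBinaryFile path WriteMode $ \h -> hPutOctets h ws
  withBinaryFile path ReadMode $ \h -> hGetOctets h (length ws)
```

Opening the file with withBinaryFile avoids any newline or encoding translation, so octets such as 10, 13 and 26 should come back unchanged on every platform.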
Proposal 2 - The Binary class
- Proposal two is to add a Binary class, based on the type class described in The Bits Between The Lambdas. The advantage of this form of binary I/O over the simpler System.IO library is support for serialising more complex data types, using type classes to recursively define binary I/O routines for each component of the type. Instances of Binary may be written by hand, or derived mechanically with DrIFT. Ideally Binary would be derivable by the compiler (as is done currently in nhc98), or perhaps using some form of datatype-generic declarations (see DerivedInstances).
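To make the idea of recursively defined I/O routines concrete, here is a minimal sketch of a Binary-like class. It is a simplified, Handle-based stand-in written for illustration: the method names put and get, the Shape type and the roundTrip helper are assumptions, and the class does not reproduce the interface of NewBinary or of any other existing library.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Storable (peek, poke)
import System.IO

-- | Simplified stand-in for a Binary class (hypothetical interface).
class Binary a where
  put :: Handle -> a -> IO ()
  get :: Handle -> IO a

-- Base case: a single octet, moved through a one-byte buffer.
instance Binary Word8 where
  put h w = alloca $ \p -> poke p w >> hPutBuf h p 1
  get h   = alloca $ \p -> do
    n <- hGetBuf h p 1
    if n == 1 then peek p else ioError (userError "unexpected end of input")

-- A Bool is stored as one octet: 1 for True, 0 for False.
instance Binary Bool where
  put h b = put h (if b then 1 else 0 :: Word8)
  get h   = fmap (/= (0 :: Word8)) (get h)

-- Pairs are serialised component by component.
instance (Binary a, Binary b) => Binary (a, b) where
  put h (a, b) = put h a >> put h b
  get h = do { a <- get h; b <- get h; return (a, b) }

-- Lists use a tag octet per cell: 1 means an element follows, 0 ends the list.
instance Binary a => Binary [a] where
  put h []     = put h (0 :: Word8)
  put h (x:xs) = put h (1 :: Word8) >> put h x >> put h xs
  get h = do
    tag <- get h :: IO Word8
    if tag == 0 then return [] else do
      x  <- get h
      xs <- get h
      return (x : xs)

-- A hand-written instance for a user-defined type (hypothetical example type).
data Shape = Circle Word8 | Rect Word8 Word8 deriving (Eq, Show)

instance Binary Shape where
  put h (Circle r) = put h (0 :: Word8) >> put h r
  put h (Rect w l) = put h (1 :: Word8) >> put h w >> put h l
  get h = do
    tag <- get h :: IO Word8
    case tag of
      0 -> fmap Circle (get h)
      1 -> do { w <- get h; l <- get h; return (Rect w l) }
      _ -> ioError (userError "bad Shape tag")

-- | Serialise a value to a file and read it back (hypothetical helper).
roundTrip :: Binary a => FilePath -> a -> IO a
roundTrip path x = do
  withBinaryFile path WriteMode $ \h -> put h x
  withBinaryFile path ReadMode get
```

With these instances a value of type [(Shape, Bool)] needs no further code: the list, pair, Shape and Bool instances compose. The Shape instance is the kind of boilerplate that DrIFT or compiler-supported deriving would generate.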
References
Proposal 1
- The simplest implementation option is System.IO, which provides hGetBuf-style I/O. More sophisticated systems can be layered on top, as external libraries.
- Packed strings, a related library layered over System.IO, offer a similar interface and are sometimes used for binary I/O of flat data types.
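For flat data, a packed-string interface can be enough on its own. The sketch below assumes the Data.ByteString module from the bytestring package as a representative packed-string library; the page does not say which packed-string library is meant, and the helper names writeOctets, readOctets and hasMagic are hypothetical.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- | Write a list of octets to a file as one packed buffer.
writeOctets :: FilePath -> [Word8] -> IO ()
writeOctets path = B.writeFile path . B.pack

-- | Read a whole file back as a list of octets.
readOctets :: FilePath -> IO [Word8]
readOctets path = fmap B.unpack (B.readFile path)

-- | Check whether a file starts with the given magic number.
hasMagic :: [Word8] -> FilePath -> IO Bool
hasMagic magic path = fmap (B.isPrefixOf (B.pack magic)) (B.readFile path)
```

This style suits headers, checksums and other flat byte layouts. Structured or recursive data is where a Binary-style class becomes more convenient.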
Proposal 2
- The Binary class is the de-facto standard for more structured data. The origins are described here. It is distributed with nhc and used by GHC to deal with .hi files, and DrIFT provides tool support for deriving new instances. Flavours include:
  - NHC's Binary, the original.
  - GHC's Binary, used internally by GHC.
  - NewBinary, the standard version today.
  - Lambdabot/Hmp3's Binary, a stripped-down Handle-only version of Binary.
- SerTH is a Binary-alike, which uses Template Haskell to derive serialiser instances for each data type. It is an alternative to deriving Binary instances with DrIFT or writing them by hand. It requires Template Haskell and supports serialising cyclic structures.
- ByteStream, a new high-performance serialisation library, using gzip compression.
- A new optimised binary library.
Tickets
| Ticket | Summary |
|---|---|
| #15 | add a binary IO interface |
| #16 | Create unicode proposal |
Todo
- Clarify relationship between Haskell and C strings (or is that for the FFI?)
- Clarify further what is meant by "Binary I/O" (i.e. packed strings, packed data, serialisation, ...)
Pros/Cons: System.IO
Pros
- System.IO extensions are already in common use and are simple to implement
- More sophisticated binary I/O may be layered on top
Cons
- The API may not be rich enough for many binary I/O requirements; should we strive for more?
Pros/Cons: Binary
Pros
- The Binary class (particularly as implemented in NewBinary) is simple to implement and widely used.
- Binary I/O is a critical feature, the lack of which is sometimes considered a flaw in Haskell 98.
- Difficult to serialise data without this class
Cons
- Doesn't support cyclic structures
- Lack of derivability can be annoying