|
|
# Binary I/O
Binary I/O is a critical need for many real-world Haskell programs. Haskell 98 treats I/O as character-based and lacks a well-defined mechanism for binary I/O, although a number of external libraries provide various forms of it.
|
|
|
|
|
|
|
|
|
|
|
It is widely regarded as a mistake that Haskell 98 treats I/O as character-based. The proposal is that I/O should instead operate over octets (i.e. 8-bit binary values), which would permit character-encoding conversions (see [Unicode](unicode)) to be layered over the basic I/O mechanism.
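As a rough illustration of layering an encoding over octets, the sketch below turns a `String` into UTF-8 octets that an octet-level output routine could then write. The names `encodeCharUtf8` and `encodeUtf8` are hypothetical helpers, not part of any proposed API, and for brevity the encoder does not reject surrogate code points.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Encode one Char as 1 to 4 UTF-8 octets.
-- Hypothetical helper; surrogate code points are not rejected.
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. top 6, cont 0]
  | n < 0x10000 = [0xE0 .|. top 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. top 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- leading bits of the code point, shifted down by k
    top k = fromIntegral (n `shiftR` k)
    -- continuation octet 10xxxxxx carrying the 6 bits starting at bit k
    cont k = 0x80 .|. (fromIntegral (n `shiftR` k) .&. 0x3F)

-- | The character layer over the octet layer: String -> octets.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeCharUtf8
```

A decoder would run in the opposite direction, consuming octets read by the binary layer and producing `Char`s.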
|
|
|
|
|
|
|
|
Two forms of binary I/O are considered here:
|
|
|
|
|
|
|
|
|
|
- `Word8`-based extensions to System.IO, and
|
|
|
|
|
- Typeclass-based binary I/O (referred to as Binary) for serialising arbitrary data types, layered over the `Word8` extensions.
|
|
|
|
|
|
|
|
|
|
## Explanation
|
|
|
|
|
|
|
|
|
|
- Character-based I/O is needed, at least because systems (e.g. Unix and Windows) have different line-termination conventions that should be hidden from programs. The problem becomes more acute when different environments use different character sets and encodings (see [Unicode](unicode)).
|
|
|
|
|
- Binary I/O is needed both to handle binary data and as a base upon which general treatments of character-encoding conversions (see [Unicode](unicode)) may be layered.
|
|
|
|
|
- Type-classed binary I/O is needed to support serialisable structures and persistence for arbitrary Haskell data.
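To make the line-termination point concrete: on Windows a text line ends with the octets CR LF (13, 10), on Unix with LF (10) alone. A character layer over octet I/O can hide this difference by translating CR LF into a single newline when reading. The following is a minimal sketch with a hypothetical name (`normaliseNewlines`); it treats a lone CR as ordinary data.

```haskell
import Data.Word (Word8)

cr, lf :: Word8
cr = 13
lf = 10

-- | Translate CR LF pairs to LF, as a text-mode read might on Windows.
-- A lone CR (not followed by LF) is passed through unchanged.
-- Works lazily, so it can sit over a stream of octets.
normaliseNewlines :: [Word8] -> [Word8]
normaliseNewlines (c : l : rest)
  | c == cr && l == lf = lf : normaliseNewlines rest
normaliseNewlines (b : rest) = b : normaliseNewlines rest
normaliseNewlines [] = []
```

Binary I/O must not perform this translation, which is one reason the two layers need to be distinct.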
|
|
|
|
|
|
|
|
|
|
## Proposal 1 - System.IO
|
|
One proposal is to add a form of I/O over `Word8` (i.e. octets, 8-bit binary values). See the "Binary input and output" section of [System.IO](http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html) for a rough design.

Pros:

- cleans up an area of confusion

- backwards compatible with all implementations
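As a rough sketch of what `Word8` I/O can look like on top of the buffer operations System.IO already exports (`hPutBuf`, `hGetBuf`, together with `openBinaryFile` or `hSetBinaryMode` to disable newline and encoding translation), the helpers below marshal lists of octets through a temporary buffer. The names `hPutWord8s` and `hGetWord8s` are hypothetical, and a real design would probably avoid lists for efficiency.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (allocaArray, peekArray, withArrayLen)
import System.IO

-- | Write a list of octets to a handle via the buffer-based 'hPutBuf'.
hPutWord8s :: Handle -> [Word8] -> IO ()
hPutWord8s h ws = withArrayLen ws $ \len ptr -> hPutBuf h ptr len

-- | Read up to n octets; fewer are returned at end of file.
hGetWord8s :: Handle -> Int -> IO [Word8]
hGetWord8s h n = allocaArray n $ \ptr -> do
  got <- hGetBuf h ptr n
  peekArray got ptr
```

For example, `withBinaryFile "out.bin" WriteMode (\h -> hPutWord8s h [0xCA, 0xFE])` should write exactly those two octets, with no newline or encoding translation.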
|
|
|
|
|
|
|
|
|
|
## Proposal 2 - The Binary class
|
|
|
|
|
|
|
|
|
|
Proposal two is to add a Binary class, based on the type class described in [The Bits Between The Lambdas](ftp://ftp.cs.york.ac.uk/pub/malcolm/ismm98.html). The advantage of this form of binary I/O over the simpler System.IO extensions is support for serialising more complex data types: type classes are used to define binary I/O routines recursively for each component of a type. Instances of Binary may be written by hand or derived mechanically with [DrIFT](http://repetae.net/john/computer/haskell/DrIFT/). Ideally, Binary would be derivable by the compiler (as nhc98 currently does), or perhaps through some form of datatype-generic declarations (see [DerivedInstances](derived-instances)).
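A much-simplified sketch of the idea, assuming a hypothetical pure class that serialises to and from a list of octets (the nhc98 and NewBinary classes instead read and write through handles or memory buffers in `IO`, and their actual signatures differ). It shows how instances compose recursively: the list instance reuses the element instance, and a hand-written instance for a user-defined tree, of the kind DrIFT could generate, reuses both its own instance and the `Word8` one.

```haskell
import Data.Word (Word8)

-- | Hypothetical, pure simplification of a Binary class:
-- 'put' produces octets; 'get' consumes a prefix and returns the rest.
class Binary a where
  put :: a -> [Word8]
  get :: [Word8] -> Maybe (a, [Word8])

instance Binary Word8 where
  put w = [w]
  get (w : rest) = Just (w, rest)
  get []         = Nothing

-- | Lists: tag 1 before each element, tag 0 at the end.
instance Binary a => Binary [a] where
  put []       = [0]
  put (x : xs) = 1 : put x ++ put xs
  get bs = do
    (tag, rest) <- get bs
    case (tag :: Word8) of
      0 -> Just ([], rest)
      1 -> do
        (x, r1)  <- get rest
        (xs, r2) <- get r1
        Just (x : xs, r2)
      _ -> Nothing

-- | A user-defined type with a hand-written instance: one tag octet
-- per constructor, then each field via its own instance.
data Tree = Leaf Word8 | Node Tree Tree
  deriving (Eq, Show)

instance Binary Tree where
  put (Leaf w)   = 0 : put w
  put (Node l r) = 1 : put l ++ put r
  get bs = do
    (tag, rest) <- get bs
    case (tag :: Word8) of
      0 -> do
        (w, r1) <- get rest
        Just (Leaf w, r1)
      1 -> do
        (l, r1) <- get rest
        (r, r2) <- get r1
        Just (Node l r, r2)
      _ -> Nothing
```

With these instances, `put (Node (Leaf 1) (Leaf 2))` yields `[1, 0, 1, 0, 2]`, and `get` of that list recovers the tree with no octets left over.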
|
|
|
|
|
|
|
|
|
|
## References
|
|
|
|
|
|
|
|
|
|
### Proposal 1
|
|
|
|
|
|
|
|
|
|
- The simplest implementation option is [System.IO](http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html), which provides `hGetBuf`-style I/O. More sophisticated systems can be layered on top as external libraries.
|
|
|
|
|
- A related library for [packed strings](http://www.cse.unsw.edu.au/~dons/fps.html), layered over System.IO, provides a similar interface and is sometimes used for binary I/O of flat data types.
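The packed-string library later became the `Data.ByteString` module of the bytestring package. As a sketch of the "flat data" use case, assuming that module, the hypothetical helpers `getWord32be` and `putWord32be` below read and write a big-endian 32-bit word at a given offset in a strict `ByteString`.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import qualified Data.ByteString as BS
import Data.Word (Word32)

-- | Read a big-endian Word32 starting at byte offset 'off',
-- or Nothing if fewer than 4 bytes are available there.
getWord32be :: BS.ByteString -> Int -> Maybe Word32
getWord32be bs off
  | off < 0 || off + 4 > BS.length bs = Nothing
  | otherwise = Just (foldl step 0 [0 .. 3])
  where
    -- shift in one octet at a time, most significant first
    step acc i = (acc `shiftL` 8) .|. fromIntegral (BS.index bs (off + i))

-- | Encode a Word32 as 4 big-endian octets.
putWord32be :: Word32 -> BS.ByteString
putWord32be w = BS.pack [fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0]]
```

This works well for fixed-layout records; nested or variable-shaped data is where the Binary class becomes more convenient.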
|
|
|
|
|
|
|
|
|
|
### Proposal 2
|
|
|
|
|
|
|
|
|
|
- The Binary class is the de facto standard for more structured data. Its origins are [described here](ftp://ftp.cs.york.ac.uk/pub/malcolm/ismm98.html). It is distributed with nhc98 and used by GHC to read and write `.hi` files, and DrIFT provides tool support for deriving new instances. Flavours include:
|
|
|
|
|
|
|
|
|
|
- [NHC's Binary](http://haskell.org/nhc98/libs/Binary.html), the original
|
|
|
|
|
- [GHC's Binary](http://cvs.haskell.org/cgi-bin/cvsweb.cgi/~checkout~/fptools/ghc/compiler/utils/Binary.hs), used internally by GHC.
|
|
|
|
|
- [NewBinary](http://www.n-heptane.com/nhlab/repos/NewBinary/), the standard version today
|
|
|
|
|
- [Lambdabot/Hmp3's Binary](http://www.cse.unsw.edu.au/~dons/code/hmp3/Binary.hs), a stripped-down, Handle-only version of Binary.
|
|
|
|
|
- [SerTH](http://www.cs.helsinki.fi/u/ekarttun/SerTH/) is a Binary-like library that uses Template Haskell to derive serialiser instances for each data type. It is an alternative to deriving your Binary instances with DrIFT or writing them by hand, and therefore requires Template Haskell. It also supports serialising cyclic structures.
|
|
|
|
|
- [ByteStream](http://freearc.narod.ru/), a new high-performance serialisation library using gzip compression.
|
|
|
|
|
- A [new](http://article.gmane.org/gmane.comp.lang.haskell.cafe/10803) optimised binary library
|
|
|
|
|
|
|
|
|
|
## Tickets
|
|
|
|
|
|
|
|
|
|
<table><tr><th><a href="https://gitlab.haskell.org//haskell/prime/issues/15">#15</a></th>
<td>add a binary IO interface</td></tr>
<tr><th><a href="https://gitlab.haskell.org//haskell/prime/issues/16">#16</a></th>
<td>Create unicode proposal</td></tr></table>
|
|
|
|
|
|
|
|
|
|
## Todo
|
|
|
|
|
|
|
|
|
|
- Clarify relationship between Haskell and C strings (or is that for the FFI?)
|
|
|
|
|
- More clarification of what is meant by "binary I/O" (e.g. packed strings, packed data, serialisation, ...)
|
|
|
|
|
|
|
|
|
|
## Pros/Cons: System.IO
|
|
|
|
|
|
|
|
|
|
### Pros
|
|
|
|
|
|
|
|
|
|
- System.IO extensions are already in common use and are simple to implement
|
|
|
|
|
- More sophisticated binary I/O may be layered on top
|
|
|
|
|
|
|
|
|
|
### Cons
|
|
|
|
|
|
|
|
|
|
- The API may not be rich enough for many binary I/O requirements; should we strive for more?
|
|
|
|
|
|
|
|
|
|
## Pros/Cons: Binary
|
|
|
|
|
|
|
|
|
|
### Pros
|
|
|
|
|
|
|
|
|
|
- The Binary class (particularly as implemented in NewBinary) is simple to implement and widely used.
|
|
|
|
|
- Binary I/O is a critical feature, and its absence is sometimes considered a flaw in Haskell 98.
|
|
|
|
|
- Serialising structured data is difficult without such a class.
|
|
|
|
|
|
|
|
|
|
### Cons
|
|
|
|
|
|
|
|
|
|
- Doesn't support cyclic structures
|
|
|
|
|
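- Lack of compiler derivability (outside nhc98) can be annoying: instances must otherwise be written by hand or generated with a tool such as DrIFT.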
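To see why cyclic structures are a problem: a serialiser that simply walks a value, as a hand-written or DrIFT-derived instance does, cannot observe sharing, so on a cyclic value it never terminates. One workaround, sketched below under the assumption of a non-empty graph with fewer than 256 nodes, is to serialise an explicit, finite representation in which back-references are node indices. All names here are hypothetical, and the approach is not taken from any of the libraries listed above.

```haskell
import Data.Word (Word8)

-- | Naive structural encoding of a list, one tag octet per cons cell.
encodeList :: [Word8] -> [Word8]
encodeList []       = [0]
encodeList (x : xs) = 1 : x : encodeList xs

-- | A cyclic value: the list's tail is the list itself.
-- 'encodeList ones' is infinite; only finite prefixes can be taken.
ones :: [Word8]
ones = 1 : ones

-- | Hypothetical explicit form: node i holds a label and the index
-- of its successor, so a cycle becomes finite data.
type Graph = [(Word8, Word8)]

-- | Node count, then (label, successor) pairs. Assumes < 256 nodes.
encodeGraph :: Graph -> [Word8]
encodeGraph g = fromIntegral (length g) : concat [[label, next] | (label, next) <- g]

-- | Rebuild the cyclic list starting at node 0 by tying the knot.
-- Assumes a non-empty graph with in-range successor indices.
fromGraph :: Graph -> [Word8]
fromGraph g = nodes !! 0
  where
    nodes = [label : nodes !! fromIntegral next | (label, next) <- g]

-- | The cycle 'ones' as a one-node graph pointing back to itself.
onesGraph :: Graph
onesGraph = [(1, 0)]
```

Here `encodeGraph onesGraph` is the finite list `[1, 1, 0]`, and `fromGraph` turns it back into a genuinely cyclic list.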
|
|