# Binary I/O
Haskell 98 treats I/O as character-based and lacks a well-defined mechanism for binary I/O. However, a number of competing external libraries exist, providing various forms of binary I/O, compressed I/O, and serialised, persistent data.

Two forms of binary I/O are considered here:

- `Word8`-based extensions to System.IO, and
- type-class-based binary I/O (referred to as Binary) for serialising arbitrary data types, layered over the `Word8` extensions
## Explanation

- Character-based I/O is needed, at least because systems (e.g. Unix and Windows) have different line-termination conventions that should be hidden from programs. The problem becomes more acute when different environments use different character sets and encodings (see [Unicode](unicode)).
- Binary I/O is needed both to handle binary data and as a base upon which general treatments of character-encoding conversions (see [Unicode](unicode)) may be layered.
- Type-classed binary I/O is needed to support serialisable structures and persistence for arbitrary Haskell data.
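The difference between character-based and binary access can be illustrated with a small sketch. It assumes the `System.IO` module shipped with GHC's base library (`withBinaryFile`, `hSetNewlineMode` and `universalNewlineMode` are not part of Haskell 98); the helper names `readAll`, `writeSample`, `readAsText` and `readAsBinary` are hypothetical and exist only for this illustration.

```haskell
import System.IO

-- Hypothetical helper: read everything from a handle and force it,
-- so the contents are fully read before the handle is closed.
readAll :: Handle -> IO String
readAll h = do
  s <- hGetContents h
  length s `seq` return s

-- Write the octets for 'a', CR, LF, 'b', LF with no translation at all.
writeSample :: FilePath -> IO ()
writeSample path =
  withBinaryFile path WriteMode $ \h -> hPutStr h "a\r\nb\n"

-- Character-based view: "\r\n" on input is presented to the program as "\n".
readAsText :: FilePath -> IO String
readAsText path =
  withFile path ReadMode $ \h -> do
    hSetNewlineMode h universalNewlineMode
    readAll h

-- Binary view: every octet comes back untouched, one Char per octet.
readAsBinary :: FilePath -> IO String
readAsBinary path = withBinaryFile path ReadMode readAll
```

After `writeSample`, `readAsText` should return `"a\nb\n"` while `readAsBinary` returns `"a\r\nb\n"`: the text view hides the line-termination convention, and the binary view exposes the stored octets.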
## Proposal 1 - System.IO

One proposal is to add a form of I/O over `Word8` (i.e. octets, 8-bit binary values). See the "Binary input and output" section of [System.IO](http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html) for a rough design.
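As a rough illustration of octet-level I/O, the following sketch writes and reads lists of `Word8` using `hPutBuf` and `hGetBuf` from GHC's `System.IO`, with the marshalling helpers from `Foreign.Marshal.Array`. The function names `writeOctets` and `readOctets` are hypothetical; this is one possible shape for such an interface, not the proposed design.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (allocaArray, peekArray, withArrayLen)
import System.IO

-- Write a list of octets to a file. The handle is opened in binary mode,
-- so no newline or character-encoding translation is applied.
writeOctets :: FilePath -> [Word8] -> IO ()
writeOctets path octets =
  withBinaryFile path WriteMode $ \h ->
    -- Copy the list into a temporary buffer; for Word8 the element
    -- count equals the byte count expected by hPutBuf.
    withArrayLen octets $ \len ptr ->
      hPutBuf h ptr len

-- Read at most n octets from the start of a file.
readOctets :: FilePath -> Int -> IO [Word8]
readOctets path n =
  withBinaryFile path ReadMode $ \h ->
    allocaArray n $ \ptr -> do
      got <- hGetBuf h ptr n   -- number of octets actually read
      peekArray got ptr
```

Writing `[0, 10, 13, 255]` with `writeOctets` and reading it back with `readOctets` should yield the same list, including the octets 10 and 13 that a text-mode handle might translate.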
## Proposal 2 - The Binary class

Proposal two is to add a Binary class, based on the type class described in [The Bits Between The Lambdas](ftp://ftp.cs.york.ac.uk/pub/malcolm/ismm98.html), descendants of which have proliferated in the last couple of years. The advantage of this form of binary I/O over the simpler System.IO library is support for serialising more complex data types, using type classes to recursively define binary I/O routines for each component of the type you wish to serialise. Instances of the class may be written by hand, or derived mechanically with [DrIFT](http://repetae.net/john/computer/haskell/DrIFT/). Ideally Binary would be derivable by the compiler (is this feasible?).
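The idea of defining serialisation recursively over the components of a type can be sketched as follows. This is a deliberately simplified, hypothetical class over lists of octets; it is not the API of NHC's Binary, NewBinary or any other existing library, which work on handles or buffers instead. The `Sample` type and the one-octet length prefix for lists are assumptions made for the example.

```haskell
import Data.Word (Word8)

-- A hypothetical, simplified Binary class: 'put' produces octets,
-- 'get' consumes them and returns the value plus the unread remainder.
class Binary a where
  put :: a -> [Word8]
  get :: [Word8] -> Maybe (a, [Word8])

instance Binary Word8 where
  put w = [w]
  get (w:rest) = Just (w, rest)
  get []       = Nothing

instance Binary Bool where
  put False = [0]
  put True  = [1]
  get (0:rest) = Just (False, rest)
  get (1:rest) = Just (True, rest)
  get _        = Nothing

-- Lists: a one-octet length prefix followed by the elements, so this
-- sketch only handles lists of at most 255 elements.
instance Binary a => Binary [a] where
  put xs = fromIntegral (length xs) : concatMap put xs
  get []       = Nothing
  get (n:rest) = go (fromIntegral n :: Int) rest
    where
      go 0 bytes = Just ([], bytes)
      go k bytes = do
        (x, bytes')   <- get bytes
        (xs, bytes'') <- go (k - 1) bytes'
        return (x : xs, bytes'')

-- An example record, serialised by serialising each component in turn.
data Sample = Sample Bool [Word8]
  deriving (Eq, Show)

instance Binary Sample where
  put (Sample flag payload) = put flag ++ put payload
  get bytes = do
    (flag, rest)     <- get bytes
    (payload, rest') <- get rest
    return (Sample flag payload, rest')
```

With these definitions, `put (Sample True [1, 2, 3])` is `[1, 3, 1, 2, 3]`, and applying `get` to that list recovers the original value with no octets left over. The `Sample` instance is the kind of boilerplate that a tool such as DrIFT, or compiler-supported deriving, would generate.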
### Issues to consider

- What language extensions are required?
- Support for cyclic structures
- Is it possible to derive I/O instances for types, or must they be written by hand?

## References

### Proposal 1

- The simplest implementation option is [System.IO](http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html), which provides hGetBuf-style I/O. More sophisticated systems can be layered on top, as external libraries.
- [Packed strings](http://www.cse.unsw.edu.au/~dons/fps.html), layered over System.IO, are a related interface, and are sometimes used for binary I/O of flat data types.
### Proposal 2

- The Binary class is the de-facto standard for more structured data. Its origins are [described here](ftp://ftp.cs.york.ac.uk/pub/malcolm/ismm98.html). It is distributed with nhc, and used by GHC to deal with .hi files. DrIFT provides tool support for deriving new instances. Flavours include:
  - [NHC's binary](http://haskell.org/nhc98/libs/Binary.html), the original
  - [GHC's Binary](http://cvs.haskell.org/cgi-bin/cvsweb.cgi/~checkout~/fptools/ghc/compiler/utils/Binary.hs), used internally by GHC
  - [NewBinary](http://www.n-heptane.com/nhlab/repos/NewBinary/), the standard version today
  - [Lambdabot/Hmp3's Binary](http://www.cse.unsw.edu.au/~dons/code/hmp3/Binary.hs), a stripped-down, Handle-only version of Binary
  - [SerTH](http://www.cs.helsinki.fi/u/ekarttun/SerTH/), a Binary-alike which uses Template Haskell to derive serialiser instances for each data type. It is an alternative to using DrIFT for (or handwriting) your own Binary instances. It requires TH, and supports serialising cyclic structures.
  - [ByteStream](http://freearc.narod.ru/), a new high-performance serialisation library, using gzip compression

### Further information

- [A recent mailing list thread](http://www.haskell.org/pipermail/haskell/2005-December/017029.html)
- [A page on the Haskell wiki](http://haskell.org/hawiki/BinaryIo)

## Pros/Cons? : System.IO

The two simplest options are to go with only the System.IO extension, or the Binary class.

### Pros

- System.IO extensions are already in common use and simple to implement.
- More sophisticated binary I/O may be layered on top.
### Cons

- The API may not be rich enough for many binary I/O requirements; should we strive for more?
## Pros/Cons? : Binary

### Pros

- The Binary class (particularly as implemented in NewBinary) is simple, elegant, easy to implement and widely used.
- Binary I/O is an oft-requested feature, the lack of which is sometimes considered a flaw in Haskell 98, so we should do something about it.
- It is difficult to serialise data without this class.
### Cons

- Ideally(?) Binary should be derivable without an external tool; the lack of derivability can be annoying.
- It doesn't support cyclic structures.
- There is an overlap with the Storable class that isn't exploited or explained in any existing library.
- Binary only supports I/O from Handles and memory buffers. Some people require other kinds of streams.
- Some new developments are underway to combine SerTH's cyclic structure support with the speed of NewBinary.
- What about a NewIO library? How will this overlap or interact?
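To make the overlap with Storable concrete: a `Storable` instance already describes how to lay a value out in memory, which is most of what flat binary I/O needs. The sketch below shows one way this could be exploited on top of `hPutBuf` and `hGetBuf`. The names `hPutStorable` and `hGetStorable` are hypothetical and do not come from any of the libraries listed above. Note that the resulting format uses the machine's native size and byte order, so it is not portable between platforms.

```haskell
import Foreign.Marshal.Alloc (alloca)
import Foreign.Marshal.Utils (with)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable, peek, sizeOf)
import System.IO

-- Write any Storable value using its in-memory representation.
hPutStorable :: Storable a => Handle -> a -> IO ()
hPutStorable h x = with x $ \ptr -> hPutBuf h ptr (sizeOf x)

-- Read a Storable value back; Nothing if too few octets remain.
hGetStorable :: Storable a => Handle -> IO (Maybe a)
hGetStorable h = alloca $ \ptr -> do
  let size = sizeOf (elemOf ptr)
  n <- hGetBuf h ptr size
  if n == size then fmap Just (peek ptr) else return Nothing
  where
    -- Only used to select the element type for sizeOf; never evaluated.
    elemOf :: Ptr b -> b
    elemOf _ = undefined
```

A Binary class could, in principle, provide a default implementation in terms of such helpers for every type that is already `Storable`, leaving hand-written or derived instances for structured types.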