Data/Text/Encoding.hs · 1c65333713523fcd5c0f4c60a3c5d5b1134302e4 · Glasgow Haskell Compiler / Packages / text

Improve small string performance for UTF-8 encoding to bytestrings · 1c653337

Simon Meier authored Jan 07, 2014

On a 5-byte string, converting strict text to a strict bytestring is still
2x slower than the custom 'encodeUtf8_1' routine. However, this is much
better than the 4.5x slowdown that we started with.

I attribute the slowdown to the more expensive startup cost for the
bytestring-builder-based solution. Note that this startup cost is paid only
once when a small string is encoded as part of a larger document, e.g., a
JSON document. I am thus not sure how relevant small string performance is
when converting to individual strict 'ByteString's.
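
To illustrate the point about shared startup cost, here is a minimal sketch
that encodes several small strings through a single Builder run and, for
comparison, one at a time. It assumes a version of text that exports
'encodeUtf8Builder' and a bytestring that ships Data.ByteString.Builder. The
helpers 'renderList' and 'encodeEach' are hypothetical names for this
example, and no JSON escaping is performed.

```haskell
import           Data.List               (intersperse)
import qualified Data.ByteString         as BS
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy    as BL
import qualified Data.Text               as T
import qualified Data.Text.Encoding      as TE

-- Hypothetical helper: render short strings as a bracketed, comma-separated
-- list of quoted values. All pieces are combined into one Builder, so the
-- Builder is run (and its buffer set up) once for the whole document.
-- No escaping is done; this is not a real JSON encoder.
renderList :: [T.Text] -> BL.ByteString
renderList ts = B.toLazyByteString $
    B.char7 '[' <> mconcat (intersperse (B.char7 ',') (map quoted ts)) <> B.char7 ']'
  where
    quoted t = B.char7 '"' <> TE.encodeUtf8Builder t <> B.char7 '"'

-- For comparison: one strict ByteString per small string, so every string
-- goes through its own separate conversion.
encodeEach :: [T.Text] -> [BS.ByteString]
encodeEach = map TE.encodeUtf8
```

Which of the two shapes matters in practice depends on the application; the
sketch only shows where the per-run cost would be paid.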

Note that on ASCII input the Builder-based UTF-8 encoder is 1.6x faster than
'encodeUtf8_1'. The Japanese and Russian performance is about the same.

Note also that the Builder-based strict text UTF-8 encoder has the benefit
that it won't waste any memory. In contrast, the 'encodeUtf8_1' function can
allocate as much as 4 times more memory than needed, as it does not trim the
resulting bytestring.
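
The memory argument can be made concrete with a small sketch. It assumes
that the factor of 4 corresponds to reserving the UTF-8 worst case of four
bytes per character, which this message does not state explicitly. The names
'utf8Length', 'worstCaseBytes', 'overAllocation' and 'trim' are hypothetical
and not part of the text API.

```haskell
import qualified Data.ByteString    as BS
import qualified Data.Text          as T
import qualified Data.Text.Encoding as TE

-- Number of bytes the UTF-8 encoding of the text actually needs.
utf8Length :: T.Text -> Int
utf8Length = BS.length . TE.encodeUtf8

-- Assumption for this sketch: an untrimmed encoder reserves the UTF-8
-- worst case of four bytes per character.
worstCaseBytes :: T.Text -> Int
worstCaseBytes t = 4 * T.length t

-- Ratio of reserved to needed bytes; 4.0 for pure ASCII input.
-- Returns 1.0 for the empty text to avoid dividing by zero.
overAllocation :: T.Text -> Double
overAllocation t
  | T.null t  = 1
  | otherwise = fromIntegral (worstCaseBytes t) / fromIntegral (utf8Length t)

-- One way to trim a strict ByteString that sits in a larger buffer:
-- 'BS.copy' gives it storage of exactly its own length, so the larger
-- buffer can be garbage collected.
trim :: BS.ByteString -> BS.ByteString
trim = BS.copy
```

Under this assumption the over-allocation is worst for ASCII input, where
each character needs one byte, and smaller for text whose characters need
more bytes each.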
