    Improve small string performance for UTF-8 encoding to bytestrings · 1c653337
    Simon Meier authored
    On a 5-byte string, converting strict Text to a strict ByteString is still
    a factor of 2 slower than the custom 'encodeUtf8_1' routine. However, this
    is much better than the factor of 4.5 that we started with.
    
    I attribute the remaining slowdown to the higher startup cost of the
    bytestring-builder-based solution. Note that this startup cost is shared
    when a small string is encoded as part of a larger document, e.g., a JSON
    document. I am therefore not sure how relevant the small-string performance
    of converting to individual strict 'ByteString's is.
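
As a rough illustration of how that startup cost is shared, here is a minimal sketch. It assumes the `encodeUtf8Builder` function from `Data.Text.Encoding` and the `Data.ByteString.Builder` module as they exist in current text and bytestring releases, not necessarily the code in this commit. The helper names `jsonStringArray`, `renderArray` and `encodeAlone` are hypothetical, and JSON string escaping is deliberately omitted. Buffer setup happens once per `toLazyByteString` call, so `renderArray` pays it once per document while `encodeAlone` pays it once per string.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import Data.List (intersperse)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical helper: render strings as a JSON-like array.
-- Escaping is omitted; the point is that every small string
-- becomes a Builder fragment written into one shared output buffer.
jsonStringArray :: [T.Text] -> B.Builder
jsonStringArray xs =
  B.char7 '[' <> mconcat (intersperse (B.char7 ',') (map quoted xs)) <> B.char7 ']'
  where
    quoted t = B.char7 '"' <> TE.encodeUtf8Builder t <> B.char7 '"'

-- The builder is run once, so buffer setup is paid once per document.
renderArray :: [T.Text] -> BL.ByteString
renderArray = B.toLazyByteString . jsonStringArray

-- Encoding a single small string on its own pays the full setup cost.
encodeAlone :: T.Text -> BL.ByteString
encodeAlone = B.toLazyByteString . TE.encodeUtf8Builder
```

For example, `renderArray [T.pack "ab", T.pack "cd"]` produces the bytes of `["ab","cd"]` from a single builder run.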
    
    Note that on ASCII input the Builder-based UTF-8 encoder is 1.6x faster
    than 'encodeUtf8_1'. On Japanese and Russian text, the performance is about
    the same.
    
    Note also that the Builder-based strict Text UTF-8 encoder has the benefit
    that it does not waste any memory. In contrast, 'encodeUtf8_1' can allocate
    up to 4 times as much memory as needed, because it does not trim the
    resulting bytestring.
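
To make the memory claim concrete, the sketch below compares the exact UTF-8 size of a string with an untrimmed worst-case buffer. The 4-bytes-per-character bound and all function names are assumptions for illustration only; the commit does not show how 'encodeUtf8_1' actually sizes its buffer. Under that assumption, pure ASCII input gets a buffer 4 times larger than needed, while multi-byte scripts waste less.

```haskell
import Data.Char (ord)

-- Exact number of bytes needed to UTF-8 encode one code point.
utf8Width :: Char -> Int
utf8Width c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where
    n = ord c

-- Exact encoded size: what a trimmed result (e.g. from a Builder) occupies.
exactSize :: String -> Int
exactSize = sum . map utf8Width

-- Assumed worst-case buffer: 4 bytes per character, never trimmed.
-- (Hypothetical bound matching the "up to 4 times" figure.)
worstCaseSize :: String -> Int
worstCaseSize s = 4 * length s

-- Ratio of allocated to needed bytes for the untrimmed strategy.
wasteFactor :: String -> Double
wasteFactor s
  | null s    = 1
  | otherwise = fromIntegral (worstCaseSize s) / fromIntegral (exactSize s)
```

For example, `wasteFactor "hello"` is `4.0`, while for the two-character Japanese string `"\x65E5\x672C"` (6 bytes when encoded) it is about 1.33.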