This project is mirrored from https://github.com/haskell/text.
- 07 Jan, 2014 15 commits
-
-
bos authored
The value that was having too general a type inferred is now a pointer, so inference doesn't accidentally overgeneralize.
-
bos authored
This helps performance quite a bit! Now encoding Japanese text is 2x faster than encodeUtf8, as opposed to 30% faster before. Not bad!
-
bos authored
-
bos authored
-
bos authored
-
bos authored
This requires a bit more torturing to maintain performance. For some unknown reason, doing the same refactoring on go4 decreases performance on russian-small.txt by half!
-
bos authored
-
bos authored
-
bos authored
The goal here is to avoid a buffer size check on every iteration, and instead perform one only the first time we encounter input that is larger than the buffer we preallocated. This helps performance rather a lot: we don't regress on the smallest inputs, but we are up to 35% faster than the previous version of encodeUtf8 on larger inputs.
-
bos authored
-
bos authored
-
bos authored
-
bos authored
-
bos authored
These require at least the following version of text-test-data, dated January 6:
  git: 2183e3e5423fbf0d9d0187a4455df699c5e04b74
  hg: 6c0e2b527bbbc6e18c622e452d16634b5d953b34
-
bos authored
Polish UTF-8 bytestring builder support
-
- 03 Jan, 2014 3 commits
-
-
Simon Meier authored
The counter-example for the existing code is a string of length '2*n' that starts with 'n' characters with codepoints in the range (0x7F, 0x7FF), each of which needs two bytes in UTF-8, and ends with 'n' ASCII characters. All 'n' ASCII characters will be written after the end of the output buffer.
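The arithmetic behind this counter-example can be sketched as follows. This is only an illustration, not the library's code: the names `utf8Width`, `counterExample`, `encodedBytes`, and `overrun` are hypothetical, and the sketch assumes an output buffer sized at one byte per input character.

```haskell
import Data.Char (ord)

-- Number of bytes needed to encode one code point in UTF-8.
utf8Width :: Char -> Int
utf8Width c
  | cp < 0x80    = 1
  | cp < 0x800   = 2
  | cp < 0x10000 = 3
  | otherwise    = 4
  where
    cp = ord c

-- A string of length 2*n: n two-byte characters (U+00E9), then n ASCII ones.
counterExample :: Int -> String
counterExample n = replicate n '\xE9' ++ replicate n 'a'

-- Total UTF-8 size of a string, in bytes.
encodedBytes :: String -> Int
encodedBytes = sum . map utf8Width

-- Bytes that would land past the end of the buffer, ASSUMING the buffer
-- holds exactly one byte per input character (an assumption of this sketch).
overrun :: Int -> Int
overrun n = max 0 (encodedBytes s - length s)
  where
    s = counterExample n
```

Under that assumption, the 'n' two-byte characters fill the whole '2*n'-byte buffer, so `overrun n` comes out equal to 'n', the number of trailing ASCII characters.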
-
Simon Meier authored
  - adjust function names to 'encodeUtf8Builder' and 'encodeUtf8BuilderEscaped'
  - expose the same conversion to builders for both lazy and strict text
  - ensure 'Escaped' versions are inlined to allow specialization for specific escaping primitives
  - fix some Haddock references
  - add Haddock comment about bytestring >= 0.10.4.0 dependency
  - remove stream-to-builder encoding functions. There is no direct use case for them and they require too much knowledge about the internals to be used correctly.
-
bos authored
I somehow forgot that we allocate the initial ByteString to contain the same number of bytes as the Text contains code units. This means that we never need to ensure that the ByteString is big enough, nor (with this observation) does a special-cased ASCII-only loop help performance.
-
- 02 Jan, 2014 1 commit
-
-
bos authored
-
- 31 Dec, 2013 2 commits
- 30 Dec, 2013 14 commits
-
-
bos authored
--HG-- rename : Data/Text/Encoding/Utf8.hs => Data/Text/Internal/Encoding/Utf8.hs
-
bos authored
-
bos authored
-
bos authored
-
bos authored
-
bos authored
As far as I know, this completes the set of possible invalid encodings.
-
bos authored
-
bos authored
-
bos authored
-
bos authored
-
bos authored
-
bos authored
This test currently fails due to gh-61.
-
bos authored
This version tries to force the real decoding function to be inlined into each of its callers, which in turn each have different criteria for backing up a byte. This avoids an extra test at the end of strict decoding. While this seems to fix gh-61, I want to beef up the test suite so that it will correctly detect the bug.
-
bos authored
The refactoring in that commit was performed incorrectly, such that it would no longer detect as invalid an incomplete series of continuation bytes at the end of a string.
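As a rough illustration of the condition described here, the check below reports whether a byte sequence stops partway through a multi-byte UTF-8 sequence. It is a minimal sketch over a plain list of bytes, not the decoder used by the library, and the names `expectedLen`, `isContinuation`, and `endsTruncated` are hypothetical. It is not a full validator: it only inspects the final sequence and does not reject overlong or otherwise invalid lead bytes.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Length of the sequence announced by a lead byte, or Nothing if the
-- byte cannot start a sequence.
expectedLen :: Word8 -> Maybe Int
expectedLen b
  | b .&. 0x80 == 0x00 = Just 1
  | b .&. 0xE0 == 0xC0 = Just 2
  | b .&. 0xF0 == 0xE0 = Just 3
  | b .&. 0xF8 == 0xF0 = Just 4
  | otherwise          = Nothing

-- Continuation bytes have the bit pattern 10xxxxxx.
isContinuation :: Word8 -> Bool
isContinuation b = b .&. 0xC0 == 0x80

-- True when the input ends before its last multi-byte sequence is complete.
endsTruncated :: [Word8] -> Bool
endsTruncated bytes =
  case span isContinuation (reverse bytes) of
    (conts, lead : _) ->
      case expectedLen lead of
        Just n  -> length conts + 1 < n
        Nothing -> False
    _ -> False
```

For example, the bytes E3 81 82 encode one complete three-byte character, while E3 81 alone is the kind of incomplete tail that has to be reported as invalid.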
-
- 08 Dec, 2013 1 commit
-
-
bos authored
-
- 05 Dec, 2013 4 commits