This project is mirrored from https://github.com/haskell/text.
- 15 May, 2015 1 commit
-
-
Tom Ellis authored
-
- 08 Sep, 2014 1 commit
-
-
bos authored
This fixes gh-87.
-
- 10 Mar, 2014 1 commit
-
-
quchen authored
The pragma-based deprecation of 'decodeASCII' correctly claims that 'decodeUtf8' should be used (matching the implementation), while the documentation string of the function itself recommended 'decodeLatin1' (which does not).
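To see why the two recommendations are not interchangeable, here is a minimal sketch using the `text` and `bytestring` packages. The names `sample`, `viaUtf8`, and `viaLatin1` are made up for illustration. The two decoders agree on pure ASCII input but differ as soon as a byte above 0x7F appears:

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeLatin1, decodeUtf8)

-- Illustrative input: the bytes 0xC3 0xA9 are the UTF-8 encoding of U+00E9.
sample :: B.ByteString
sample = B.pack [0xC3, 0xA9]

-- decodeUtf8 reads the two bytes as a single character.
viaUtf8 :: T.Text
viaUtf8 = decodeUtf8 sample

-- decodeLatin1 maps every byte to its own character, giving two characters.
viaLatin1 :: T.Text
viaLatin1 = decodeLatin1 sample
```

In GHCi, `T.length viaUtf8` and `T.length viaLatin1` give 1 and 2 respectively, so a docstring that points at 'decodeLatin1' describes different behaviour from the 'decodeUtf8'-based implementation.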
-
- 07 Mar, 2014 1 commit
-
-
bos authored
Replace the old text smart constructor with the slightly smarter one we've had all along that ensures that it doesn't pin its array if it's empty.
-
- 06 Mar, 2014 2 commits
- 19 Feb, 2014 2 commits
- 10 Jan, 2014 1 commit
-
-
Leon P Smith authored
-
- 09 Jan, 2014 1 commit
-
-
bos authored
-
- 08 Jan, 2014 3 commits
- 07 Jan, 2014 20 commits
-
-
bos authored
Since encodeUtf8_2 wins under all circumstances, there's no reason to keep the intermediate version around.
-
bos authored
This has the odd side effect of improving tiny-string performance from 20% slower than encodeUtf8_1 to about 5% faster. Never stop being weird, GHC optimizer!
-
bos authored
Not surprisingly, this is a lot faster than encodeUtf8_1 and the Builder-based rewrite under almost all circumstances. It's slower on tiny inputs (20%), but roughly twice as fast as encodeUtf8_1 on longer inputs.
--HG--
extra : amend_source : 093410e1295572be039d87a9c21e97d250c5f9f9
-
Simon Meier authored
On a 5-byte string, converting strict text to a strict bytestring is still 2x slower than the custom 'encodeUtf8_1' routine. However, this is much better than the 4.5x slowdown that we started with. I attribute the slowdown to the more expensive startup cost of the bytestring-builder-based solution. Note that this startup cost is shared when a small string is encoded as part of a larger document, e.g., a JSON document. I am thus not sure how relevant small-string performance is when converting to individual strict 'ByteString's.
Note that the ASCII performance of the Builder-based UTF-8 encoder is 1.6x faster than 'encodeUtf8_1'. The Japanese and Russian performance is about the same.
Note also that the Builder-based strict text UTF-8 encoder has the benefit that it won't waste any memory. In contrast, the 'encodeUtf8_1' function can allocate as much as 4 times more memory than needed, as it does not trim the resulting bytestring.
-
bos authored
-
bos authored
-
bos authored
-
bos authored
The value that was having too general a type inferred is now a pointer, so inference doesn't accidentally overgeneralize.
-
bos authored
This helps performance quite a bit! Now encoding Japanese text is 2x faster than encodeUtf8, as opposed to 30% faster before. Not bad!
-
bos authored
-
bos authored
--HG-- extra : amend_source : badbc206c5b2b8b827be0e42811becfa35b0c000
-
bos authored
-
bos authored
This requires a bit more torturing to maintain performance. For some unknown reason, doing the same refactoring on go4 decreases performance on russian-small.txt by half!
-
bos authored
-
bos authored
-
bos authored
The goal here is to avoid a buffer size check on every iteration, instead only doing one the first time we encounter some input that's larger than the buffer we preallocated. This helps performance rather a lot: we don't regress on the smallest inputs, but we are up to 35% faster than the previous version of encodeUtf8 on larger inputs.
-
bos authored
-
bos authored
-
bos authored
-
bos authored
-
- 03 Jan, 2014 3 commits
-
-
Simon Meier authored
The counter-example for the existing code is a string of length '2*n' that starts with 'n' characters with codepoints in the range (0x7F, 0x7FF) and ends with 'n' ASCII characters. All 'n' ASCII characters will be written after the end of the output buffer.
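The arithmetic behind this counter-example can be checked with the public API. The sketch below uses made-up helper names (`counterExample`, `utf8Length`) and builds the described string with U+00E9 as the two-byte character. It assumes, as another commit message from the same day states, that the output buffer held one byte per code unit, that is `2*n` bytes for this input:

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)

-- Illustrative helper: n characters that need two bytes each in UTF-8
-- (U+00E9 lies between 0x7F and 0x7FF), followed by n ASCII characters.
counterExample :: Int -> T.Text
counterExample n = T.pack (replicate n '\xE9' ++ replicate n 'a')

-- Number of bytes in the UTF-8 encoding of a Text value.
utf8Length :: T.Text -> Int
utf8Length = B.length . encodeUtf8
```

For `n = 4`, the text has 8 characters but `utf8Length (counterExample 4)` is 12: the first four characters already fill an 8-byte buffer, leaving no room for the four ASCII bytes that follow.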
-
Simon Meier authored
  - adjust function names to 'encodeUtf8Builder' and 'encodeUtf8BuilderEscaped'
  - expose the same conversion to builders for both lazy and strict text
  - ensure 'Escaped' versions are inlined to allow specialization for specific escaping primitives
  - fix some Haddock references
  - add Haddock comment about bytestring >= 0.10.4.0 dependency
  - remove stream-to-builder encoding functions. There is no direct use case for them and they require too much knowledge about the internals to be used correctly.
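As a rough illustration of how 'encodeUtf8Builder' is meant to be used, the sketch below encodes several strict Text values into one larger document with a single Builder run. The helper `renderList` is made up for this example and does no escaping, so it is not a real JSON encoder:

```haskell
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BL
import Data.List (intersperse)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Illustrative helper (not part of the library): render Text values as a
-- bracketed, comma-separated list. No escaping is performed.
renderList :: [T.Text] -> BL.ByteString
renderList xs = BB.toLazyByteString $ mconcat
  [ BB.char7 '['
  , mconcat (intersperse (BB.char7 ',') (map TE.encodeUtf8Builder xs))
  , BB.char7 ']'
  ]
```

For instance, `renderList [T.pack "a", T.pack "b"]` yields the bytes of `[a,b]`. All fragments are written by one builder, so any per-run setup cost is paid once for the whole document and not once per string.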
-
bos authored
I somehow forgot that we allocate the initial ByteString to contain the same number of bytes as the Text contains code units. This means that we never need to ensure that the ByteString is big enough, nor (with this observation) does a special-cased ASCII-only loop help performance.
-
- 05 Dec, 2013 2 commits
-
-
bos authored
-
bos authored
--HG--
rename : Data/Text/Util.hs => Data/Text/Internal/Functions.hs
rename : Data/Text/Lazy/Internal.hs => Data/Text/Internal/Lazy.hs
rename : Data/Text/Private.hs => Data/Text/Internal/Private.hs
rename : Data/Text/Search.hs => Data/Text/Internal/Search.hs
rename : Data/Text/Unsafe/Base.hs => Data/Text/Internal/Unsafe.hs
rename : Data/Text/UnsafeChar.hs => Data/Text/Internal/Unsafe/Char.hs
rename : Data/Text/UnsafeShift.hs => Data/Text/Internal/Unsafe/Shift.hs
-
- 04 Dec, 2013 2 commits
-
-
bos authored
--HG--
rename : Data/Text/Fusion.hs => Data/Text/Internal/Fusion.hs
rename : Data/Text/Fusion/CaseMapping.hs => Data/Text/Internal/Fusion/CaseMapping.hs
rename : Data/Text/Fusion/Common.hs => Data/Text/Internal/Fusion/Common.hs
rename : Data/Text/Fusion/Size.hs => Data/Text/Internal/Fusion/Size.hs
rename : Data/Text/Fusion/Internal.hs => Data/Text/Internal/Fusion/Types.hs
-
bos authored
--HG--
rename : Data/Text/Encoding/Fusion.hs => Data/Text/Internal/Encoding/Fusion.hs
rename : Data/Text/Encoding/Fusion/Common.hs => Data/Text/Internal/Encoding/Fusion/Common.hs
rename : Data/Text/Encoding/Utf16.hs => Data/Text/Internal/Encoding/Utf16.hs
rename : Data/Text/Encoding/Utf32.hs => Data/Text/Internal/Encoding/Utf32.hs
rename : Data/Text/Encoding/Utf8.hs => Data/Text/Internal/Encoding/Utf8.hs
-