Use `unicode-data` (or its implementation) for `Data.Char`
The `unicode-data` library reimplements most of `Data.Char` in a better way. By better, I mean:

- reportedly up to 5× faster
- only pure Haskell code, no FFI

Both seem obviously desirable even for `base`. Faster is better, and pure Haskell presumably makes targets like JavaScript or WebAssembly easier to support. Also, it’s more reputable for us :-)
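For context, the functions in question are `Data.Char`’s table-driven Unicode classification and conversion functions, which work over the full Unicode range, not just ASCII. A small sketch using only `base` (these are the kinds of functions `unicode-data` reimplements):

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isAlpha, toUpper)

main :: IO ()
main = do
  -- Classification and case mapping consult generated Unicode tables:
  print (generalCategory 'λ')  -- LowercaseLetter
  print (isAlpha 'λ')          -- True
  print (toUpper 'λ')          -- 'Λ'
```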
As with our current implementation, there is a tool that generates some of the code from the Unicode standard.
It would be great to benefit from Adithya Kumar's work here and use his implementation in `base`.
Unfortunately, it does not seem easily possible for `base` to simply depend on `unicode-data`, as that package uses too much that lives in `base` (and not in, say, `ghc-prim`).
So the question is: should we

- copy that code (including the generator) into `base`, use it to provide `Data.Char`, and maintain it in parallel with the stand-alone upstream library, or
- copy that code (including the generator) into `base`, use it to provide `Data.Char` *and* the more specialized modules provided by `unicode-data` that are of interest to users with more precise Unicode needs (e.g. `Unicode.Char.General`), so that that package can be deprecated (if that’s in the interest of its maintainers) and everyone will find it all in `base`?
(This discourse thread may be relevant.)