Support Unicode characters in instance Show String
This feature request does not ask to change the language, only to change the behavior of base.
Motivation
Unicode has been widely adopted, and people around the world rely on it to write in their native languages. Haskell, however, is still stuck in ASCII: String's instance of the Show class escapes all non-ASCII characters, despite the fact that each element of a String is typically a Unicode code point and putStrLn actually works as expected. Consider the following examples:
ghci> print "Hello, 世界"
"Hello, \19990\30028"
ghci> print "Hello, мир"
"Hello, \1084\1080\1088"
ghci> print "Hello, κόσμος"
"Hello, \954\972\963\956\959\962"
ghci> "Hello, 世界" -- ghci calls `show`, so string literals are also escaped
"Hello, \19990\30028"
ghci> "😀" -- Not only human scripts, but also emojis!
"\128512"
This status quo is unsatisfactory for a number of reasons:
- Even though it's a small thing, it creates an unwelcoming atmosphere for native speakers of languages whose scripts are not representable in ASCII.
- This is an actual annoyance when debugging localized software, or strings with emojis.
- As a consequence of the first point, Haskell teachers are forced to use other languages instead of the students' mother tongues, or to rely on I/O functions like putStrLn, creating a rather unnecessary burden.
- Other string types, like Text, rely on this Show instance.
Moreover, read can already handle Unicode strings today, so relaxing constraints on show doesn't affect read . show == id.
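The round-trip claim can be checked against today's base with a small sketch. The names roundTrips, readsUnescaped and readsEscaped are hypothetical helpers introduced here for illustration only; they are not part of base.

```haskell
-- Hypothetical helper: does a string survive a show/read round trip?
roundTrips :: String -> Bool
roundTrips s = read (show s) == s

-- read accepts the raw, unescaped form of a string literal ...
readsUnescaped :: Bool
readsUnescaped = (read "\"Hello, 世界\"" :: String) == "Hello, 世界"

-- ... as well as the escaped form that show emits today.
readsEscaped :: Bool
readsEscaped = (read "\"Hello, \\19990\\30028\"" :: String) == "Hello, 世界"
```

Since read treats both spellings as the same string, emitting the unescaped one from show should leave the round trip intact.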
By not escaping Unicode characters in the instance of Show for String, all of the above problems go away immediately, and other string libraries benefit as well.
Proposal
The proposal is to change the Show instance of String so that it produces the following output:
ghci> print "Hello, 世界"
"Hello, 世界"
ghci> print "Hello, мир"
"Hello, мир"
ghci> print "Hello, κόσμος"
"Hello, κόσμος"
ghci> "Hello, 世界"
"Hello, 世界"
ghci> "😀"
"😀"
Concretely, it means:
- Remove a few guards in showLitChar (in GHC.Show) to not escape Unicode characters (except \n, etc.).
- Provide a function showEscaped, or newtype Escaped = Escaped String, to obtain the current escaping behavior, in case anyone wants that escaping back.
- ghci should intercept String outputs and escape any unreadable characters to avoid printing gibberish to the user's terminal.
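As a rough illustration of the first two points, the intended behavior can be approximated outside of base. This is only a sketch, not the actual patch to GHC.Show: showLitCharUnicode and showUnicode are hypothetical names, the use of isPrint to decide what counts as readable is an assumption, and the Escaped instance simply delegates to today's escaping show.

```haskell
import Data.Char (isPrint, showLitChar)

-- Hypothetical variant of showLitChar: printable non-ASCII characters are
-- emitted as-is (assumption: isPrint is the right test); everything else
-- falls back to the existing escaping rules.
showLitCharUnicode :: Char -> ShowS
showLitCharUnicode c
  | c > '\DEL' && isPrint c = showChar c
  | otherwise               = showLitChar c

-- Hypothetical stand-in for what show would do on a String.
showUnicode :: String -> String
showUnicode s = '"' : go s
  where
    go []         = "\""
    go ('"' : cs) = '\\' : '"' : go cs              -- double quotes stay escaped
    go (c : cs)   = showLitCharUnicode c (go cs)    -- tail is passed along so \& is still inserted

-- Wrapper from the proposal that keeps the ASCII-only rendering.
newtype Escaped = Escaped String

instance Show Escaped where
  -- Under today's base, show on String already escapes, so this delegates.
  -- After the proposed change it would need its own escaping logic.
  showsPrec _ (Escaped s) = shows s
```

With this sketch, putStrLn (showUnicode "Hello, 世界") prints the quoted string without numeric escapes, while newlines, double quotes and unprintable code points are still escaped, and the result can still be parsed back with read.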
Since Show is usually only used in debugging and in development, this change should not affect any production system.
Alternative Options
- Always use putStrLn. This is viable today but unsatisfactory, as it requires stdout, and in some cases stdout is not really accessible, e.g. Telegram or Discord eval bots.
- Make the Show instance escape unreadable strings. This increases the robustness of the interaction between show and read, but it creates a hard dependency on Unicode, and thus affects portability.
- Customize ghci instead. ghci intercepts string outputs and checks whether they can be converted back to readable Unicode characters. This is another viable option. But the unescaping approach described above is preferable, because in today's GHC Haskell, show actually drops information from the strings, and the following wouldn't work as we'd like:
ghci> Prelude.putStrLn $ show "世界"
"\19990\30028" -- we'd like unescaped "世界" here
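For comparison, the ghci-only alternative could be sketched as a post-processing step over the output of show. Everything named here (unescapeShown, printableFromDecimal, dropEmptyEscape, uprint) is hypothetical, and the isPrint test is an assumption about what counts as readable. Because it works on the already-escaped text, it only helps where the printer is explicitly swapped in, which is exactly the limitation described above.

```haskell
import Data.Char (chr, isDigit, isPrint)

-- Hypothetical helper: turn decimal escapes such as \19990 in the output of
-- show back into the characters they stand for, when those are printable.
unescapeShown :: String -> String
unescapeShown [] = []
unescapeShown ('\\' : '\\' : rest) = '\\' : '\\' : unescapeShown rest  -- escaped backslash stays
unescapeShown ('\\' : rest)
  | (digits@(_ : _), rest') <- span isDigit rest
  , Just c <- printableFromDecimal digits
  = c : unescapeShown (dropEmptyEscape rest')
unescapeShown (c : rest) = c : unescapeShown rest

-- Decode a decimal code point, accepting only printable non-ASCII characters.
printableFromDecimal :: String -> Maybe Char
printableFromDecimal digits
  | length digits <= 7, n <= 0x10FFFF, isPrint c, c > '\DEL' = Just c
  | otherwise = Nothing
  where
    n = read digits :: Int
    c = chr n

-- show inserts \& between a numeric escape and a following digit; once the
-- escape has been expanded, that separator is no longer needed.
dropEmptyEscape :: String -> String
dropEmptyEscape ('\\' : '&' : rest) = rest
dropEmptyEscape s = s

-- Hypothetical printer that ghci could be pointed at.
uprint :: Show a => a -> IO ()
uprint = putStrLn . unescapeShown . show
```

A printer like uprint could presumably be installed through ghci's -interactive-print option, but putStrLn $ show "世界" would still print the escaped form, since show itself is unchanged.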