GHC is inconsistent with the Haskell Report on which Unicode characters are allowed in string and character literals
GHC is inconsistent with the Haskell Report on which Unicode characters are allowed in string and character literals. (And I don't like either option: why leave any characters out of strings unnecessarily?)
Examples from ghci 7.6.3 (also tested in lambdabot on IRC):
Prelude> "​" -- Unicode char \8203, Format class.
<interactive>:10:2:
    lexical error in string/character literal at character '\8203'

Prelude> " " -- Unicode char \8202, Space class.
"\8202"

Prelude> "t\ \est" -- Unicode char \8202 in a string gap.
<interactive>:14:4:
    lexical error in string/character literal at character '\8202'
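For comparison, the same two characters can be written with numeric escapes, which both the report grammar and GHC accept inside string literals. This is only a sketch of that alternative spelling, not a fix for the lexer behavior; the names `zeroWidth`, `hairSpace` and `codePoints` are made up for illustration.

```haskell
import Data.Char (ord)

-- U+200B (decimal 8203), written as a decimal escape instead of the raw character.
zeroWidth :: String
zeroWidth = "\8203"

-- U+200A (decimal 8202), written as a hexadecimal escape.
hairSpace :: String
hairSpace = "\x200A"

-- Hypothetical helper: list the code points of a string, to confirm what it contains.
codePoints :: String -> [Int]
codePoints = map ord
```

Loaded into ghci, `codePoints zeroWidth` and `codePoints hairSpace` can be used to check that the escapes denote the characters from the session above.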
My reading of http://www.haskell.org/onlinereport/haskell2010/haskellch2.html (sections 2.2 and 2.6):
- The report BNF token "graphic", which can be used in literals, indirectly includes many Unicode classes, but uniWhite is not one of them. Thus the only Unicode whitespace allowed to represent itself in literals is the ASCII space.
- Unicode formatting characters are not mentioned in the BNF that I can see, so are not allowed in literals.
- String gaps are made out of the report BNF token whitechar, which does include uniWhite.
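The reading above can be sketched as a small classifier built on `Data.Char.generalCategory`. This is an approximation under stated assumptions, not the report's definition and not GHC's lexer: uniWhite is assumed to mean general category Zs (`Space`), uniSymbol is assumed to cover every symbol and punctuation category, and all function names are invented for illustration.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Assumption: uniWhite is approximated by general category Zs (Space).
isUniWhite :: Char -> Bool
isUniWhite c = generalCategory c == Space

-- Approximation of the report's "graphic" token via general categories.
isReportGraphic :: Char -> Bool
isReportGraphic c = case generalCategory c of
  LowercaseLetter      -> True   -- small
  UppercaseLetter      -> True   -- large
  TitlecaseLetter      -> True   -- large
  DecimalNumber        -> True   -- digit
  ConnectorPunctuation -> True   -- symbol/special: punctuation categories
  DashPunctuation      -> True
  OpenPunctuation      -> True
  ClosePunctuation     -> True
  InitialQuote         -> True
  FinalQuote           -> True
  OtherPunctuation     -> True
  MathSymbol           -> True   -- symbol: symbol categories
  CurrencySymbol       -> True
  ModifierSymbol       -> True
  OtherSymbol          -> True
  _                    -> False  -- includes Format and Space

-- May the character stand for itself inside a string literal?
-- (graphic except double quote and backslash, or the ASCII space)
reportAllowsInString :: Char -> Bool
reportAllowsInString c =
  c == ' ' || (isReportGraphic c && c `notElem` "\"\\")

-- May the character appear between the backslashes of a string gap?
-- (newline characters, vertical tab, tab, or anything in uniWhite)
reportAllowsInGap :: Char -> Bool
reportAllowsInGap c = isUniWhite c || c `elem` "\n\r\v\t\f"

-- One-line summary for a character, e.g. for use in ghci.
describe :: Char -> String
describe c =
  show c ++ ": " ++ show (generalCategory c)
    ++ ", in string: " ++ show (reportAllowsInString c)
    ++ ", in gap: " ++ show (reportAllowsInGap c)
```

In ghci, `mapM_ (putStrLn . describe) "\8203\8202 a"` prints one summary line per character. Under these assumptions, '\8203' is rejected in both positions, while '\8202' is rejected inside a string but accepted inside a gap.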
Who wants what:
| ||GHC||Report||Me|
|Format in string||No||No||Yes|
|Space/uniWhite in string||Yes||No||Yes|
|Space/uniWhite in string gap||No||Yes||Dunno|
In short, GHC's behavior is buggy and/or annoying in two opposite ways:
- It rejects some Unicode characters in string and character literals, presumably because the report says so.
- It allows some characters the report says it *shouldn't*, and refuses some characters the report says it *should*.