GHC cannot lex string literals that contain a zero-width joiner
Summary
GHC's lexer fails to lex string literals that contain the zero-width joiner (ZWJ, U+200D) Unicode code point, which appears in, e.g., certain emoji sequences.
I think this is probably caused by isAny in compiler/GHC/Parser/Lexer.x returning False for the ZWJ.
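To help narrow this down, here is a minimal sketch that inspects how base's Data.Char classifies the ZWJ. It assumes the lexer's decision depends on the character's Unicode general category, which I have not confirmed against Lexer.x; the name zwj is only for illustration.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isPrint, ord)

-- U+200D ZERO WIDTH JOINER, the character named in the error ('\8205').
-- The name `zwj` is illustrative and does not come from GHC's sources.
zwj :: Char
zwj = '\x200D'

main :: IO ()
main = do
  print (ord zwj)              -- 8205
  print (generalCategory zwj)  -- Format
  print (isPrint zwj)          -- False
```

The ZWJ is in the Format (Cf) category and is not considered printable, so any lexer rule that only accepts graphic or printable characters inside a literal would plausibly reject it.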
Steps to reproduce
Try to compile the following file:
main = putStrLn "🏳️🌈"
You will get the following error:
Test.hs:1:17: error:
lexical error in string/character literal at character '\8205'
|
1 | main = putStrLn "🏳️🌈"
|
Expected behavior
GHC should be able to compile this.
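Until this is fixed, a possible workaround is to spell the string with escape sequences so that the raw ZWJ never appears in the source. This is a sketch only: it assumes the literal is the rainbow-flag sequence (U+1F3F3, U+FE0F, U+200D, U+1F308), that escapes take a different path through the lexer, and I have not checked it against the affected versions. The name flag is illustrative.

```haskell
-- Assumed code points of the rainbow-flag emoji sequence:
-- white flag, variation selector-16, zero-width joiner, rainbow.
flag :: String
flag = "\x1F3F3\xFE0F\x200D\x1F308"

main :: IO ()
main = putStrLn flag
```

Each \x escape ends at the next backslash, so the literal holds exactly four characters.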
Environment
- GHC version used: 9.2.4, 9.4.4