
GHC cannot lex string literals that contain zero-width-joiner

Summary

GHC's lexer fails on string literals that contain the zero-width joiner (ZWJ, U+200D) Unicode code point, which appears in, e.g., certain emoji sequences. I think this is probably caused by isAny in compiler/GHC/Parser/Lexer.x returning False for the ZWJ.

Steps to reproduce

Try to compile the following file:

main = putStrLn "🏳️‍🌈"

You will get the following error:

Test.hs:1:17: error:
    lexical error in string/character literal at character '\8205'
  |
1 | main = putStrLn "🏳️🌈"
  |   
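The character '\8205' named in the error is U+200D, the zero-width joiner. As a rough way to inspect what the literal contains, the sketch below lists each code point of the flag together with its Unicode general category. The string is written with escape sequences so that the file itself does not contain a raw ZWJ; the names `flag` and `describe` are made up for this illustration, and nothing here shows how the lexer itself classifies characters.

```haskell
import Data.Char (generalCategory, ord)
import Numeric (showHex)

-- The rainbow flag sequence, spelled with escapes (hypothetical name):
-- white flag, variation selector-16, zero-width joiner, rainbow.
flag :: String
flag = "\x1F3F3\xFE0F\x200D\x1F308"

-- Render one character as its code point in hex plus its general category.
describe :: Char -> String
describe c = "U+" ++ showHex (ord c) "" ++ " " ++ show (generalCategory c)

main :: IO ()
main = mapM_ (putStrLn . describe) flag
```

The ZWJ is reported with general category Format, unlike the two emoji code points around it, which may be relevant to the isAny suspicion in the summary.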

Expected behavior

GHC should be able to compile this.
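On affected versions, a possible workaround (a sketch, assuming the failure is limited to the raw ZWJ character appearing in the source file) is to spell the sequence with escapes, which keeps the literal pure ASCII:

```haskell
-- Same string as in the reproducer, but with every code point escaped,
-- so the source file contains no raw zero-width joiner.
main :: IO ()
main = putStrLn "\x1F3F3\xFE0F\x200D\x1F308"
```

Whether the flag renders correctly still depends on the terminal and the locale encoding in use.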

Environment

  • GHC version used: 9.2.4, 9.4.4
Edited by Teo Camarasu