
GHC Lexer reports error in string literal including ZWJ

Summary

GHC's lexer reports a lexical error when a string literal contains a zero-width joiner (ZWJ, U+200D) as a raw character.

Steps to reproduce

  1. Create the following file as foo.hs.
main = do
    putStrLn "\128104\8205\128105\8205\128103\8205\128102"
    putStrLn "👨‍👩‍👧‍👦"
  2. Try to compile it; GHC reports a lexical error at the ZWJ.
$ ghc foo.hs
[1 of 1] Compiling Main             ( foo.hs, foo.o )

foo.hs:3:16: error:
    lexical error in string/character literal at character '\8205'
  |
3 |     putStrLn "👨‍👩‍👧‍👦" 
  |                ^
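
To see which character the error message refers to, the escaped form can be inspected with `Data.Char`. This is only a small diagnostic sketch: the name `zwj` is made up for illustration, and this report does not establish whether the character's general category is what the lexer trips over.

```haskell
import Data.Char (generalCategory, isPrint, ord)
import Numeric (showHex)

-- The character named in the error message, written as an escape so that
-- this file itself contains no raw ZWJ. (`zwj` is an illustrative name.)
zwj :: Char
zwj = '\8205'

main :: IO ()
main = do
  -- Code point in hexadecimal: 8205 decimal is U+200D.
  putStrLn ("U+" ++ showHex (ord zwj) "")
  -- Unicode general category of the character (Format, i.e. Cf).
  print (generalCategory zwj)
  -- Data.Char does not treat Format characters as printable, which is
  -- why the error message shows the escape '\8205' rather than a glyph.
  print (isPrint zwj)
```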

Expected behavior

We expect the same behavior from both

putStrLn "\128104\8205\128105\8205\128103\8205\128102"

and

putStrLn "👨‍👩‍👧‍👦"

without a lexical error.
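
Until the raw form is accepted, a possible workaround is to keep the ZWJ out of the source file by using escapes. The sketch below assumes nothing beyond `base`; the names `familyDecimal` and `familyHex` are hypothetical. It checks that the decimal escapes from the reproduction and the equivalent hexadecimal escapes denote the same seven code points.

```haskell
import Data.Char (ord)

-- The family emoji from the reproduction, written with decimal escapes.
-- (`familyDecimal` and `familyHex` are illustrative names.)
familyDecimal :: String
familyDecimal = "\128104\8205\128105\8205\128103\8205\128102"

-- The same sequence with hexadecimal escapes: four emoji joined by U+200D.
familyHex :: String
familyHex = "\x1F468\x200D\x1F469\x200D\x1F467\x200D\x1F466"

main :: IO ()
main = do
  -- Code points of the escaped string:
  -- [128104,8205,128105,8205,128103,8205,128102]
  print (map ord familyDecimal)
  -- Both spellings denote the same string, so this prints True.
  print (familyDecimal == familyHex)
  -- Printing works as in the first line of the reproduction.
  putStrLn familyDecimal
```

Neither literal contains a raw ZWJ, so both avoid the lexical error shown above while producing the string the raw literal is expected to produce.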

Environment

  • GHC versions used: 8.10.7, 9.0.2 and 9.2.2

Optional:

  • Operating System: Ubuntu 20.04
  • System Architecture: x86_64