
Give better errors for code corrupted by Unicode smart quotes (#21843)

Lawton Nichols requested to merge lawtonnichols/ghc:T21843 into master

Implements the feature requested in #21843 (closed).

Consider the following file with smart quote errors:

module UnicodeSmartQuotes where

badString = “hello”

Previously, compiling this file resulted in the following error:

quotes.hs:3:13: error: lexical error at character 'h'
  |
3 | badString = “hello”
  |             ^

Now, compiling the file outputs the following:

quotes.hs:3:13: error: [GHC-31623]
    Unicode character '“' ('\8220') looks like '"' (Quotation Mark), but it is not
  |
3 | badString = “hello”
  |             ^

Changes to the lexer

  • Add a new lexical rule to match smart quotes and immediately fail
  • Change string/character literal lexing logic to look for smart quotes
  • Change escape character matching logic to look for smart quotes (e.g., an accidental \“ instead of \")
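
The detection idea behind these changes can be sketched outside the lexer. This is a minimal standalone illustration, not the code in this merge request: the names `smartQuotes`, `smartQuoteMessage`, and `firstSmartQuote` are hypothetical, the set of characters covered is an assumption, and the "Single Quote" label is a guess at the wording for the single-quote case. Only the double-quote message format is taken from the output shown above.

```haskell
import Data.Char (ord)
import Data.Maybe (listToMaybe)

-- Hypothetical table: smart quote, the ASCII character it resembles,
-- and a human-readable name for that ASCII character.
smartQuotes :: [(Char, (Char, String))]
smartQuotes =
  [ ('\8220', ('"',  "Quotation Mark"))  -- U+201C LEFT DOUBLE QUOTATION MARK
  , ('\8221', ('"',  "Quotation Mark"))  -- U+201D RIGHT DOUBLE QUOTATION MARK
  , ('\8216', ('\'', "Single Quote"))    -- U+2018 LEFT SINGLE QUOTATION MARK (name assumed)
  , ('\8217', ('\'', "Single Quote"))    -- U+2019 RIGHT SINGLE QUOTATION MARK (name assumed)
  ]

-- Build a diagnostic in the style of the new error message,
-- or Nothing if the character is not a known smart quote.
smartQuoteMessage :: Char -> Maybe String
smartQuoteMessage c = do
  (ascii, name) <- lookup c smartQuotes
  pure $ "Unicode character '" ++ [c] ++ "' ('\\" ++ show (ord c)
      ++ "') looks like '" ++ [ascii] ++ "' (" ++ name ++ "), but it is not"

-- Scan one source line and report the 1-based column and message
-- for the first smart quote found, if any.
firstSmartQuote :: String -> Maybe (Int, String)
firstSmartQuote line =
  listToMaybe
    [ (col, msg)
    | (col, c) <- zip [1 ..] line
    , Just msg <- [smartQuoteMessage c]
    ]
```

Applied to the line `badString = “hello”`, `firstSmartQuote` reports column 13, the same position as the caret in the error above. The real lexer differs in that it has to make this check in three places (top-level tokens, inside string and character literals, and after a backslash), which is why the list above names three separate changes.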

Closes #21843 (closed).

Edited by Lawton Nichols
