Skip to content
Snippets Groups Projects
Forked from Glasgow Haskell Compiler / GHC
Source project has a limited visibility.
Alec Theriault's avatar
Alec Theriault authored
The lexer hacks around unicode by squishing any character into a 'Word8'
and then storing the actual character in its state. This happens at
'alexGetByte'.

That is all and well, but we ought to be careful that the characters we
retrieve via 'alexInputPrevChar' also fit this convention.

In fact, #13986 exposes nicely what can go wrong: the regex in the left
context of the type application rule uses the '$idchar' character set
which relies on the unicode hack. However, a left context corresponds
to a call to 'alexInputPrevChar', and we end up passing full blown
unicode characters to '$idchar', despite it not being equipped to deal
with these.

Test Plan: Added a regression test case

Reviewers: austin, bgamari

Reviewed By: bgamari

Subscribers: rwbarton, thomie

GHC Trac Issues: #13986

Differential Revision: https://phabricator.haskell.org/D4105
821adee1
History
Name Last commit Last update