Commit 9c9adbd0 authored by Oleg Grenrus, committed by Marge Bot

Implement proposal 403: Lexer cleanup

This allows Other Numbers to be used in identifiers, and also documents
other, already existing lexer divergences from the Haskell Report
parent 918d5021
......@@ -131,7 +131,7 @@ $tab = \t
$ascdigit = 0-9
$unidigit = \x03 -- Trick Alex into handling Unicode. See [Unicode in Alex].
$decdigit = $ascdigit -- for now, should really be $digit (ToDo)
$decdigit = $ascdigit -- exactly $ascdigit, no more no less.
$digit = [$ascdigit $unidigit]
$special = [\(\)\,\;\[\]\`\{\}]
......@@ -147,17 +147,17 @@ $unismall = \x02 -- Trick Alex into handling Unicode. See [Unicode in Alex].
$ascsmall = [a-z]
$small = [$ascsmall $unismall \_]
$uniidchar = \x07 -- Trick Alex into handling Unicode. See [Unicode in Alex].
$idchar = [$small $large $digit $uniidchar \']
$unigraphic = \x06 -- Trick Alex into handling Unicode. See [Unicode in Alex].
$graphic = [$small $large $symbol $digit $special $unigraphic \"\']
$graphic = [$small $large $symbol $digit $idchar $special $unigraphic \"\']
$binit = 0-1
$octit = 0-7
$hexit = [$decdigit A-F a-f]
$uniidchar = \x07 -- Trick Alex into handling Unicode. See [Unicode in Alex].
$idchar = [$small $large $digit $uniidchar \']
$pragmachar = [$small $large $digit]
$pragmachar = [$small $large $digit $uniidchar ]
$docsym = [\| \^ \* \$]
......@@ -2521,7 +2521,7 @@ adjustChar c = fromIntegral $ ord adj_c
SpacingCombiningMark -> other_graphic
EnclosingMark -> other_graphic
DecimalNumber -> digit
LetterNumber -> other_graphic
LetterNumber -> digit
OtherNumber -> digit -- see #4373
ConnectorPunctuation -> symbol
DashPunctuation -> symbol
......
......@@ -93,6 +93,27 @@ Lexical syntax
See `GHC Proposal #229 <https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0229-whitespace-bang-patterns.rst>`__
for the precise rules.
- The Haskell Report allows any Unicode Decimal Number in decimal literals.
However, GHC accepts only ASCII digits::
ascDigit → 0 | 1 | … | 9
decimal → ascDigit {ascDigit}
- GHC is more lenient about which characters are allowed in identifiers.
Unicode Other Letters are considered to be small letters,
therefore variable identifiers can begin with them.
The digit class contains all Unicode numbers instead of just Decimal Numbers.
Modifier Letters and Non-Spacing Marks can appear in the tail
of identifiers::
uniSmall → any Unicode Lowercase Letter or Other Letter
uniDigit → any Unicode Decimal Number, Letter Number or Other Number
uniIdchar → any Unicode Modifier Letter or Non-Spacing Mark
idchar → small | large | digit | uniIdchar | '
varid → small {idchar} ⟨reservedid⟩
conid → large {idchar}
.. _infelicities-syntax:
......
main = print nⅯⅯⅩⅩ
where nⅯⅯⅩⅩ = 11
-- The characters in ⅯⅯⅩⅩ are in the LetterNumber Unicode category.
-- We now allow them to be used in identifiers, but they
-- are neither lower nor upper, so they cannot be the first character.
--
-- Just like 'OtherNumber' (#4373), 'ModifierLetter' (#10196) and
-- NonSpacingMark (#7650).
--
-- > map generalCategory "ⅯⅯⅩⅩ"
-- [LetterNumber,LetterNumber,LetterNumber,LetterNumber]
--
-- > map show "ⅯⅯⅩⅩ"
-- ["'\\8559'","'\\8559'","'\\8553'","'\\8553'"]
main = print ⅯⅯⅩⅩ
where ⅯⅯⅩⅩ = 11
-- The characters in ⅯⅯⅩⅩ are in the LetterNumber Unicode category.
-- We now allow them to be used in identifiers, but they
-- are neither lower nor upper, so they cannot be the first character.
--
-- Just like 'OtherNumber' (#4373), 'ModifierLetter' (#10196) and
-- NonSpacingMark (#7650).
--
-- > map generalCategory "ⅯⅯⅩⅩ"
-- [LetterNumber,LetterNumber,LetterNumber,LetterNumber]
--
-- > map show "ⅯⅯⅩⅩ"
-- ["'\\8559'","'\\8559'","'\\8553'","'\\8553'"]
T18158b.hs:1:14: error: lexical error at character '\8559'
......@@ -30,3 +30,6 @@ test('T7650', normal, compile, [''])
test('brackets', normal, compile, [''])
test('T18225A', normal, compile, [''])
test('T18225B', normal, compile_fail, [''])
test('T18158', normal, compile, [''])
test('T18158b', normal, compile_fail, [''])
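The classification rules touched by this change can be approximated outside the lexer with `Data.Char.generalCategory`. The following is only a rough sketch, not GHC's implementation: `LexClass`, `classify`, `isVarId` and `isDecimalLiteral` are hypothetical names, reserved words are ignored, and treating Titlecase Letters as large is an assumption not stated in the diff.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isDigit)

-- Hypothetical, simplified character classes (not GHC's lexer types).
data LexClass = Small | Large | Digit | IdChar | Other
  deriving (Eq, Show)

-- Approximates the documented grammar: Other Letters count as small,
-- all three Unicode number categories count as digits, and Modifier
-- Letters / Non-Spacing Marks are only valid in an identifier's tail.
classify :: Char -> LexClass
classify '_' = Small
classify c = case generalCategory c of
  LowercaseLetter -> Small
  OtherLetter     -> Small
  UppercaseLetter -> Large
  TitlecaseLetter -> Large   -- assumption, see lead-in
  DecimalNumber   -> Digit
  LetterNumber    -> Digit   -- the category changed by this commit
  OtherNumber     -> Digit
  ModifierLetter  -> IdChar
  NonSpacingMark  -> IdChar
  _               -> Other

-- varid → small {idchar}; the reservedid exclusion is omitted here.
isVarId :: String -> Bool
isVarId []       = False
isVarId (c : cs) = classify c == Small && all isTail cs
  where
    isTail x = x == '\'' || classify x `elem` [Small, Large, Digit, IdChar]

-- decimal → ascDigit {ascDigit}; Data.Char.isDigit selects only '0'..'9'.
isDecimalLiteral :: String -> Bool
isDecimalLiteral s = not (null s) && all isDigit s

main :: IO ()
main = do
  print (generalCategory 'Ⅿ')   -- LetterNumber
  print (isVarId "nⅯⅯⅩⅩ")       -- True: Letter Numbers allowed in the tail
  print (isVarId "ⅯⅯⅩⅩ")        -- False: a digit cannot start an identifier
  print (isDecimalLiteral "11") -- True
```

Under these assumptions the sketch mirrors the two new tests: `nⅯⅯⅩⅩ` is accepted as a variable identifier (T18158), while `ⅯⅯⅩⅩ` is rejected because its first character falls in the digit class (T18158b).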