Lexer does not handle Unicode numeric subscripts
Hi all,
I would fix this myself, but the GHC lexer looks rather fragile and I'm afraid of breaking something. I can still have a crack at it and write a patch if you like.
Currently GHC rejects perfectly good Unicode identifier characters (numeric subscripts).
For example, the following expression:
let v₂ = (+) in v₂ 1 3
gives:
lexical error at character '\8322'
The subscript digits (here '\8322', i.e. U+2082 SUBSCRIPT TWO) are in the "OtherNumber" general Unicode category, so I'm pretty sure the main change is to Lexer.x, changing:
OtherNumber -> other_graphic
to some other category (in the definition of alexGetChar).
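The category claim can be checked outside GHC's lexer with Data.Char from base. This is only a small sketch: the names subscriptDigits, describe and allOtherNumber are made up for illustration and do not appear in Lexer.x.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isDigit, ord)

-- The subscript digits U+2080..U+2089 (hypothetical helper name).
subscriptDigits :: [Char]
subscriptDigits = ['\x2080' .. '\x2089']

-- Pair a character with its code point and Unicode general category.
describe :: Char -> (Int, GeneralCategory)
describe c = (ord c, generalCategory c)

-- True when every subscript digit is OtherNumber and none of them is
-- accepted by isDigit (which only recognises ASCII '0'..'9').
allOtherNumber :: Bool
allOtherNumber =
  all (\c -> generalCategory c == OtherNumber && not (isDigit c)) subscriptDigits
```

In GHCi, describe '\8322' should evaluate to (8322,OtherNumber), which matches the character reported in the lexical error.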
The main issue I see here is that we can't just change "other_graphic" to "digit": these characters would have to be treated like ' or _ rather than like digits, otherwise it would become acceptable to use them as real numeric digits, which I don't think we want.
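To make the distinction concrete, here is a toy classifier, assuming a separate "identifier tail only" class. It is not GHC's lexer: CharClass, classify, isIdentifier and isNumericLiteral are hypothetical names, and the sketch ignores keywords and the variable/constructor distinction.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isAlpha, isDigit)

-- Hypothetical character classes; these are NOT the names used in Lexer.x.
data CharClass
  = IdentStart  -- may begin or continue an identifier
  | Digit       -- may continue an identifier and form numeric literals
  | IdentTail   -- may only continue an identifier, never form a number
  | Other
  deriving (Eq, Show)

-- Classify a character. OtherNumber characters are placed alongside the
-- prime character, so they cannot be used as real digits.
classify :: Char -> CharClass
classify c
  | isAlpha c || c == '_'            = IdentStart
  | isDigit c                        = Digit
  | c == '\''                        = IdentTail
  | generalCategory c == OtherNumber = IdentTail
  | otherwise                        = Other

-- An identifier is a start character followed by any non-Other characters.
isIdentifier :: String -> Bool
isIdentifier []       = False
isIdentifier (c : cs) = classify c == IdentStart && all tailOk cs
  where
    tailOk x = classify x /= Other

-- A numeric literal (decimal only, in this toy) consists solely of Digit.
isNumericLiteral :: String -> Bool
isNumericLiteral s = not (null s) && all ((== Digit) . classify) s
```

Under these assumptions, isIdentifier "v\8322" holds while isNumericLiteral "\8322" does not, which is the behaviour the ticket asks for.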
Since I am not confident enough in GHC's lexer/parser structure to make these changes, I was wondering whether someone more experienced who has the time could do it.