Replace manual string lexing

We currently have an ugly hack that lexes string literals by hand instead of using Alex's regex syntax: https://gitlab.haskell.org/ghc/ghc/-/blob/610840eb5bf6bd59417b82cc61e74aeaacbe5462/compiler/GHC/Parser/Lexer.x?page=3#L2184

The comment here suggests that lexing strings with Alex rules might duplicate work between matching the regex and assembling the token: https://gitlab.haskell.org/ghc/ghc/-/blob/610840eb5bf6bd59417b82cc61e74aeaacbe5462/compiler/GHC/Parser/Lexer.x?page=3#L663-666

But after some prototyping, it seems to work out rather nicely. Some advantages of avoiding our manual lexing:

  1. The lexer more closely follows the spec in the Haskell Report
  2. The logic is simplified, with the custom logic only needing to postprocess the character stream, and not worry about lexical errors
  3. It seems to reduce memory usage
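
To illustrate point 2, here is a minimal sketch of what a postprocessing step could look like once the regex has already accepted the literal. It is not the code in Lexer.x: `resolveEscapes` and `simpleEscapes` are hypothetical names, and the sketch only handles single-character escapes, decimal escapes, `\&`, and string gaps. Because the input is assumed to be pre-validated by the Alex rule, the function has no error cases.

```haskell
import Data.Char (chr, isDigit, isSpace)

-- Hypothetical table covering only a subset of the Haskell Report's
-- single-character escapes; named escapes such as \NUL, control escapes,
-- and hex/octal escapes are deliberately left out of this sketch.
simpleEscapes :: [(Char, Char)]
simpleEscapes =
  [ ('n', '\n'), ('t', '\t'), ('r', '\r')
  , ('\\', '\\'), ('"', '"'), ('\'', '\'')
  ]

-- Hypothetical postprocessor: takes the body of a string literal (without
-- the surrounding quotes) that a regex has ALREADY validated, and resolves
-- its escapes. No lexical errors are reported here by assumption.
resolveEscapes :: String -> String
resolveEscapes [] = []
-- The empty escape \& contributes no character.
resolveEscapes ('\\' : '&' : rest) = resolveEscapes rest
resolveEscapes ('\\' : c : rest)
  -- String gap: backslash, whitespace, backslash. Skip through the closing backslash.
  | isSpace c = resolveEscapes (drop 1 (dropWhile (/= '\\') rest))
  -- Decimal escape such as \65.
  | isDigit c =
      let (digits, rest') = span isDigit (c : rest)
      in chr (read digits) : resolveEscapes rest'
  -- Single-character escape such as \n.
  | Just c' <- lookup c simpleEscapes = c' : resolveEscapes rest
-- Ordinary characters (and escapes this sketch does not cover) pass through.
resolveEscapes (c : rest) = c : resolveEscapes rest
```

In GHCi, `resolveEscapes "ab\\  \\cd"` removes the string gap and `resolveEscapes "\\65\\&5"` yields an `A` followed by a `5`, with the `\&` keeping the two from being read as one number.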
Edited by Brandon Chinn