Opened Nov 12, 2013 by oerjan@nvg.ntnu.no@trac-oerjan

GHC is inconsistent with the Haskell Report on which Unicode characters are allowed in string and character literals

GHC is inconsistent with the Haskell Report on which Unicode characters are allowed in string and character literals. (And I don't like either option: why leave any characters out of strings unnecessarily?)

Examples from ghci 7.6.3 (also tested in lambdabot on irc):

Prelude> "​" -- Unicode char \8203, Format class.

<interactive>:10:2:
    lexical error in string/character literal at character '\8203'
Prelude> " " -- Unicode char \8202, Space class.
"\8202"
Prelude> "t\ \est" -- Unicode char \8202 in a string gap.

<interactive>:14:4:
    lexical error in string/character literal at character '\8202'
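
Because both characters are invisible in the transcript above, a small sketch like the following can confirm which Unicode class each one falls in. It uses only `Data.Char` from base; the names `zeroWidthSpace`, `hairSpace`, `categories`, and `withEscape` are made up for illustration. It also shows a numeric escape, which is a way to get such a character into a literal without typing it directly.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- The two code points from the examples, written as numeric escapes
-- so that they remain visible in the source text.
zeroWidthSpace, hairSpace :: Char
zeroWidthSpace = '\8203' -- U+200B
hairSpace      = '\8202' -- U+200A

-- Pair each character with its Unicode general category as reported by base.
categories :: [(Char, GeneralCategory)]
categories = [(c, generalCategory c) | c <- [zeroWidthSpace, hairSpace]]

-- A literal containing U+200B via an escape rather than the raw character.
withEscape :: String
withEscape = "a\8203b"
```

Loading this in ghci and evaluating `categories` shows the class GHC's base library assigns to each character, which can be compared against the comments in the transcript.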

My reading of http://www.haskell.org/onlinereport/haskell2010/haskellch2.html (section 2.2 and 2.6):

  • The report BNF token "graphic", which can be used in literals, indirectly includes many Unicode classes, but uniWhite is not one of them. Thus the only whitespace character allowed to represent itself in literals is ASCII space.
  • Unicode formatting characters are not mentioned in the BNF that I can see, so are not allowed in literals.
  • String gaps are made out of the report BNF token whitespace, which does include uniWhite.
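
The reading above can be sketched as predicates over `Data.Char` general categories. This is only an approximation of the report's BNF under stated assumptions, not GHC's lexer: the function names are hypothetical, uniSymbol is taken to mean all Unicode symbol and punctuation categories, and uniWhite is approximated by `isSpace`.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isSpace)

-- Approximates the report's "graphic" production (section 2.2):
-- small | large | symbol | digit | special | " | '
isReportGraphic :: Char -> Bool
isReportGraphic c = case generalCategory c of
  LowercaseLetter      -> True  -- uniSmall
  UppercaseLetter      -> True  -- uniLarge
  TitlecaseLetter      -> True  -- uniLarge
  DecimalNumber        -> True  -- uniDigit
  ConnectorPunctuation -> True  -- uniSymbol (punctuation), includes '_'
  DashPunctuation      -> True
  OpenPunctuation      -> True
  ClosePunctuation     -> True
  InitialQuote         -> True
  FinalQuote           -> True
  OtherPunctuation     -> True
  MathSymbol           -> True  -- uniSymbol (symbols)
  CurrencySymbol       -> True
  ModifierSymbol       -> True
  OtherSymbol          -> True
  _                    -> False -- Format, Space, Control, etc. are excluded

-- May the character stand for itself inside a string literal (section 2.6)?
-- The string production allows graphic (minus '"' and '\\') or ASCII space.
allowedInStringBody :: Char -> Bool
allowedInStringBody c =
  c == ' ' || (isReportGraphic c && c /= '"' && c /= '\\')

-- May the character appear between the backslashes of a string gap?
-- Assumption: isSpace stands in for the report's whitespace characters.
allowedInStringGap :: Char -> Bool
allowedInStringGap = isSpace
```

Under these assumptions, `'\8203'` (Format) and `'\8202'` (Space) are both rejected by `allowedInStringBody`, while `'\8202'` is accepted by `allowedInStringGap`, which is the behavior this reading attributes to the report.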

Who wants what:

                              GHC   Report   Me
Format in string              No    No       Yes
Space/uniWhite in string      Yes   No       Yes
Space/uniWhite in string gap  No    Yes      Dunno

In short, GHC's behavior is buggy and/or annoying in two opposite ways:

  • It disallows some Unicode characters in string and character literals, presumably because the report says so.
  • It allows some characters the report says it *shouldn't*, and refuses some characters the report says it *should*.

Trac metadata

Trac field              Value
Version                 7.6.3
Type                    Bug
TypeOfFailure           OtherFailure
Priority                normal
Resolution              Unresolved
Component               Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture
Reference: ghc/ghc#8524