Glasgow Haskell Compiler (GHC) issue #10762

Closed
Open
Created Aug 09, 2015 by Michael Snoyman (@snoyberg)

On Windows, out-of-codepage characters can cause GHC build to fail

You can see where this hit us recently in stack, in issues 738 and 734. To demonstrate, I'm attaching a UTF-8 encoded Haskell program that uses Hebrew characters in an identifier and triggers a warning. The contents of that file are:

module Main
    ( main
    , שלום
    ) where

main :: IO ()
main = putStrLn שלום

שלום = "shalom"

If I first set my codepage to 65001 (UTF-8), everything works as expected:

C:\Users\Michael\Desktop>chcp 65001
Active code page: 65001

C:\Users\Michael\Desktop>ghc -fforce-recomp -Wall -ddump-hi -ddump-to-file shalom.hs
[1 of 1] Compiling Main             ( shalom.hs, shalom.o )

shalom.hs:9:1: Warning:
    Top-level binding with no type signature: שלום :: [Char]
Linking shalom.exe ...

However, if I set my codepage to 437 (US), both the warning sent to the console and the .hi dump file cause GHC to exit prematurely:

C:\Users\Michael\Desktop>chcp 437
Active code page: 437

C:\Users\Michael\Desktop>ghc -fforce-recomp -Wall shalom.hs
[1 of 1] Compiling Main             ( shalom.hs, shalom.o )

shalom.hs:9:1: Warning:
    Top-level binding with no type signature: <stderr>: commitBuffer: invalid argument (invalid character)

C:\Users\Michael\Desktop>chcp 437
Active code page: 437

C:\Users\Michael\Desktop>ghc -fforce-recomp -ddump-hi -ddump-to-file shalom.hs
[1 of 1] Compiling Main             ( shalom.hs, shalom.o )
shalom.dump-hi: commitBuffer: invalid argument (invalid character)
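The same kind of failure can be reproduced outside GHC with a few lines of Haskell. This is only a sketch: `writeWith` is a hypothetical helper, and `latin1` is used as a portable stand-in for code page 437, on the assumption that any Handle encoding unable to represent Hebrew letters fails the same way. The exact error text may differ between GHC versions.

```haskell
import Control.Exception (IOException, try)
import System.IO

-- | Hypothetical helper: write text to a file through a Handle that uses
-- the given encoding, returning any IO error instead of throwing it.
writeWith :: TextEncoding -> FilePath -> String -> IO (Either IOException ())
writeWith enc path txt =
  try $ withFile path WriteMode $ \h -> do
    hSetEncoding h enc  -- stand-in for the encoding picked from the code page
    hPutStr h txt       -- fails if a character cannot be encoded
```

Under these assumptions, `writeWith latin1 "out.txt" "שלום"` should return a `Left` carrying an encoding error, while `writeWith utf8 "out.txt" "שלום"` should return `Right ()`.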

At the very least, I would argue that -ddump-to-file should always dump to the output files as UTF-8, as this is the most useful for tooling. Beyond that, there are a few options here:

  • Have all output, including output to the console, go out as UTF-8. This may not play nicely with consoles unless the output codepage is set as well.
  • Provide a command line option or environment variable to specify "output as UTF-8."
  • More radical: change the default way that all Handles work so that UTF-8 is the default, instead of paying attention to code pages and environment variables. Honestly, this is my preference, but it's a bigger discussion than this one bug.
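For illustration, the options above map roughly onto existing functions in base. This is a sketch of what each choice looks like at the Handle level, not a description of how GHC itself is structured; `writeDumpUtf8`, `forceUtf8Console`, and `utf8Everywhere` are hypothetical names.

```haskell
import GHC.IO.Encoding (setLocaleEncoding)
import System.IO

-- | Always write a dump file as UTF-8, whatever the active code page is.
writeDumpUtf8 :: FilePath -> String -> IO ()
writeDumpUtf8 path txt =
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h txt

-- | Send console output as UTF-8. The console may still render it wrongly
-- unless its output code page is also 65001.
forceUtf8Console :: IO ()
forceUtf8Console = do
  hSetEncoding stdout utf8
  hSetEncoding stderr utf8

-- | The more radical option: make UTF-8 the default for Handles opened
-- from now on, and switch the standard handles explicitly as well.
utf8Everywhere :: IO ()
utf8Everywhere = do
  setLocaleEncoding utf8
  forceUtf8Console
```

A command line option or environment variable, as in the second bullet, would presumably just decide whether `forceUtf8Console` is called at startup.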

The workaround we've implemented in stack for now is setting the codepage to 65001 for the console while running stack. This is not ideal, since this is essentially a global setting for the entire console.
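The shape of such a workaround can be sketched as a bracket that changes a setting and restores the old value afterwards, even if the action throws. This is not stack's actual code. `withTemporarySetting` is a hypothetical helper, and the getter and setter are passed in as arguments so the sketch stays platform-independent; on Windows they would presumably be `getConsoleOutputCP` and `setConsoleOutputCP` from the Win32 package.

```haskell
import Control.Exception (bracket)

-- | Hypothetical helper: set a value for the duration of an action and
-- restore the previous value afterwards, including on exceptions.
withTemporarySetting
  :: IO a         -- ^ read the current value (e.g. the console code page)
  -> (a -> IO ()) -- ^ set a new value
  -> a            -- ^ temporary value (e.g. 65001)
  -> IO b         -- ^ action to run while the temporary value is active
  -> IO b
withTemporarySetting getCurrent set temp action =
  bracket
    (do old <- getCurrent  -- remember the previous value
        set temp           -- switch to the temporary one
        pure old)
    set                    -- restore the previous value on the way out
    (const action)
```

Restoring the old value limits the damage, but the code page is still shared by everything attached to the same console while the action runs, which is the drawback described above.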

Trac metadata

Trac field              Value
Version                 7.10.2
Type                    Bug
TypeOfFailure           OtherFailure
Priority                normal
Resolution              Unresolved
Component               Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC
Operating system
Architecture