msys has bad Unicode support

Tests 3307 and environment001 pass on Cygwin and Linux, but fail on msys:

>    lib/IO                        3307 [bad exit code] (normal)
>    lib/IO                        environment001 [bad stdout] (normal)

Here is Max's diagnosis:

Basically, msys has fairly poor Unicode support. Take a program "len.c" like this:

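#include <windows.h>
#include <shellapi.h>	/* CommandLineToArgvW */
#include <stdio.h>
#include <wchar.h>	/* wcslen */

int main(int _argc, char **_argv) {
	/* Ignore the narrow argv and ask Windows for the UTF-16 command line. */
	LPWSTR cmdLine = GetCommandLineW();

	int argc;
	LPWSTR *argv = CommandLineToArgvW(cmdLine, &argc);
	if (argv == NULL || argc < 2) {
		fprintf(stderr, "expected one argument\n");
		return 1;
	}

	/* wcslen returns size_t, so cast it to match the %d conversion. */
	printf("%d args, %d wide chars in first arg\n", argc, (int)wcslen(argv[1]));
	LocalFree(argv);
	return 0;
}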

Create a UTF-8 encoded file called "utf8" containing two characters:

不好

And then execute it like so:

gcc len.c && ./a.exe $(cat utf8)

(NB: it is irrelevant whether you use Cygwin gcc or msys gcc: this is an issue with the shells)

You get different results on msys and Cygwin:

  • On Cygwin, you get 2 wide characters in the first argument, i.e. the Chinese text correctly encoded as UTF-16
  • On msys, you get 6 wide characters in the first argument, i.e. one 16-bit value for every byte in the UTF-8 encoded Chinese text

IMHO the msys behaviour is broken, because the command-line arguments supplied via the Windows API are meant to be UTF-16. It does, however, match the behaviour of Windows cmd if you do this:

set /p myvar= < utf8
a.exe %myvar%

(This also reports 6 wide chars in the first arg)

Perhaps the issue in cmd stems from the fact that the Windows console is stuck in code page 850 and doesn't support the UTF-8 "code page". But msys really has no excuse since it reports itself as being UTF-8.

I'm not sure what to do here, because I don't think our code actually has a problem, and the test does pass (and checks something useful) on Linux, OS X and Cygwin. Still, something is not working quite right here. Perhaps just mark it as expect-fail on msys?

Trac metadata
Version: 7.2.1
Type: Bug
TypeOfFailure: OtherFailure
Priority: normal
Resolution: Unresolved
Component: Compiler
Empty fields: Test case, Differential revisions, BlockedBy, Related, Blocking, CC, Operating system, Architecture