msys has bad Unicode support
Tests 3307 and environment001 pass on Cygwin and Linux, but fail on msys:
> lib/IO 3307 [bad exit code] (normal)
> lib/IO environment001 [bad stdout] (normal)
Here is Max's diagnosis:
Basically, msys has kind of bad Unicode support. If you write a program "len.c" like this:
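#include <windows.h>
#include <shellapi.h>  /* CommandLineToArgvW */
#include <stdio.h>
#include <wchar.h>     /* wcslen */
int main(int _argc, char **_argv) {
    /* Use the raw UTF-16 command line instead of the narrow _argv. */
    LPWSTR cmdLine = GetCommandLineW();
    int argc;
    LPWSTR *argv = CommandLineToArgvW(cmdLine, &argc);
    if (argv == NULL || argc < 2) {
        fprintf(stderr, "expected one argument\n");
        return 1;
    }
    /* wcslen returns size_t, so cast it to match the %d format. */
    printf("%d args, %d wide chars in first arg\n", argc, (int)wcslen(argv[1]));
    LocalFree(argv);
    return 0;
}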
Create a UTF-8 encoded file called "utf8" containing two characters:
不好
And then execute it like so:
gcc len.c && ./a.exe $(cat utf8)
(NB: it is irrelevant whether you use Cygwin gcc or msys gcc: this is an issue with the shells)
You get different results on msys and Cygwin:
- On Cygwin, you get 2 wide characters in the first argument; i.e. the UTF-16 encoded Chinese text
- On msys, you get 6 wide characters in the first argument; i.e. one 16-bit value for every byte in the UTF-8 encoded Chinese text
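
To make the 2-versus-6 difference concrete, here is a small portable sketch of the two behaviours. It is only an illustration, not what either shell actually runs: the names `SAMPLE_UTF8`, `widen_bytes` and `utf8_to_utf16` are made up for this example, and the decoder assumes well-formed UTF-8 input. `widen_bytes` mimics what msys appears to do (each byte becomes one 16-bit unit), while `utf8_to_utf16` does a real transcoding, which is what Cygwin's result corresponds to.

```c
#include <stddef.h>
#include <stdint.h>

/* The two characters from the "utf8" file, encoded as UTF-8 (3 bytes each). */
const unsigned char SAMPLE_UTF8[6] = { 0xE4, 0xB8, 0x8D, 0xE5, 0xA5, 0xBD };

/* Hypothetical model of the msys result: every input byte is copied into
   its own 16-bit unit, so the output has as many units as the input has bytes.
   `out` must have room for n units. */
size_t widen_bytes(const unsigned char *in, size_t n, uint16_t *out) {
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
    return n;
}

/* Hypothetical model of the Cygwin result: decode UTF-8 into code points
   and re-encode them as UTF-16. Assumes well-formed input; a sequence that
   is cut off at the end of the buffer stops the conversion.
   `out` must have room for n units. */
size_t utf8_to_utf16(const unsigned char *in, size_t n, uint16_t *out) {
    size_t i = 0, count = 0;
    while (i < n) {
        unsigned char lead = in[i];
        uint32_t cp;
        size_t len;
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        if (i + len > n)
            break;
        /* Each continuation byte contributes its low 6 bits. */
        for (size_t k = 1; k < len; k++)
            cp = (cp << 6) | (in[i + k] & 0x3F);
        i += len;
        if (cp >= 0x10000) {
            /* Outside the BMP: emit a surrogate pair. */
            cp -= 0x10000;
            out[count++] = (uint16_t)(0xD800 | (cp >> 10));
            out[count++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            out[count++] = (uint16_t)cp;
        }
    }
    return count;
}
```

Called on `SAMPLE_UTF8` with a 6-element `uint16_t` buffer, `widen_bytes` returns 6 and `utf8_to_utf16` returns 2 (the units 0x4E0D and 0x597D), matching the two counts reported above.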
IMHO the msys behaviour is broken because the command line arguments supplied via the Windows API are meant to be UTF-16. It does match the behaviour of Windows cmd if you do this:
set /p myvar= < utf8
a.exe %myvar%
(You get "6 wide characters" printed)
Perhaps the issue in cmd stems from the fact that the Windows console is stuck in code page 850 and doesn't support the UTF-8 "code page". But msys really has no excuse since it reports itself as being UTF-8.
I'm not sure what to do here because I don't think our code actually has a problem, and the test does pass (and check something useful) in Linux, OS X and Cygwin. But still, something is not working quite right here. Perhaps just mark it as expect-fail in msys?
Trac metadata
| Trac field | Value |
|---|---|
| Version | 7.2.1 |
| Type | Bug |
| TypeOfFailure | OtherFailure |
| Priority | normal |
| Resolution | Unresolved |
| Component | Compiler |
| Test case | |
| Differential revisions | |
| BlockedBy | |
| Related | |
| Blocking | |
| CC | |
| Operating system | |
| Architecture | |