Windows compatibility settings cause segfault in GetProcAddress?
Summary
On Windows, stack has been segfaulting when it is called by IntelliJ-Haskell, which is an IDE/plugin.
Windows seems to be setting some backwards compatibility mechanism automatically. With that active, the crash appears to happen when the GHC RTS calls the Windows API. Specifically, the crash appears to happen inside the implementation of Windows' GetProcAddress, not in GHC-controlled code.
Whilst this feels like a bug in Windows' compatibility mechanism, perhaps the workaround should be to drop Vista support.
More details follow.
Steps to reproduce
In PowerShell on Windows:
$env:__COMPAT_LAYER="DetectorsAppHealth"
stack --numeric-version
Actual behavior
2.1.3
Access violation in generated code when reading 0000000094cdeebe
Expected behavior
2.1.3
More details
It appears the operating system (Windows Defender has been mooted as the culprit) is setting the __COMPAT_LAYER environment variable. See the IntelliJ-Haskell issue. The environment variable is used for Windows' backwards compatibility mechanism; Windows tweaks its API behaviour, etc., to match previous versions of Windows.
Now, Stack embeds the GHC RTS. The RTS calls the Windows API GetProcAddress, asking for the address of "GetActiveProcessorCount". That corresponds to this section of the GHC runtime. When the environment variable is set as above, the access violation seems to happen inside the internals of GetProcAddress, namely inside ntdll.dll.
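For readers unfamiliar with the pattern, the runtime lookup described above can be sketched roughly as follows. This is not the actual RTS source: the function name count_processors, the typedef name, and the GetSystemInfo fallback are assumptions made for illustration, and the non-Windows branch is only a placeholder so the snippet compiles elsewhere.

```c
#include <stdint.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>

/* Pointer type matching the documented GetActiveProcessorCount signature. */
typedef DWORD (WINAPI *GetActiveProcessorCountFn)(WORD GroupNumber);

/* Hypothetical helper: resolve GetActiveProcessorCount at run time, so the
 * binary still loads on systems whose kernel32 does not export it. */
uint32_t count_processors(void)
{
    HMODULE kernel = GetModuleHandleW(L"kernel32");
    if (kernel != NULL) {
        /* This is the kind of call during which the access violation was observed. */
        GetActiveProcessorCountFn fn = (GetActiveProcessorCountFn)
            GetProcAddress(kernel, "GetActiveProcessorCount");
        if (fn != NULL) {
            return (uint32_t)fn(ALL_PROCESSOR_GROUPS);
        }
    }
    /* Assumed fallback for systems without the export. */
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    return (uint32_t)si.dwNumberOfProcessors;
}
#else
/* Placeholder for non-Windows builds; not a real processor count. */
uint32_t count_processors(void)
{
    return 1;
}
#endif
```

The point of the indirection is that a missing export yields a NULL pointer at run time rather than a failure to load the executable.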
Native code debugging seems tricky on Windows, but I came to this conclusion using a mixture of drltrace and the Visual Studio debugger. drltrace can be obtained by installing Dr. Memory.
Now, a crash inside of GetProcAddress sounds serious. Feels like a Windows bug? But it might be easier for GHC to work around it. Or maybe (and I hope not, because this would be harder to diagnose) some stack or heap corruption is happening earlier on, but hidden and symptomless until GetProcAddress is called.
But what I can say is this bug manifests during that call to GetProcAddress. But why is GHC even inserting calls to GetProcAddress? The comment in the RTS says it is because “We still support Windows Vista”. I find it “interesting” then that this compatibility code appears to interact badly with Windows' __COMPAT_LAYER backwards compatibility mechanism.
Maybe the answer is to link directly to GetActiveProcessorCount, eschewing GetProcAddress. That would require dropping support for Windows Vista, but Windows Vista is no longer (security) supported by Microsoft anyway. Thoughts?
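As a rough illustration of that proposal, a direct call might look like the sketch below. The function name count_processors_direct is hypothetical, and the non-Windows branch is only a placeholder so the snippet compiles elsewhere. It assumes the build targets Windows 7 or later, which is what makes the headers declare GetActiveProcessorCount.

```c
#include <stdint.h>

#ifdef _WIN32
/* Target Windows 7 (0x0601) or later so windows.h declares the function. */
#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0601
#endif
#include <windows.h>

/* Hypothetical helper: call the API through the normal import table,
 * with no GetProcAddress lookup at run time. */
uint32_t count_processors_direct(void)
{
    return (uint32_t)GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
}
#else
/* Placeholder for non-Windows builds; not a real processor count. */
uint32_t count_processors_direct(void)
{
    return 1;
}
#endif
```

The trade-off is that the import is resolved when the executable is loaded, so a system whose kernel32 lacks the export would refuse to start the program at all.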
Finally, note the error message here is similar to #13112. However, my best guess is that this is a different issue. That's because the cause sounds quite different. Also, for this issue here, the address of the access violation seems completely repeatable and different to the other bug. Of course, this is just an educated guess and I could be mistaken.
Environment
- GHC version used: GHC 8.2.2 (I think)
- Stack version: 2.1.3
- Operating System: Windows 10 (Build 19035.1, but probably also previous versions)
- System Architecture: x86_64