Segmentation fault with non-moving GC and unordered-containers-0.19
TLDR
Using unordered-containers-0.19
with the non-moving GC triggers segmentation faults on GHC 9.4.7.
But they don't trigger with unordered-containers-0.18
.
Summary
Back when we updated to GHC 9.2.4 earlier this year, we started running into segmentation faults with the non-moving GC. Around the same time #22264 was reported, which we were confident was the same issue as what we were facing.
While waiting for a proper fix, we realized that the issue went away when we downgraded unordered-containers
to 0.18.
When @bgamari's fixes for #22264 were merged we assumed this had been solved.
I read through the patches and the changes in unordered-containers and assumed the problematic bit was: https://github.com/haskell-unordered-containers/unordered-containers/commit/b73381e434d23016e9fca1d2ac719ab48d6a0437
This patch introduced a usage of shrinkSmallMutableArray#
, and the fix mentioned it. So, I tried to patch unordered-containers
to not call this primop.
This workaround didn't fix the issue, but I didn't think much of it at the time, since we had a different easy workaround in the form of just keeping the old version.
When we upgraded to GHC 9.2.5, we didn't upgrade unordered-containers
, so, since then we've still been on 0.18
.
We have since upgraded to GHC 9.4.7, which is the version we currently use.
Recently I decided to upgrade unordered-containers
to 0.19.1
, thinking the issue resolved. But this led to segmentation faults. Reverting the version bump made them go away.
So, there is something in this set of changes that is triggering a bug in the compiler: https://github.com/haskell-unordered-containers/unordered-containers/compare/v0.2.18.0...v0.2.19.0
Theory
Thankfully the difference between those two releases is quite small (thank you for having frequent releases). The main two suspects are: https://github.com/haskell-unordered-containers/unordered-containers/commit/b73381e434d23016e9fca1d2ac719ab48d6a0437#diff-b95c39ca3aa6cff994008c0aa367e1219dd7b5d87e5d062e0a23cc357c35a241 and https://github.com/haskell-unordered-containers/unordered-containers/commit/102d35b758daef5d35fc9310ec29c87661603df3.
I have already eliminated the first by previously confirming that the seg faults still occur if we don't use shrunkSmallMutableArray#
.
So, my money is on the second option: https://github.com/haskell-unordered-containers/unordered-containers/commit/102d35b758daef5d35fc9310ec29c87661603df3
This commit does the following optimisation. When snoc-ing onto an array, instead of copying the array into an uninitialised larger array and then writing the last element, we instead initialise the larger array to be all the last element, and then copy the previous array over everything but the last element.
I can imagine that this would change how this code behaves wrt to the update remembered set.
Another hint is that I think this only occurs when the snoc-ed element is a THUNK.
I know this since we also make heavy use of strict-containers-0.2
, which exports a guaranteed strict version of unordered-containers-0.19.1
and that didn't seem to trigger the seg fault.
Steps to reproduce
No reproducer, but heavy HashMap use with the non-moving GC seems to trigger it. I'll try to create a reproducer soon.
Expected behavior
No segmentation faults
Environment
- GHC version used: 9.4.7
Optional:
- Operating System: Linux/Ubuntu
- System Architecture: x86_64