Skip to content

Segmentation fault with non-moving GC and unordered-containers-0.19

TLDR

Using unordered-containers-0.19 with the non-moving GC triggers segmentation faults on GHC 9.4.7. But they don't trigger with unordered-containers-0.18.

Summary

Back when we updated to GHC 9.2.4 earlier this year, we started running into segmentation faults with the non-moving GC. Around the same time #22264 (closed) was reported, which we were confident was the same issue as what we were facing.

While waiting for a proper fix, we realized that the issue went away when we downgraded unordered-containers to 0.18.

When @bgamari's fixes for #22264 (closed) were merged we assumed this had been solved. I read through the patches and the changes in unordered-containers and assumed the problematic bit was: https://github.com/haskell-unordered-containers/unordered-containers/commit/b73381e434d23016e9fca1d2ac719ab48d6a0437 This patch introduced a usage of shrinkSmallMutableArray#, and the fix mentioned it. So, I tried to patch unordered-containers to not call this primop.

This workaround didn't fix the issue, but I didn't think much of it at the time, since we had a different easy workaround in the form of just keeping the old version.

When we upgraded to GHC 9.2.5, we didn't upgrade unordered-containers, so, since then we've still been on 0.18. We have since upgraded to GHC 9.4.7, which is the version we currently use.

Recently I decided to upgrade unordered-containers to 0.19.1, thinking the issue resolved. But this led to segmentation faults. Reverting the version bump made them go away.

So, there is something in this set of changes that is triggering a bug in the compiler: https://github.com/haskell-unordered-containers/unordered-containers/compare/v0.2.18.0...v0.2.19.0

Theory

Thankfully the difference between those two releases is quite small (thank you for having frequent releases). The main two suspects are: https://github.com/haskell-unordered-containers/unordered-containers/commit/b73381e434d23016e9fca1d2ac719ab48d6a0437#diff-b95c39ca3aa6cff994008c0aa367e1219dd7b5d87e5d062e0a23cc357c35a241 and https://github.com/haskell-unordered-containers/unordered-containers/commit/102d35b758daef5d35fc9310ec29c87661603df3.

I have already eliminated the first by previously confirming that the seg faults still occur if we don't use shrunkSmallMutableArray#.

So, my money is on the second option: https://github.com/haskell-unordered-containers/unordered-containers/commit/102d35b758daef5d35fc9310ec29c87661603df3

This commit does the following optimisation. When snoc-ing onto an array, instead of copying the array into an uninitialised larger array and then writing the last element, we instead initialise the larger array to be all the last element, and then copy the previous array over everything but the last element.

I can imagine that this would change how this code behaves wrt to the update remembered set.

Another hint is that I think this only occurs when the snoc-ed element is a THUNK. I know this since we also make heavy use of strict-containers-0.2, which exports a guaranteed strict version of unordered-containers-0.19.1 and that didn't seem to trigger the seg fault.

Steps to reproduce

No reproducer, but heavy HashMap use with the non-moving GC seems to trigger it. I'll try to create a reproducer soon.

Expected behavior

No segmentation faults

Environment

  • GHC version used: 9.4.7

Optional:

  • Operating System: Linux/Ubuntu
  • System Architecture: x86_64
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information