I have program which crashes reliably with both GHC 8.4 & 8.6: coin-node.hs
Steps to reproduce
To compile use: ghc --make -O1 -debug -rtsopts ./exe/coin-node.hs Crash happens if -O1/-O2 is specified. With -O0 program finish normally
Run with ./coin-node +RTS -c -V0 -i0 -DS -A256k. Both -c and -A flags are necessary. Either removing -c or increasing -A makes crash go away so it's clearly related to GC somehow.
-dcore-lint shows no errors
Low level code comes from memory package. Crash itself is deterministic and always happens in the same place. However test case is fragile and depends on forcing values in particular order. For example removing bang from pubK makes crash go away
Expected behavior
I expect programs compiled by GHC to not crash unless one gets way too playful with bits.
Core dump analysis
Crash happens in comparing two MutableByteArray# wrappers for equality:
Very suspicious thing is load from rax+7. Indeed if we load from rax + 8 we'll get correct value. My only hypothesis is that rax expected to contain tagged pointer but that tag got lost somehow. rax = 0x42000bf708 so all tag bits are clear.
Environment
GHC version used: 8.4.4, 8.6.4, 8.6.5
Operating System: Linux
System Architecture: x64
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items
...
Show closed items
Linked items
0
Link issues together to show that they're related or that one is blocking others.
Learn more.
I suspect this is a bug in compacting GC (-c), which I think is not widely used and not tested well.
I was probably wrong about this -- we enable compacting GC in oldest generation when the residency of the generation exceeds 30% (by default), so I think we should make sure compacting GC is stable.
Main.main :: GHC.Types.IO ()[GblId] = \u [] src<Main.hs:(75,1)-(81,10)> src<Main.hs:76:15-34> src<Main.hs:(67,1)-(72,27)> case Main.$wgo 1# 10000# of { (#,#) ww1_s5vf [Occ=Once!] ww2_s5vg [Occ=Once] -> case ww1_s5vf of pubK_s5vh [Occ=Once] { Main.Bytes ipv_s5vi -> src<Main.hs:81:3-10> src<Main.hs:24:5-52> let { sat_s5w5 [Occ=Once] :: [GHC.Types.Char] [LclId] = \u [] let { Rec { go_s5vj [Occ=LoopBreaker] :: [Main.Bytes] -> [Main.Bytes] [LclId, Arity=1, Str=<S,1*U>, Unf=OtherCon []] = \r [ds_s5vk] case ds_s5vk of { [] -> [] []; : y_s5vm [Occ=Once!] ys_s5vn [Occ=Once*] -> src<Main.hs:79:21-35> src<Main.hs:(37,1)-(53,41)> case y_s5vm of wild1_s5vo [Occ=Once] { Main.Bytes m1_s5vp -> case sizeofMutableByteArray# [m1_s5vp] of len_s5vq { __DEFAULT -> src<Main.hs:41:5-38> src<Main.hs:41:13-38> src<Main.hs:79:32-35> src<Main.hs:38:7-28> src<Main.hs:42:5-38> src<Main.hs:42:13-38> case sizeofMutableByteArray# [ipv_s5vi] of sat_s5vr [Occ=Once] { __DEFAULT -> case /=# [len_s5vq sat_s5vr] of { __DEFAULT -> src<Main.hs:39:7-15> src<Main.hs:39:32-69> case case noDuplicate# [GHC.Prim.realWorld#] of s'_s5vt [Occ=Once] { __DEFAULT -> src<Main.hs:39:50-69> let-no-escape { Rec { loop_s5vu [Occ=LoopBreakerT[2]] :: GHC.Prim.Int# -> GHC.Prim.State# GHC.Prim.RealWorld -> (# GHC.Prim.State# GHC.Prim.RealWorld, GHC.Types.Bool #) [LclId[JoinId(2)], Arity=2, Str=<L,U><L,U>, Unf=OtherCon []] = \r [i_s5vv s1_s5vw] src<Main.hs:(44,5)-(53,41)> src<Main.hs:45:9-27> case ==# [i_s5vv len_s5vq] of { __DEFAULT -> src<Main.hs:46:9-17> src<Main.hs:(47,11)-(53,41)> case readWord8Array# [m1_s5vp i_s5vv s1_s5vw] of { (#,#) ipv1_s5vz [Occ=Once] ipv2_s5vA [Occ=Once] -> src<Main.hs:(49,15)-(53,41)> case readWord8Array# [ipv_s5vi i_s5vv ipv1_s5vz] of { (#,#) ipv3_s5vC [Occ=Once*] ipv4_s5vD [Occ=Once] -> src<Main.hs:(51,19)-(53,41)> case eqWord# [ipv2_s5vA ipv4_s5vD] of { __DEFAULT -> src<Main.hs:53:26-41> (#,#) [ipv3_s5vC GHC.Types.False]; 1# -> src<Main.hs:52:26-43> case +# [i_s5vv 1#] of sat_s5vF [Occ=Once] { __DEFAULT -> loop_s5vu sat_s5vF ipv3_s5vC; }; }; }; }; 1# -> src<Main.hs:45:31-43> (#,#) [s1_s5vw GHC.Types.True]; }; end Rec } } in loop_s5vu 0# s'_s5vt; } of { (#,#) _ [Occ=Dead] ipv2_s5vI [Occ=Once!] -> case ipv2_s5vI of { GHC.Types.False -> go_s5vj ys_s5vn; GHC.Types.True -> let { sat_s5vK [Occ=Once] :: [Main.Bytes] [LclId] = \u [] go_s5vj ys_s5vn; } in src<Main.hs:77:21-22> : [wild1_s5vo sat_s5vK]; }; }; 1# -> src<Main.hs:38:32-36> go_s5vj ys_s5vn; }; }; }; }; }; end Rec } } in src<Main.hs:24:33-52> src<Main.hs:(57,1)-(62,16)> src<Main.hs:81:9-10> src<Main.hs:(77,19)-(80,19)> let { sat_s5vL [Occ=Once] :: [Main.Bytes] [LclId] = CCCS :! [pubK_s5vh ww2_s5vg]; } in case GHC.List.reverse1 sat_s5vL GHC.Types.[] of sat_s5vM [Occ=Once] { __DEFAULT -> case go_s5vj sat_s5vM of { [] -> Main.main1; : k1_s5vP [Occ=Once!] _ [Occ=Dead] -> case k1_s5vP of { Main.Bytes mba_s5vS -> src<Main.hs:58:5-41> case sizeofMutableByteArray# [mba_s5vS] of { __DEFAULT -> src<Main.hs:59:5-13> src<Main.hs:(59,17)-(62,16)> case case noDuplicate# [GHC.Prim.realWorld#] of s'_s5vU [Occ=Once] { __DEFAULT -> src<Main.hs:(59,35)-(62,16)> case readWord8Array# [mba_s5vS 0# s'_s5vU] of { (#,#) ipv1_s5vW [Occ=Once] ipv2_s5vX [Occ=Once] -> src<Main.hs:62:7-16> src<Main.hs:61:49-71> case word2Int# [ipv2_s5vX] of sat_s5vY [Occ=Once] { __DEFAULT -> case chr# [sat_s5vY] of sat_s5vZ [Occ=Once] { __DEFAULT -> let { sat_s5w0 [Occ=Once] :: GHC.Types.Char [LclId] = CCCS GHC.Types.C#! [sat_s5vZ]; } in let { sat_s5w1 [Occ=Once] :: [GHC.Types.Char] [LclId] = CCCS :! [sat_s5w0 GHC.Types.[]]; } in src<Main.hs:61:44-46> (#,#) [ipv1_s5vW sat_s5w1]; }; }; }; } of { (#,#) _ [Occ=Dead] ipv2_s5w4 [Occ=Once] -> GHC.Show.showLitString ipv2_s5w4 Main.$fShowBytes2; }; 0# -> src<Main.hs:58:45-46> Main.$fShowBytes1; }; }; }; }; } in let { sat_s5w6 [Occ=Once] :: GHC.Base.String [LclId] = CCCS :! [GHC.Show.$fShow(,)3 sat_s5w5]; } in GHC.IO.Handle.Text.hPutStr' GHC.IO.Handle.FD.stdout sat_s5w6 GHC.Types.True; }; };
If I add a watchpoint to ((StgClosure*)0x42000d37b0)->header.info to see how
it was allocated it shows that it was allocated in compacting GC:
>>> bt#0 0x000000000056c234 in move (to=0x42000d37b8, from=0x42000fddf8, size=2) at rts/sm/Compact.c:191#1 0x000000000056d4a4 in update_bkwd_compact (gen=0x1a6e4d8) at rts/sm/Compact.c:889#2 0x000000000056d8f7 in compact (static_objects=0x5c1c39, dead_weak_ptr_list=0x7ffc1444e4d8, resurrected_threads=0x7ffc1444e4e0) at rts/sm/Compact.c:1004#3 0x00000000005646f8 in GarbageCollect (collect_gen=1, do_heap_census=false, gc_type=0, cap=0x6d84c0 <MainCapability>, idle_cap=0x0) at rts/sm/GC.c:476#4 0x00000000005476f6 in scheduleDoGC (pcap=0x7ffc1444e648, task=0x1a7fe20, force_major=false) at rts/Schedule.c:1806#5 0x0000000000546c29 in schedule (initialCapability=0x6d84c0 <MainCapability>, task=0x1a7fe20) at rts/Schedule.c:550#6 0x00000000005480eb in scheduleWaitThread (tso=0x4200005388, ret=0x0, pcap=0x7ffc1444e750) at rts/Schedule.c:2547#7 0x0000000000558830 in rts_evalLazyIO (cap=0x7ffc1444e750, p=0x59e450, ret=0x0) at rts/RtsAPI.c:530#8 0x000000000055927f in hs_main (argc=1, argv=0x7ffc1444e948, main_closure=0x59e450, rts_config=...) at rts/RtsMain.c:72#9 0x000000000024cf46 in main ()
Original closure (second argument to move) is this:
>>> p printClosure(p)0x42000fddf0: FUN/1(0x24c860, 0x4200006000)
I wonder if this is a bug in tagging in compacting GC (maybe in
thread/unthread).
Omer and I discussed this briefly; it sounds to me like we may be losing a tag during threading. Namely, threading assumes that all pointers have the same tag. However, this in general is not true: some pointers may lack a tag (see, e.g., #15155 (closed)). One approach to try may be to teach the closure threading logic to always use non-zero tags when it finds them.
Confirmed that this is a problem with the assumption in mark-compact GC of all
pointers to a closure being tagged the same. This assumption currently does not
hold.
If with the commit linked above I build it with -debug the assertion I added
fails:
$ ./Main +RTS -c -A128kthread: pointer tag: 0, iptr tag: 0, closure type: TSO, info ptr: 0x582cf8thread: pointer tag: 0, iptr tag: 0, closure type: TSO, info ptr: 0x582cf8thread: pointer tag: 0, iptr tag: 0, closure type: TSO, info ptr: 0x582cf8thread: pointer tag: 1, iptr tag: 0, closure type: CONSTR_1_0, info ptr: 0x24cde8Main: internal error: ASSERTION FAILED: file rts/sm/Compact.c, line 105 (GHC version 8.9.0.20190911 for x86_64_unknown_linux) Please report this as a GHC bug: https://www.haskell.org/ghc/reportabugzsh: abort (core dumped) ./Main +RTS -c -A128k
So we see a pointer which is tagged 1, but there's already a chain in the
location of the pointed closure, and the info pointer is tagged 0. So when we
"unthread" this chain we'll end up losing the original tag 1.
If I disable the assertion but keep printing I see these lines before the
segfault:
These are all the cases in this program where turn tagged pointers into untagged
pointers, and this happens in 3rd collection (which is the only major collection
and so only time we run compaction before the segfault). The segfault happens
after 6th collection.
Looking at each object right before the 3rd GC:
0x596380 is in payload of a static indirection:
>>> p printClosure(0x596378)0x596378: IND_STATIC(0x42000fd041)
0xd5cc90 is in stable pointer table.
0x42000fdca8 is on a stack.
If I continue, right before the segfault this is the frame on top of the stack:
stk[2] is the buggy value that we pushed the stack -- the correct value is
0x4200006000:
>>> p printClosure(0x4200006000)0x4200006000: ARR_WORDS("12297829382473034410")
If I watch this stack location I see that the value is pushed to the stack from
payload[0] of a FUN_1_0:
>>> p printClosure(0x42000d37b0)0x42000d37b0: FUN/1(0x24c830, 0x4200006000)
This function is copied in:
>>> bt#0 move (to=0x42000d37b0, from=0x42000fddf0, size=2) at rts/sm/Compact.c:192#1 0x0000000000565e01 in update_bkwd_compact (gen=0xd5e4d8) at rts/sm/Compact.c:872#2 0x0000000000566254 in compact (static_objects=0x5b9201, dead_weak_ptr_list=0x7ffc3b4674a8, resurrected_threads=0x7ffc3b4674b0) at rts/sm/Compact.c:987#3 0x000000000055cf25 in GarbageCollect (collect_gen=1, do_heap_census=false, gc_type=0, cap=0x6ce4c0 <MainCapability>, idle_cap=0x0) at rts/sm/GC.c:476#4 0x000000000053ff0e in scheduleDoGC (pcap=0x7ffc3b467618, task=0xd6fe20, force_major=false) at rts/Schedule.c:1806#5 0x000000000053f441 in schedule (initialCapability=0x6ce4c0 <MainCapability>, task=0xd6fe20) at rts/Schedule.c:550#6 0x0000000000540903 in scheduleWaitThread (tso=0x4200005388, ret=0x0, pcap=0x7ffc3b467720) at rts/Schedule.c:2547#7 0x0000000000551040 in rts_evalLazyIO (cap=0x7ffc3b467720, p=0x596450, ret=0x0) at rts/RtsAPI.c:530#8 0x0000000000551a8f in hs_main (argc=1, argv=0x7ffc3b467918, main_closure=0x596450, rts_config=...) at rts/RtsMain.c:72#9 0x000000000024cef6 in main ()
The stack that contains tagged reference to this function:
stk[16] has the correctly tagged FUN_0_1 value which we turn into an untagged
value in its new location (which is causing generated code to load a value from
its payload using an incorrect offset):
Where is this untagged reference to this FUN_0_1 coming from? Right before
compaction if I search for untagged references to this function I see that it's
in 0x42000e1010, which is a mut_list:
Old value = (StgClosure *) 0x42000fddf0 -- reverse execution so this is the new valueNew value = (StgClosure *) 0xaaaaaaaaaaaaaaaa0x000000000056124a in recordMutableGen_GC (p=0x42000fddf0, gen_no=1) at rts/sm/GCUtils.h:7070 *bd->free++ = (StgWord)p;>>> bt#0 0x000000000056124a in recordMutableGen_GC (p=0x42000fddf0, gen_no=1) at rts/sm/GCUtils.h:70#1 0x0000000000563304 in scavenge_mark_stack () at rts/sm/Scav.c:1202#2 0x0000000000564513 in scavenge_loop () at rts/sm/Scav.c:2114#3 0x000000000055e1f3 in scavenge_until_all_done () at rts/sm/GC.c:1100#4 0x000000000055ce93 in GarbageCollect (collect_gen=1, do_heap_census=false, gc_type=0, cap=0x6ce4c0 <MainCapability>, idle_cap=0x0) at rts/sm/GC.c:425#5 0x000000000053ff0e in scheduleDoGC (pcap=0x7ffc3b467618, task=0xd6fe20, force_major=false) at rts/Schedule.c:1806#6 0x000000000053f441 in schedule (initialCapability=0x6ce4c0 <MainCapability>, task=0xd6fe20) at rts/Schedule.c:550#7 0x0000000000540903 in scheduleWaitThread (tso=0x4200005388, ret=0x0, pcap=0x7ffc3b467720) at rts/Schedule.c:2547#8 0x0000000000551040 in rts_evalLazyIO (cap=0x7ffc3b467720, p=0x596450, ret=0x0) at rts/RtsAPI.c:530#9 0x0000000000551a8f in hs_main (argc=1, argv=0x7ffc3b467918, main_closure=0x596450, rts_config=...) at rts/RtsMain.c:72#10 0x000000000024cef6 in main ()
So we push this closure to the mark stack, and when scavenging the mark stack we
add it to the mut_list. Objects in the mut_list are normally not tagged, but
when we thread the roots we end up chaining these untagged pointers in the
mut_lists to the objects, and then we later chain references from mutators to
the same objects, which are tagged. This invalidates the assumption that all
references must be tagged the same!
The problem with tagging pointers in roots is that the RTS will have to unmask before using them. For that we'll have to painfully review all of RTS code and add masking ...
I really don't think this will be so bad. The roots are consumed by evacuate, which already handles tagging correctly. What do you expect to be problematic here?
The roots are consumed by evacuate, which already handles tagging correctly
evacuate when called on a root will find a non-tagged pointer and leave a
non-tagged pointer in its place after evacuation. There may be more pointers to
the evacuated object and they may be tagged differently, evacuate will simply
preserve the tags on pointers to the same closure -- they may have different
tags.
This is different than threading in compacting GC: it'll find a non-tagged
pointer in roots and chain other references to the same closure (which may be
tagged) to the same thread, and when it updates the thread after moving the
object it'll turn all tagged references into untagged, because the very first
reference it finds is in a root, and pointers from roots are untagged.
Now, it seems like most roots are actually TSOs, which AFAIK are not tagged
even when evaluated (as they can't be entered) so those are fine. But if we tag
Weaks then we have to tag weak_ptr_lists in generations, and then untag them
before using in RTS (e.g. in MarkWeak.c).
Here's another problem: in the reproducer above the untagged pointer is added to the mut_list by scavenge, and in scavenge we don't even know about the tag. I think in the reproducer the tag has to be 1 (because the closure is a FUN), but in general we can't predict the tag as we'd need to know the full type and constructor info. E.g. in a data type like:
data T = A ... | B ... | C ... | D ...
If we scavenge B and add it to a mut_list, what tag should we use? The entry code for B would tag a pointer to itself with 2, but in the RTS we only know that it's a CONSTR, we don't know that it's B, and B should be tagged 2.
I just realized (in !1692 (closed)) that we have a field in info tables for tags of constructors. So perhaps that can be used to tag pointers in mut_lists.
We currently use the convention that a threaded pointer is tagged with
0 if it is the info pointer itself
1 if it points to the end of the chain
2 if it points to another element in the chain
Let's add two more values:
3 like 1, but tag = 0
4 like 2, but tag = 0
in thread(), if we have tag /= 0 and we see 3 or 4, then go to the end of the chain and fix the tag.
untag pointers in code that traverses mut_list and other roots
It should be rare that we hit the case where we have to fix up the tag, but there will be some extra overhead in thread() to comapre tag values, which is unfortunate.
I tried implementing this but unfortunately knowing constructor tags is not enough, we need to know about function tags too, and as far as I can see it's not possible to know tag of a function from its info table.
We discussed this a little bit at the meeting: we actually have arity information of function in StgFunInfoTables so this should still be possible to implement. I'll revisit this.
Tjat sound right to me. I'm not following the details here, but I support the principle that
Given a pointer to a heap object, it should be easy to make a tagged pointer to that object by consulting its info table.
I'm astonished that we can't already do this -- surely we can make it so, if only by simply storing the tag in the info table? (Don't forget function closures, where the tag encodes the arity.)
If we can, then presumably the compacting GHC can use bits for whatever it likes, and restore the tag right at the end.
You'd have to figure out where to put max_tag (I took a quick look at #14373 (closed) but it wasn't obvious if this was resolved). We really really don't want to increase the size of info tables. Perhaps you could steal a few bits from somewhere.
I was hoping to avoid the overhead of tagging pointers when adding them to the remembered set, to keep the overhead of fixing this bug localised to the compacting collector as much as possible. (yes I know there will be a little overhead in untagging mut_list pointers, but it's one mask which is much less than the overhead of looking at the info table and computing a tag)
Ah - my mistake, I thought it was the family size. So the proposed change is to always tag things if the tag fits, even if the constructor comes from a large family.
So yes we could do that. But I'm not sure we want to use that property to fix this bug. We'd be adding a new invariant: that all GC roots have to be tagged pointers. I'm very suspicious of adding new invariants, especially when they involve a performance cost.