TBQueue leaks space under certain workloads
I'm using TBQueue and noticed suspiciously high memory usage, so I profiled the program. It turned out that readTBQueue leaks space (see attached before.png).
On closer inspection, the culprit is writeTVar rsize (r + 1) in the definition of readTBQueue: it stores the unevaluated thunk r + 1 in rsize rather than a number. After replacing it with writeTVar rsize $! r + 1, the leak is gone (see attached after.png).
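A minimal sketch of the difference, using a plain TVar Int counter instead of the real TBQueue internals (bumpLazy and bumpStrict are hypothetical names, not part of stm):

```haskell
import Control.Concurrent.STM

-- Lazy update (the pattern from readTBQueue): writeTVar stores the
-- unevaluated thunk (r + 1). Since readTVar does not force it either,
-- each call wraps the previous thunk in a new one until something
-- finally evaluates the counter.
bumpLazy :: TVar Int -> STM ()
bumpLazy tv = do
  r <- readTVar tv
  writeTVar tv (r + 1)

-- Strict update (the fix): ($!) evaluates r + 1 to WHNF before it is
-- written, which for Int means a plain number, so no thunk chain forms.
bumpStrict :: TVar Int -> STM ()
bumpStrict tv = do
  r <- readTVar tv
  writeTVar tv $! r + 1
```

If bumpLazy runs many times before anyone inspects the counter, the TVar holds a chain of nested (+ 1) thunks, one per call; with bumpStrict it always holds a single evaluated Int.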
Here are the +RTS -s outputs.

Before the fix (writeTVar rsize (r + 1)):

 366,535,518,024 bytes allocated in the heap
 115,643,281,224 bytes copied during GC
     241,356,416 bytes maximum residency (1182 sample(s))
       1,516,944 bytes maximum slop
             392 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     247273 colls, 247273 par   128.854s   28.654s     0.0001s    0.0182s
  Gen  1       1182 colls,   1181 par   352.162s   87.812s     0.0743s    0.1322s

  Parallel GC work balance: 78.17% (serial 0%, perfect 100%)

  TASKS: 24 (1 bound, 16 peak workers (23 total), using -N4)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.003s  (  0.003s elapsed)
  MUT     time  581.754s  (226.191s elapsed)
  GC      time  317.130s  ( 75.533s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time  163.885s  ( 40.933s elapsed)
  EXIT    time    0.013s  (  0.011s elapsed)
  Total   time 1062.789s  (301.738s elapsed)

  Alloc rate    630,052,684 bytes per MUT second

  Productivity  54.7% of total user, 61.4% of total elapsed

gc_alloc_block_sync: 8998531
whitehole_spin: 96
gen[0].sync: 180553
gen[1].sync: 31648044

After the fix (writeTVar rsize $! r + 1):

 431,671,260,464 bytes allocated in the heap
  86,540,207,400 bytes copied during GC
     170,338,336 bytes maximum residency (1381 sample(s))
       1,159,472 bytes maximum slop
             260 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     290179 colls, 290179 par   148.921s   33.097s     0.0001s    0.0217s
  Gen  1       1381 colls,   1380 par   206.679s   51.492s     0.0373s    0.0528s

  Parallel GC work balance: 75.51% (serial 0%, perfect 100%)

  TASKS: 23 (1 bound, 17 peak workers (22 total), using -N4)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.005s  (  0.004s elapsed)
  MUT     time  681.718s  (241.009s elapsed)
  GC      time  258.643s  ( 60.370s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time   96.957s  ( 24.219s elapsed)
  EXIT    time    0.009s  (  0.007s elapsed)
  Total   time 1037.335s  (301.390s elapsed)

  Alloc rate    633,210,748 bytes per MUT second

  Productivity  65.7% of total user, 71.9% of total elapsed

gc_alloc_block_sync: 5494680
whitehole_spin: 184
gen[0].sync: 184109
gen[1].sync: 24223953
The attached patch fixes the problem. It makes all Int increments and decrements in the module strict, since there is no need for any of them to be lazy.
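Applying the same idea to every counter update might look like the following sketch. BQueue and its functions are hypothetical and deliberately simplified (a two-list FIFO with a single capacity counter); this is not stm's actual TBQueue code and not the attached patch. Every counter update goes through either $! or modifyTVar', the strict variant of modifyTVar exported by Control.Concurrent.STM:

```haskell
import Control.Concurrent.STM

-- Hypothetical simplified bounded queue; NOT stm's TBQueue.
data BQueue a = BQueue
  { front :: TVar [a]   -- elements ready to be read, oldest first
  , back  :: TVar [a]   -- newly written elements, newest first
  , free  :: TVar Int   -- remaining capacity
  }

newBQueue :: Int -> STM (BQueue a)
newBQueue cap = BQueue <$> newTVar [] <*> newTVar [] <*> newTVar cap

writeBQueue :: BQueue a -> a -> STM ()
writeBQueue q x = do
  n <- readTVar (free q)
  if n <= 0
    then retry                      -- full: block until a reader frees a slot
    else do
      writeTVar (free q) $! n - 1   -- strict decrement: stores a number, not a thunk
      modifyTVar' (back q) (x :)    -- strict modify: no pending (x :) thunks pile up

readBQueue :: BQueue a -> STM a
readBQueue q = do
  xs <- readTVar (front q)
  case xs of
    (y : ys) -> do
      writeTVar (front q) ys
      modifyTVar' (free q) (+ 1)    -- strict increment
      pure y
    [] -> do
      -- front is empty: move the reversed back list over to front
      zs <- readTVar (back q)
      case reverse zs of
        [] -> retry                 -- empty: block until a writer adds an element
        (y : ys) -> do
          writeTVar (back q) []
          writeTVar (front q) ys
          modifyTVar' (free q) (+ 1)  -- strict increment
          pure y
```

modifyTVar' var f reads the TVar and writes back f x with $!, so the two styles are interchangeable; which one to use is mostly a matter of taste.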