Quadratic GC slowdown using RTS debug
Under some conditions of heavy heap usage, I seem to experience a quadratic slowdown due to GC using the debugging RTS only (for example, in the context of ticky profiling).
I have been able to reproduce it using the following program (depending on the aeson library):
import Data.Aeson (eitherDecode, Value) import qualified Data.ByteString.Lazy as BL import Data.Maybe (catMaybes) import System.Directory (listDirectory) import System.FilePath ( (</>) ) main :: IO() main = do let dir = "data" files <- listDirectory dir values <- catMaybes <$> mapM (readF dir) files print $ length values readF :: FilePath -> FilePath -> IO (Maybe Value) readF dir fp = do print (dir </> fp) blob <- BL.readFile (dir </> fp) case eitherDecode blob of Left _ -> return Nothing Right v -> return $ Just $! v
Here data is a directory filled with (identical) JSON files. See the attached Python script for the proposed dataset.
$ ghc -O -dynamic -o main Main.hs $ ./main +RTS -s ... 18,583,045,600 bytes allocated in the heap 6,334,338,280 bytes copied during GC 975,763,072 bytes maximum residency (14 sample(s)) 6,986,112 bytes maximum slop 930 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 17727 colls, 0 par 2.596s 2.597s 0.0001s 0.0006s Gen 1 14 colls, 0 par 2.422s 2.422s 0.1730s 0.7947s INIT time 0.000s ( 0.000s elapsed) MUT time 3.349s ( 3.351s elapsed) GC time 5.019s ( 5.020s elapsed) EXIT time 0.000s ( 0.000s elapsed) Total time 8.368s ( 8.371s elapsed) %GC time 0.0% (0.0% elapsed) Alloc rate 5,548,968,539 bytes per MUT second Productivity 40.0% of total user, 40.0% of total elapsed
$ ghc -O -dynamic -debug -o main.debug Main.hs $ ./main +RTS -s ... 18,583,045,600 bytes allocated in the heap 6,334,332,424 bytes copied during GC 975,763,072 bytes maximum residency (14 sample(s)) 6,986,112 bytes maximum slop 930 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 17727 colls, 0 par 172.941s 172.959s 0.0098s 0.0459s Gen 1 14 colls, 0 par 9.532s 9.532s 0.6808s 3.2061s INIT time 0.000s ( 0.000s elapsed) MUT time 5.020s ( 5.017s elapsed) GC time 182.473s (182.491s elapsed) EXIT time 0.000s ( 0.000s elapsed) Total time 187.492s (187.508s elapsed) %GC time 0.0% (0.0% elapsed) Alloc rate 3,701,999,827 bytes per MUT second Productivity 2.7% of total user, 2.7% of total elapsed
In debug mode, the runtime is almost quadratic:
Nb of files Duration (non-debug) Duration (debug) 100 0.8s 2.8s 200 1.6s 8.8s 400 3.2s 27.6s 1000 8.2s 168.9s
The problem can be seen with either GHC 8.6.3 (here on Archlinux, with aeson 18.104.22.168) but was also observed using GHC 8.4.4, with static binaries, using "stack". I was able to reproduce with both threaded and non-threaded mode.