Running an action twice uses much more memory than running it once
EDIT: A detailed analysis of the problems discussed in this ticket can be found at http://www.well-typed.com/blog/2016/09/sharing-conduit/ . There is no ghc bug here, as such, except perhaps #8457 "-ffull-laziness does more harm than good". See also #12620 "Allow the user to prevent floating and CSE".
This started as a Haskell cafe discussion about conduit. This may be related to #7206, but I can't be certain. It's possible that GHC is not doing anything wrong here, but I can't see a way that the code in question is misbehaving to trigger this memory usage.
Consider the following code, which depends on conduit-1.1.7 and conduit-extra:
import Data.Conduit ( Sink, (=$), ($$), await ) import qualified Data.Conduit.Binary as CB import System.IO (withBinaryFile, IOMode (ReadMode)) main :: IO () main = do action "random.gz" --action "random.gz" action :: FilePath -> IO () action filePath = withBinaryFile filePath ReadMode $ \h -> do _ <- CB.sourceHandle h $$ CB.lines =$ sink2 1 return () sink2 :: (Monad m) => Int -> Sink a m Int sink2 state = do maybeToken <- await case maybeToken of Nothing -> return state Just _ -> sink2 $! state + 1
The code should open up the file "random.gz" (I simply
gziped about 10MB of data from /dev/urandom), break it into chunks at each newline character, and then count the number of lines. When I run it as-is, it uses 53KB of memory, which seems reasonable.
However, if I uncomment the second call to
main, maximum residency shoots up to 45MB (this seems to be linear in the size of the input file. I additionally tried copying
random.gz into two files,
random2.gz, and changed the two calls to
action to use different file names. It still resulted in large memory usage.
I'm going to continue working to make this a smaller reproducing test case, but I wanted to start with what I had so far. I'll also attach the core generated by both the low-memory and high-memory versions.