Running an action twice uses much more memory than running it once
EDIT: A detailed analysis of the problems discussed in this ticket can be found at http://www.well-typed.com/blog/2016/09/sharing-conduit/ . There is no ghc bug here, as such, except perhaps #8457 "-ffull-laziness does more harm than good". See also #12620 "Allow the user to prevent floating and CSE".
This started as a Haskell cafe discussion about conduit. This may be related to #7206, but I can't be certain. It's possible that GHC is not doing anything wrong here, but I can't see a way that the code in question is misbehaving to trigger this memory usage.
Consider the following code, which depends on conduit-1.1.7 and conduit-extra:
import           Data.Conduit ( Sink, (=$), ($$), await )
import qualified Data.Conduit.Binary as CB
import           System.IO (withBinaryFile, IOMode (ReadMode))

main :: IO ()
main = do
    action "random.gz"
    --action "random.gz"

action :: FilePath -> IO ()
action filePath = withBinaryFile filePath ReadMode $ \h -> do
    _ <- CB.sourceHandle h
      $$ CB.lines
      =$ sink2 1
    return ()

sink2 :: (Monad m) => Int -> Sink a m Int
sink2 state = do
    maybeToken <- await
    case maybeToken of
        Nothing -> return state
        Just _  -> sink2 $! state + 1
The code should open up the file "random.gz" (I simply gzipped about 10MB of data from /dev/urandom), break it into chunks at each newline character, and then count the number of lines. When I run it as-is, it uses 53KB of memory, which seems reasonable.
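For anyone reproducing the numbers, here is a minimal sketch of reading peak residency from inside the program rather than from the +RTS -s summary. It assumes base >= 4.10 (for GHC.Stats.getRTSStats) and a run with +RTS -T. The helper name reportMaxResidency and the stand-in workload in main are illustrative only and are not part of the ticket.

```haskell
module Main (main) where

import GHC.Stats  (getRTSStats, getRTSStatsEnabled, max_live_bytes)
import System.Mem (performMajorGC)

-- | Run an action, then print the largest live-heap size the RTS has
-- sampled so far. Hypothetical helper, not taken from the ticket.
reportMaxResidency :: IO a -> IO a
reportMaxResidency act = do
    result  <- act
    enabled <- getRTSStatsEnabled
    if enabled
        then do
            -- Residency is only sampled at major collections, so force
            -- one to make sure at least one sample exists.
            performMajorGC
            stats <- getRTSStats
            putStrLn ("max residency (bytes): " ++ show (max_live_bytes stats))
        else putStrLn "RTS stats are disabled; rerun with +RTS -T"
    return result

main :: IO ()
main =
    -- Stand-in workload; in the ticket's program this would wrap the
    -- call(s) to action.
    reportMaxResidency (print (length (replicate 1000000 'x')))
```

Building with ghc -O2 -rtsopts and running with +RTS -T enables the statistics. Because residency is sampled only at major collections, the reported figure is a lower bound on the true peak.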
However, if I uncomment the second call to action in main, maximum residency shoots up to 45MB (this seems to be linear in the size of the input file). I additionally tried copying random.gz into two files, random1.gz and random2.gz, and changed the two calls to action to use different file names. It still resulted in large memory usage.
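Growth that is linear in the input size is what one would expect if a lazily produced structure stays reachable between the two runs, which is the sharing issue the EDIT note at the top points to. The sketch below is a simplified list-based analogy and does not reproduce the conduit code path. The names sharedTwice and freshTwice are hypothetical. Whether GHC turns the second function into the first through CSE or full laziness depends on the compiler version and optimization flags.

```haskell
module Main (main) where

import Control.Exception (evaluate)

-- | Traverse one explicitly shared list twice. The second traversal keeps
-- the whole list reachable while the first one runs, so residency grows
-- linearly with n.
sharedTwice :: Int -> IO (Int, Int)
sharedTwice n = do
    let xs = [1 .. n]
    a <- evaluate (length xs)
    b <- evaluate (length xs)
    return (a, b)

-- | Build a separate list for each traversal. As written, each list can be
-- collected while it is being consumed. With optimization on, GHC may
-- common up the two identical expressions and reintroduce the sharing
-- shown above (an assumption about optimizer behaviour, not a guarantee).
freshTwice :: Int -> IO (Int, Int)
freshTwice n = do
    a <- evaluate (length [1 .. n])
    b <- evaluate (length [1 .. n])
    return (a, b)

main :: IO ()
main = do
    sharedTwice 5000000 >>= print
    freshTwice  5000000 >>= print
```

Comparing the two functions under +RTS -s, with and without -fno-full-laziness, is one way to see whether sharing explains a residency difference in a given GHC version.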
I'm going to continue working to make this a smaller reproducing test case, but I wanted to start with what I had so far. I'll also attach the core generated by both the low-memory and high-memory versions.