Running an action twice uses much more memory than running it once
EDIT: A detailed analysis of the problems discussed in this ticket can be found at http://www.well-typed.com/blog/2016/09/sharing-conduit/ . There is no GHC bug here, as such, except perhaps #8457 "-ffull-laziness does more harm than good". See also #12620 "Allow the user to prevent floating and CSE".
This started as a Haskell-Cafe discussion about conduit. It may be related to #7206, but I can't be certain. It's possible that GHC is not doing anything wrong here, but I can't see how the code in question would cause this memory usage.
Consider the following code, which depends on conduit-1.1.7 and conduit-extra:
import Data.Conduit ( Sink, (=$), ($$), await )
import qualified Data.Conduit.Binary as CB
import System.IO (withBinaryFile, IOMode (ReadMode))

main :: IO ()
main = do
    action "random.gz"
    --action "random.gz"

action :: FilePath -> IO ()
action filePath = withBinaryFile filePath ReadMode $ \h -> do
    _ <- CB.sourceHandle h
            $$ CB.lines
            =$ sink2 1
    return ()

sink2 :: (Monad m) => Int -> Sink a m Int
sink2 state = do
    maybeToken <- await
    case maybeToken of
        Nothing -> return state
        Just _ -> sink2 $! state + 1
The code should open the file "random.gz" (I simply gzipped about 10MB of data from /dev/urandom), break it into chunks at each newline character, and then count the number of lines. When I run it as-is, it uses 53KB of memory, which seems reasonable.
However, if I uncomment the second call to action in main, maximum residency shoots up to 45MB (this seems to be linear in the size of the input file). I additionally tried copying random.gz into two files, random1.gz and random2.gz, and changed the two calls to action to use different file names. It still resulted in large memory usage.
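For anyone wanting to compare residency between the one-call and two-call variants from inside the program, here is a minimal sketch. It assumes GHC 8.2 or later (for `GHC.Stats.getRTSStats`) and that the program is run with `+RTS -T`. The helper name `reportResidency` and the stand-in workload are hypothetical and not part of the original test case. Note that `max_live_bytes` is only updated at major collections, so it may understate the true peak.

```haskell
import GHC.Stats (getRTSStats, getRTSStatsEnabled, max_live_bytes)
import System.Mem (performGC)

-- | Hypothetical helper: run an action, then print the largest live-heap
-- size the RTS has observed so far (sampled at major GCs only).
reportResidency :: String -> IO a -> IO a
reportResidency label act = do
    result <- act
    enabled <- getRTSStatsEnabled
    if enabled
        then do
            performGC -- force a major GC so the statistic is refreshed
            stats <- getRTSStats
            putStrLn (label ++ ": max live bytes = " ++ show (max_live_bytes stats))
        else putStrLn (label ++ ": RTS stats disabled; run with +RTS -T")
    return result

main :: IO ()
main = do
    -- Stand-in workload (an assumption); in this ticket it would be the
    -- one or two calls to `action "random.gz"`.
    total <- reportResidency "sum" (return $! sum [1 .. 1000000 :: Int])
    print total
```

Running the binary with `+RTS -s` instead reports "maximum residency" in the summary without any code changes.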
I'm going to continue working to make this a smaller reproducing test case, but I wanted to start with what I had so far. I'll also attach the core generated by both the low-memory and high-memory versions.
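As a much simpler analogue of the sharing effect named in the EDIT at the top of this ticket, the sketch below uses no conduit at all. It is not a reduction of the test case above, only an illustration of how a second use of a shared lazy structure can keep the whole structure alive. The names `numbers` and `count` are hypothetical.

```haskell
import Data.List (foldl')

-- A top-level constant. Once a cell of this list has been evaluated it
-- stays in memory for as long as something still refers to the list head.
numbers :: [Int]
numbers = [1 .. 5000000]

-- Count elements with a strict accumulator, so the count itself runs in
-- constant space.
count :: [a] -> Int
count = foldl' (\n _ -> n + 1) 0

main :: IO ()
main = do
    -- With only this traversal, already-consumed cells can be collected.
    print (count numbers)
    -- Because `numbers` is needed again here, the first traversal cannot
    -- release the cells it has already evaluated, so the whole list is
    -- expected to become resident. Commenting this line out should bring
    -- residency back down.
    print (count numbers)
```

The exact numbers depend on the GHC version and optimisation flags, so treat this as a way to see the pattern rather than as a measurement.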