Skip to content

GitLab

  • Menu
Projects Groups Snippets
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
  • Sign in / Register
  • GHC GHC
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
    • Locked Files
  • Issues 4,866
    • Issues 4,866
    • List
    • Boards
    • Service Desk
    • Milestones
    • Iterations
  • Merge requests 458
    • Merge requests 458
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
    • Test Cases
  • Deployments
    • Deployments
    • Releases
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Code review
    • Insights
    • Issue
    • Repository
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • Glasgow Haskell Compiler
  • GHCGHC
  • Issues
  • #9494
Closed
Open
Created Aug 21, 2014 by nominolo@gmail.com@trac-nominolo

Probable data corruption with GHCi 7.8.* and Zlib

The following program causes Zlib data corruption errors when run from inside GHCi. It launches two threads which then concurrently read a file, compress it, and immediately decompress it. You need libraries zlib, SHA, and async.

module Main where

import qualified Codec.Compression.Zlib   as Zlib
import qualified Data.ByteString.Lazy     as BL
import qualified Data.ByteString.Internal as BI

import Control.Exception        (bracket)
import Control.Concurrent
import Control.Monad
import Control.Exception        ( evaluate)
import Data.Digest.Pure.SHA     ( sha1)  -- from the 'SHA' package
import Control.Concurrent.Async ( mapConcurrently)
import System.Mem               ( performGC )

import Debug.Trace

test :: Int -> IO String
test _ = do
    tid <- myThreadId

    -- testdata is: dd if=/dev/urandom of=/tmp/testdata bs=100k count=100
    -- Could also be replaced by: (BL.take (10^7) "/dev/urandom")
    dat <- BL.readFile "/tmp/testdata"

    let cbuf = Zlib.compress $ traceChunks tid $ dat
    s <- evaluate $ sha1 $ Zlib.decompress $ cbuf
    return $ show s
  where
    -- We used this to check whether buffers were reused by different threads, but that
    -- doesn't seem to be the case. Removing the call to traceChunks, however, makes it
    -- harder to reproduce possibly because of scheduler effects. In a much larger program
    -- it could be reproduced more easily without the trace, but in this small example
    -- tracing seems to cause the right amount of nondeterminism.
    traceChunks tid bs =
        BL.fromChunks
      $ zipWith (\n x -> trace (show tid ++ ":" ++ showBS x) x) [1..]
      $ BL.toChunks bs

    showBS (BI.PS ptr off len) = show ptr


main = do
    r <- withGCThread $ mapConcurrently (test) ([1..2] :: [Int])
    putStrLn $ show $ r
  where
    -- Regularly forcing the GC makes the test-case more reproducible.
    withGCThread io =
        bracket (forkIO $ forever $ performGC >> threadDelay 1000)
                killThread
                (const io)

The output should be something like:

...
ThreadId 51:0x00000001091ee010
ThreadId 49:0x00000001091a7010
...
ThreadId 49:0x000000010986f010
zlib-test-case.hs: user error (Codec.Compression.Zlib: incorrect data check)

You'll get different Zlib errors, depending on where it detects the inconsistency. Sometimes Zlib doesn't throw an error, but the checksums are different.

So far we've only been able to reproduce this using GHCi 7.8.3 on both Linux (NixOS) and Mac. We haven't been able to trigger it with a compiled executable, nor with GHCi 7.6.3. It was reproducable with HEAD from Jan 30 (I had that lying around somewhere).

Trac metadata
Trac field Value
Version 7.8.3
Type Bug
TypeOfFailure OtherFailure
Priority normal
Resolution Unresolved
Component GHCi
Test case
Differential revisions
BlockedBy
Related
Blocking
CC hvr
Operating system
Architecture
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Assignee
Assign to
Time tracking