GHC optimizes Addr# literals to the same
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE Strict #-}
{-# OPTIONS_GHC -ddump-simpl #-}
import Data.Semigroup
import Data.ByteString.Builder
import Data.ByteString.Internal (accursedUnutterablePerformIO)
import Data.ByteString.Lazy (putStr)
import Data.ByteString.Short (ShortByteString)
import Data.ByteString.Short.Internal (createFromPtr)
import GHC.Exts
data DataA = DataA
{ val1 :: Int,
val2 :: Int
}
deriving (Eq, Show)
toEncodingInt :: Int -> Builder
toEncodingInt = intDec
toEncoding :: DataA -> Builder
toEncoding = let
xn = unsafePackLenLiteral 7 "\"val1\":"#
yn = unsafePackLenLiteral 7 "\"val2\":"#
in \ value_af1X -> case value_af1X of
DataA x y -> "{" <> shortByteString xn <> toEncodingInt x <> "," <> shortByteString yn <> toEncodingInt y <> "}"
unsafePackLenLiteral :: Int -> Addr# -> ShortByteString
unsafePackLenLiteral len addr# =
accursedUnutterablePerformIO $ createFromPtr (Ptr addr#) len
main :: IO ()
main =
Data.ByteString.Lazy.putStr $ toLazyByteString $ toEncoding $ DataA 1 2
When runghcd or compiled without optimizations, this program prints {"val1":1,"val2":2} as one would expect.
However when optimized it prints {"val1":1,"val1":2} (val1 twice!).
Why?
Some observations:
- Removing
{-# LANGUAGE Strict #-}makes program work as expected (Also making bindings lazy by adding tilde~in front). - Changing either literal to be of different length makes program work as expected too. (E.g. changing to
yn = unsafePackLenLiteral 8 "\"val2x\":"#). - Using
unsafePerformIOinstead ofaccursedUnutterablePerformIOworks also.
I'd like to understand whether this is a GHC bug, or I'm doing something wrong, What? Why Strict triggers wrong behaviour?).
That is important as bytestring provides unsafePackageLenLiteral implemented as above, using accursedUnutterablePerformIO. cc @Bodigrim
Originally reported in https://github.com/haskell/aeson/issues/967