GHC optimizes Addr# literals to the same
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE Strict #-}
{-# OPTIONS_GHC -ddump-simpl #-}
import Data.Semigroup
import Data.ByteString.Builder
import Data.ByteString.Internal (accursedUnutterablePerformIO)
import Data.ByteString.Lazy (putStr)
import Data.ByteString.Short (ShortByteString)
import Data.ByteString.Short.Internal (createFromPtr)
import GHC.Exts
data DataA = DataA
{ val1 :: Int,
val2 :: Int
}
deriving (Eq, Show)
toEncodingInt :: Int -> Builder
toEncodingInt = intDec
toEncoding :: DataA -> Builder
toEncoding = let
xn = unsafePackLenLiteral 7 "\"val1\":"#
yn = unsafePackLenLiteral 7 "\"val2\":"#
in \ value_af1X -> case value_af1X of
DataA x y -> "{" <> shortByteString xn <> toEncodingInt x <> "," <> shortByteString yn <> toEncodingInt y <> "}"
unsafePackLenLiteral :: Int -> Addr# -> ShortByteString
unsafePackLenLiteral len addr# =
accursedUnutterablePerformIO $ createFromPtr (Ptr addr#) len
main :: IO ()
main =
Data.ByteString.Lazy.putStr $ toLazyByteString $ toEncoding $ DataA 1 2
When runghc
d or compiled without optimizations, this program prints {"val1":1,"val2":2}
as one would expect.
However when optimized it prints {"val1":1,"val1":2}
(val1
twice!).
Why?
Some observations:
- Removing
{-# LANGUAGE Strict #-}
makes program work as expected (Also making bindings lazy by adding tilde~
in front). - Changing either literal to be of different length makes program work as expected too. (E.g. changing to
yn = unsafePackLenLiteral 8 "\"val2x\":"#
). - Using
unsafePerformIO
instead ofaccursedUnutterablePerformIO
works also.
I'd like to understand whether this is a GHC bug, or I'm doing something wrong, What? Why Strict
triggers wrong behaviour?).
That is important as bytestring
provides unsafePackageLenLiteral
implemented as above, using accursedUnutterablePerformIO
. cc @Bodigrim
Originally reported in https://github.com/haskell/aeson/issues/967