`BufferCodec`s have unnecessary allocations
(WIP code in !9948 (closed))
The GHC.IO.Encoding.Types module contains:
type CodeBuffer from to = Buffer from -> Buffer to -> IO (CodeProgress, Buffer from, Buffer to)
data BufferCodec from to state = BufferCodec {
encode :: CodeBuffer from to,
recover :: Buffer from -> Buffer to -> IO (Buffer from, Buffer to),
...
}
Given a simple test program to benchmark this:
{-# LANGUAGE MagicHash #-}
{-# OPTIONS_GHC -dno-typeable-binds -O2 #-}
module Main (main) where
import System.IO
import Data.Bits
import GHC.Int
import GHC.Exts
import System.Environment
main :: IO ()
main = do
[n] <- getArgs
withFile "/dev/null" WriteMode (loop (read n))
loop :: Int -> Handle -> IO ()
loop 0 !_ = pure ()
loop !n !h = do
hPutChar h $! dummy_char n
loop (n-1) h
-- unsafe efficient version of `chr`
my_chr :: Int -> Char
my_chr (I# i) = C# (chr# i)
-- return either a or b
dummy_char :: Int -> Char
dummy_char !i = my_chr ((i .&. 1) + 97)
We can extract a ticky profile of this to give us allocation statistics about UTF8 encoding:
Entries Alloc Alloc'd Non-void Arguments STG Name
--------------------------------------------------------------------
100001 14400144 0 8 PMEiwiiS GHC.IO.Encoding.UTF8.$wutf8_encode{v rh} (fun)
100001 9600096 0 2 SS GHC.IO.Encoding.UTF8.mkUTF1{v rn} (fun)
100000 6400000 0 2 cS $l$wact1_g5R1{v} (GHC.IO.Handle.Text) (fun)
100002 4800096 0 4 LM>P GHC.IO.Handle.Internals.$wdo_operation{v rR} (fun)
100002 4000080 0 4 LMP> GHC.IO.Handle.Internals.$wwantWritableHandle'{v r1N} (fun)
1700034 2400048 0 1 S sat_s5F0{v} (GHC.IO.Handle.Internals) (fun) in s5DP
100000 2400000 0 4 >>SS GHC.IO.Handle.Internals.$wstreamEncode{v r1c} (fun)
4100082 1600032 0 0 io{v s5DP} (GHC.IO.Handle.Internals) (fun) in r1N
100000 1600000 0 2 Mc GHC.IO.Handle.Text.$whPutChar{v rL} (fun)
We can also extract heap statistics when running the program:
./HandlePerf 100 +RTS -s
140,952 bytes allocated in the heap
./HandlePerf 1000 +RTS -s
796,440 bytes allocated in the heap
./HandlePerf 10000 +RTS -s
7,348,944 bytes allocated in the heap
./HandlePerf 100000 +RTS -s
72,871,608 bytes allocated in the heap
A lot of these allocations are due to the use of boxed tuples in recover
and decode
- since every call to these will allocate a tuple for the result. We could instead define these as:
type CodeBuffer from to = Buffer from -> Buffer to -> IO (CodeProgress, Buffer from, Buffer to)
type CodeBuffer# from to = Buffer from -> Buffer to -> State# RealWorld -> (# State# RealWorld, CodeProgress, Buffer from, Buffer to #)
data BufferCodec from to state = BufferCodec {
encode# :: CodeBuffer# from to,
recover# :: Buffer from -> Buffer to -> State# RealWorld -> (# State# RealWorld, Buffer from, Buffer to #),
...
}
{-# INLINE encode #-}
encode :: BufferCodec from to state -> CodeBuffer from to
encode codec from to = IO $ \s -> case encode# codec from to s of
(# s', progress, from', to' #) -> (# s', (progress, from', to') #)
{-# INLINE recover #-}
recover :: BufferCoder from to state -> Buffer from -> Buffer to -> (Buffer from, Buffer to)
recover codec from to = IO $ \s -> case recover# codec from to s of
(# s', from', to' #) -> (# s', (from', to') #)
Then, if we propagate these changes to the encoders/decoders, such as GHC.IO.Encoding.UTF8.utf8_encode
, we get the following allocation numbers:
Entries Alloc Alloc'd Non-void Arguments STG Name
--------------------------------------------------------------------
100001 14400144 0 8 PMEiwiiS GHC.IO.Encoding.UTF8.$wutf8_encode{v r2mD} (fun)
100001 6400064 0 2 SS GHC.IO.Encoding.UTF8.utf8_encode{v rC3} (fun)
100000 6400000 0 2 cS $l$wact1_g5QY{v} (GHC.IO.Handle.Text) (fun)
100002 4800096 0 4 LM>P GHC.IO.Handle.Internals.$wdo_operation{v r4Zo} (fun)
100002 4000080 0 4 LMP> GHC.IO.Handle.Internals.$wwantWritableHandle'{v r4Zy} (fun)
1700034 2400048 0 1 S sat_s5Fh{v} (GHC.IO.Handle.Internals) (fun) in s5E6
100000 2400000 0 4 >>SS GHC.IO.Handle.Internals.$wstreamEncode{v r4Yn} (fun)
4100082 1600032 0 0 io{v s5E6} (GHC.IO.Handle.Internals) (fun) in r4Zy
100000 1600000 0 2 Mc GHC.IO.Handle.Text.$whPutChar{v r51K} (fun)
./HandlePerf 100 +RTS -s
106,792 bytes allocated in the heap
./HandlePerf 1000 +RTS -s
459,896 bytes allocated in the heap
./HandlePerf 10000 +RTS -s
3,988,416 bytes allocated in the heap
./HandlePerf 100000 +RTS -s
39,271,096 bytes allocated in the heap