Skip to content

Compact API design improvements

I took a look at the compact API today and I realized that there are a lot of improvements we should make to it:

  1. fmap getCompact . compact is a pretty common thing to do, if you don't actually care about the compact pointer. We should make a helper function for this.
  2. The SerializedCompact data structure needs a bunch of instances. Especially, a user who wants to serialize the compact region needs to save this metadata somewhere, but we don't offer any help for doing this.
  3. importCompact will always print a message to stderr saying pointer fixup is happening. Need to be able to suppress this message.
  4. The serialization API is really unsafe; we should make it more safe by default by including some sort of fingerprint, at least.
  5. There should be a convenience function for serializing to and from a file, and serializing to and from a handle. RDMA is always going to be delicate business (so keep the old API around) but for these cases we should pave the cowpaths.

Here is some sample code that might help:

{-# LANGUAGE ScopedTypeVariables #-}
import System.Environment (getArgs)
import qualified Data.Set as Set
import System.IO
import Data.Compact
import Data.Compact.Serialized
import Foreign.Ptr
import Foreign.Storable
import Foreign.Marshal.Alloc
import Control.Monad

main = do
  [dict_file, out_file] <- getArgs
  dict <- readFileLatin1 dict_file
  c <- compact (Set.fromList (words dict))
  withBinaryFile out_file WriteMode $ \h ->
    withSerializedCompact c $ \sc -> do
      -- Write out the metadata header
      hPutStorable h (serializedCompactRoot sc)
      forM_ (serializedCompactBlockList sc) $ \(ptr, l) -> do
        hPutStorable h ptr
        hPutStorable h l
      hPutStorable h nullPtr
      -- Write out the payload
      forM_ (serializedCompactBlockList sc) $ \(ptr, l) ->
        hPutBuf h ptr (fromIntegral l)

  mb_r <- withBinaryFile out_file ReadMode $ \h -> do
    -- Read out the metadata header
    root <- hGetStorable h
    let go h xs = do
            ptr <- hGetStorable h
            if ptr == nullPtr
                then return (reverse xs)
                else do l <- hGetStorable h
                        go h ((ptr, l):xs)
    blocks <- go h []
    let sc = SerializedCompact {
                serializedCompactBlockList = blocks,
                serializedCompactRoot = root
             }
    -- Read the payload into memory
    importCompact sc $ \ptr l -> void $ hGetBuf h ptr (fromIntegral l)

  print (fmap getCompact mb_r == Just (getCompact c))

hPutStorable :: forall a. Storable a => Handle -> a -> IO ()
hPutStorable h a =
    alloca $ \ptr -> do
        poke ptr a
        hPutBuf h ptr (sizeOf (undefined :: a))

hGetStorable :: forall a. Storable a => Handle -> IO a
hGetStorable h =
    alloca $ \ptr -> do
        hGetBuf h ptr (sizeOf (undefined :: a))
        peek ptr

readFileLatin1 f = do
  h <- openFile f ReadMode
  hSetEncoding h latin1
  hGetContents h

I'm happy to do these but I want to make sure I'm not stepping on FB's toes, also I don't know if bgamari wants to take patches along these lines so late in the release cycle.

Trac metadata
Trac field Value
Version 8.0.1
Type FeatureRequest
TypeOfFailure OtherFailure
Priority normal
Resolution Unresolved
Component libraries/compact
Test case
Differential revisions
BlockedBy
Related
Blocking
CC bgamari, simonmar
Operating system
Architecture
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information