Commit e0dd63cc authored by Herbert Valerio Riedel 🕺, committed by GitHub

Use hash-consing to optimise index cache (#3897)

Without this optimisation, `cabal info somethingnonexisting` results in

     960,397,120 bytes allocated in the heap
     739,652,560 bytes copied during GC
      67,757,128 bytes maximum residency (24 sample(s))
       2,234,096 bytes maximum slop
             147 MB total memory in use (0 MB lost due to fragmentation)

with this optimisation:

   1,000,825,744 bytes allocated in the heap
     656,112,432 bytes copied during GC
      44,476,616 bytes maximum residency (24 sample(s))
       2,302,864 bytes maximum slop
             109 MB total memory in use (0 MB lost due to fragmentation)

So the total memory in use drops from 147 MB to 109 MB (about 26%), and
maximum residency from about 68 MB to about 44 MB. The total runtime is
also slightly reduced, from

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time    0.683s  (  1.050s elapsed)
  GC      time    0.946s  (  0.946s elapsed)
  EXIT    time    0.005s  (  0.005s elapsed)
  Total   time    1.637s  (  2.002s elapsed)

to

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time    0.664s  (  0.988s elapsed)
  GC      time    0.797s  (  0.797s elapsed)
  EXIT    time    0.004s  (  0.004s elapsed)
  Total   time    1.467s  (  1.789s elapsed)


Note that there are currently ~80k cache entries, but only ~10k unique package
names and ~6k unique versions. So hash-consing reduces the number of heap
objects for both value types by roughly one order of magnitude, which among
other benefits also reduces GC overhead.
parent ecd2cb1f
@@ -730,9 +730,9 @@ readIndexCache verbosity index = do

           updatePackageIndexCacheFile verbosity index

-          either die return =<< readIndexCache' index
+          either die (return . hashConsCache) =<< readIndexCache' index

-      Right res -> return res
+      Right res -> return (hashConsCache res)

 -- | Read the 'Index' cache from the filesystem without attempting to
 -- regenerate on parsing failures.
@@ -748,6 +748,37 @@ writeIndexCache index cache
     | is01Index index = encodeFile (cacheFile index) cache
     | otherwise = writeFile (cacheFile index) (show00IndexCache cache)

+-- | Optimise sharing of equal values inside 'Cache'
+--
+-- cf. https://en.wikipedia.org/wiki/Hash_consing
+hashConsCache :: Cache -> Cache
+hashConsCache cache0
+    = cache0 { cacheEntries = go mempty mempty (cacheEntries cache0) }
+  where
+    -- TODO/NOTE:
+    --
+    -- If/when we redo the binary serialisation via e.g. CBOR and we
+    -- are able to use incremental decoding, we may want to move the
+    -- hash-consing into the incremental deserialisation, or
+    -- alternatively even do something like
+    -- http://cbor.schmorp.de/value-sharing
+    --
+    go _ _ [] = []
+    -- for now we only optimise CachePackageIds since those
+    -- represent the vast majority
+    go !pns !pvs (CachePackageId pid bno ts : rest)
+        = CachePackageId pid' bno ts : go pns' pvs' rest
+      where
+        !pid' = PackageIdentifier pn' pv'
+        (!pn',!pns') = mapIntern pn pns
+        (!pv',!pvs') = mapIntern pv pvs
+        PackageIdentifier pn pv = pid
+
+    go pns pvs (x:xs) = x : go pns pvs xs
+
+mapIntern :: Ord k => k -> Map.Map k k -> (k,Map.Map k k)
+mapIntern k m = maybe (k,Map.insert k k m) (\k' -> (k',m)) (Map.lookup k m)
+
 -- | Cabal caches various information about the Hackage index
 data Cache = Cache
     { cacheHeadTs :: Timestamp
......
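
The interning idea in this commit can be tried out in isolation. The following is a minimal sketch, not the code from the commit: `Entry`, `Tables`, `intern`, `internEntries` and `sample` are hypothetical names, entries are simplified to (name, version) pairs, and the two intern tables are threaded with `mapAccumL` from `Data.List` rather than the explicit recursion used in `hashConsCache`. It assumes the `containers` package is available.

```haskell
module Main (main) where

import Data.List (mapAccumL)
import qualified Data.Map.Strict as Map

-- Hypothetical, simplified stand-ins for the cache entry fields.
type Name    = String
type Version = [Int]
type Entry   = (Name, Version)

-- One intern table per value type, mapping each value to its canonical copy.
type Tables = (Map.Map Name Name, Map.Map Version Version)

-- | Return the canonical copy of a value. An unseen value becomes
-- canonical and is recorded in the table.
intern :: Ord k => k -> Map.Map k k -> (k, Map.Map k k)
intern k m = case Map.lookup k m of
  Just k' -> (k', m)                 -- seen before: reuse the stored copy
  Nothing -> (k, Map.insert k k m)   -- first occurrence: register it

-- | Rebuild the entries so that equal names and equal versions refer to
-- the same canonical value. Also returns the final intern tables.
internEntries :: [Entry] -> (Tables, [Entry])
internEntries = mapAccumL step (Map.empty, Map.empty)
  where
    step (names, vers) (n, v) =
      case (intern n names, intern v vers) of
        ((n', names'), (v', vers')) -> ((names', vers'), (n', v'))

sample :: [Entry]
sample =
  [ ("lens", [4,15]), ("text", [1,2]), ("lens", [4,16]), ("text", [1,2]) ]

main :: IO ()
main = do
  let ((names, vers), shared) = internEntries sample
  print (shared == sample)                              -- True
  print (length sample, Map.size names, Map.size vers)  -- (4,2,3)
```

The result is equal to the input, so the effect is not observable from pure code; the difference is only in how many distinct heap objects back the entries, which is what the residency figures in the commit message measure.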