cabal-install/Distribution/Client/IndexUtils.hs · db1ef505ab898d0c6b5cca54bb7e222ff5f4e6ae · Glasgow Haskell Compiler / Packages / Cabal

Refactor & optimise construction of index cache · db1ef505

Herbert Valerio Riedel authored Sep 21, 2016

This commit was motivated by @dcoutts' code-review comment:

> Originally we used `Sec.directoryEntries`, which gave us only the
> final version of each file, i.e. not all intermediate revisions. And
> previously our strategy was to go through the final versions of each
> file, in file order, and look up just the ones we're interested in (which
> in practice is 99% of them).
>
> Now for the new cache we want to go through all revisions, which means
> all entries in file order. So instead of using `Sec.directoryEntries`
> which reads from the tar index, we go straight for `Sec.directoryFirst`
> which is block 0 and iterate through, using `lazyUnfold`.
>
> But we can now significantly simplify this and do it more
> efficiently. Note that `indexLookupEntry` and `indexLookupFileEntry` are
> expensive operations that seek in the tar file and read the tar entry at
> that point. So let's do it exactly once per entry. The current code does
> it once in the `lazyUnfold indexLookupEntry` and then again in `mk`. But
> the old `mk` only did that because it had not previously looked up the
> entry.
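
The "exactly once per entry" idea can be sketched roughly as follows. This is a toy model, not the code from this commit: `Archive`, `Entry`, `lookupEntry`, `allEntries`, `mk`'s signature, `buildCache` and `exampleArchive` are all hypothetical stand-ins, the tar file is an in-memory association list rather than anything from `Sec`, and `unfoldr` from `Data.List` is used in place of `lazyUnfold`, whose definition is not shown here. The point is only that the unfold passes the already-read entry on to `mk`, so `mk` never has to look it up a second time.

```haskell
import Data.List (isSuffixOf, unfoldr)
import Data.Maybe (mapMaybe)

-- Hypothetical stand-ins; none of these are the hackage-security API.
type BlockNo = Int

data Entry = Entry
  { entryPath    :: FilePath
  , entryContent :: String
  } deriving (Eq, Show)

-- Toy "tar file": each block number maps to the entry stored there
-- and the block number at which the next entry starts.
type Archive = [(BlockNo, (Entry, BlockNo))]

-- Stand-in for the expensive seek-and-read operation.
lookupEntry :: Archive -> BlockNo -> Maybe (Entry, BlockNo)
lookupEntry archive blockNo = lookup blockNo archive

-- Walk every entry in file order, starting at block 0.
-- Each entry is looked up exactly once and is returned together
-- with its block number, so later stages need no further lookups.
allEntries :: Archive -> [(BlockNo, Entry)]
allEntries archive = unfoldr step 0
  where
    step blockNo = do
      (entry, next) <- lookupEntry archive blockNo
      Just ((blockNo, entry), next)

-- Receives the entry that was already read; keeps only .cabal files.
mk :: (BlockNo, Entry) -> Maybe (FilePath, BlockNo)
mk (blockNo, entry)
  | ".cabal" `isSuffixOf` entryPath entry = Just (entryPath entry, blockNo)
  | otherwise                             = Nothing

-- Cache of all revisions of all .cabal files, in file order.
buildCache :: Archive -> [(FilePath, BlockNo)]
buildCache = mapMaybe mk . allEntries

-- Two revisions of the same .cabal file plus one unrelated entry.
exampleArchive :: Archive
exampleArchive =
  [ (0, (Entry "foo/1.0/foo.cabal"    "v1",      3))
  , (3, (Entry "foo/1.0/package.json" "{}",      5))
  , (5, (Entry "foo/1.0/foo.cabal"    "v1-rev1", 8))
  ]
```

In this model `buildCache exampleArchive` keeps both revisions of `foo.cabal` (at blocks 0 and 5), and `lookupEntry` runs once for each of the three entries plus one final call that finds nothing at block 8 and ends the walk.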
