
High Performance Boot Packages

I've been working on improving the performance of haddock, which must depend only on boot packages, since GHC itself depends on it.

I noticed that haddock's JSON representation is extremely inefficient, so I tried replacing it with aeson. For the step that parses the existing interface JSON files and generates the new JSON index, time improved by 76% and allocations by 90%. Runtime for the whole program improved by 14%, and total memory use dropped by 40%.

But we can't bring aeson in: it has far too many dependencies as-is. I determined that most of the performance improvement came from attoparsec's optimized parser. attoparsec has minimal dependencies, and only scientific would need to be accounted for.

I was also surprised to find that there is no Vector type in the boot packages. Seq comes close, but it has a lot of overhead. Array has no Semigroup instance and can't reasonably be used for many things. There is also no type for streaming à la pipes or conduit.

I feel that having fast data structures like these available to GHC and the boot packages would go a long way toward making Haskell faster overall. Is there any work that could be done to start bringing some of these packages into the boot set, or to split them into boot-ready variants?

More generally, I remember being pretty confused about the boot library situation. Is there documentation on which packages are boot packages, and on the factors and requirements for including a package in that set?
