High Performance Boot Packages
I've been working on improving the performance of haddock, which must depend only on boot packages, since GHC itself depends on it.
I noticed that the JSON representation in haddock is extremely inefficient, so I tried replacing it with aeson. Parsing the existing interface JSON files and generating the new JSON index improved by 76% in time and 90% in allocations. The whole program's runtime improved by 14%, and it used 40% less total memory.
But we can't bring aeson in: it has far too many dependencies as-is. I determined that the majority of the performance improvement came from the optimized attoparsec parser. attoparsec has minimal dependencies, and only scientific would need to be accounted for.
I was also surprised to find that there isn't a Vector type in the boot packages. Seq is close, but has a lot of overhead. Array doesn't have a Semigroup instance and can't reasonably work for many things. There isn't a type for streaming a la pipes or conduit.
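To make the Array point concrete, here is a rough sketch of what you have to write yourself today using only the array package. The Vec wrapper and its helper names are hypothetical, invented for this illustration, and the append goes through intermediate lists, so it is nowhere near what a real Vector type would do.

```haskell
import Data.Array (Array, bounds, elems, listArray)

-- Hypothetical wrapper (not part of any boot package): a zero-indexed
-- Array with concatenation semantics.
newtype Vec a = Vec (Array Int a)
  deriving (Eq, Show)

-- Build a Vec from a list, indexing from 0.
fromList :: [a] -> Vec a
fromList xs = Vec (listArray (0, length xs - 1) xs)

-- Read the elements back out in index order.
vecToList :: Vec a -> [a]
vecToList (Vec arr) = elems arr

-- Number of elements, computed from the array bounds.
vecLength :: Vec a -> Int
vecLength (Vec arr) = let (lo, hi) = bounds arr in hi - lo + 1

-- Array has no Semigroup instance, so appending means choosing new
-- bounds and copying both sides; here that is done via lists.
instance Semigroup (Vec a) where
  Vec a <> Vec b = fromList (elems a ++ elems b)

instance Monoid (Vec a) where
  mempty = fromList []
```

With this in scope, fromList [1, 2, 3] <> fromList [4, 5] builds a fresh five-element array, copying both inputs on every append.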
I feel like having fast data structures like these available to GHC and boot packages would help a lot for making Haskell faster overall. I'm curious if there's any work that can be done to start incorporating some of these packages, or to start splitting those packages up into boot-ready variants.
I also remember feeling pretty confused about the boot library situation generally. Is there documentation on what the boot packages are? What are the factors and requirements for including a boot package?