Skip to content

FastString Ponderings

I compiled Cabal with some instrumentation about FastStrings.

A FastString provides two benefits

  1. Fast equality comparison between strings
  2. A cached z-encoded string

The cost is that you can never deallocate a fast string once it has been placed in the FastString table, which is an issue for applications like ghcide as it results in a slow unavoidable leak.

So I set out to investigate the necessity of fast string comparision and the z-encoding caching.

Equality

Number Description
102184 The total number of strings in the FastString table
6647 FastStrings which are ever compared for equality to other FastStrings
485367 Total number of comparisions

Only 6.5% of FastStrings in the FastString table are ever used for equality comparison, and there are not very many comparisons in total. The most common comparisons are:

   7134 "libraries/Cabal/Cabal/src/Distribution/Simple/BuildTarget.hs"
   8624 "libraries/Cabal/Cabal/src/Distribution/Simple/Utils.hs"
   9068 "~R#"
  10306 "libraries/Cabal/Cabal/src/Distribution/SPDX/LicenseId.hs"
  10460 "libraries/Cabal/Cabal/src/Distribution/Simple/GHCJS.hs"
  10521 "fromRational"
  10521 "mkRationalBase10"
  10521 "mkRationalBase2"
  10529 "fromInteger"
  10581 "fromString"
  11767 "="
  11794 "libraries/Cabal/Cabal/src/Distribution/Simple/Configure.hs"
  12238 "libraries/Cabal/Cabal/src/Distribution/Simple/GHC.hs"
  15378 "libraries/Cabal/Cabal/src/Distribution/Simple/Setup.hs"
  15832 "libraries/Cabal/Cabal/src/Distribution/PackageDescription/Check.hs"
  21492 "->"
  42441 "."

Analysis: It might not be necessary to have a fast comparison operation if it is called so few times. Lexiographic comparison may be fast enough.

Z-Encoding

Number Description
102184 The total number of strings in the FastString table
32% The percentage of these strings where the z-encoding is used
1077903 Total number of Z-Encodings requested
647248 Hits for the top 200 cached strings
781921 Hits for the top 1000 cached strings

Idea: Keep a LRU cache for z-encoding which only stores 1-2000 entries. This cache can be cleared, unlike the FastStringTable and doesn't even need to be created if we are not doing code generation.

Caching

Number Description
102184 The total number of strings in the FastString table
1105472 Number of attempted creation of faststrings

The largest number of these come from tidying (tidyOccName), which iterates through names trying to fix the next free one.

   2680 "ds22"
   2687 "ds21"
   2698 "ds20"
   2720 "ds25"
   2771 "ww5"
   2803 "$krep1"
   2873 "ds18"
   2887 "ds17"
   2941 "ds16"
   3008 "ds19"
   3034 "ds15"
   3153 "wild4"
   3199 "ds12"
   3217 "ds13"
   3247 "ds2"
   3282 "ds14"
   3403 "ds11"
   3450 "ww4"
   3811 "::"
   3838 "ds10"
   3914 "wild3"
   4070 "ww3"
   4245 "ds9"
   4472 "ds4"
   4541 "dt1"
   4569 "ds8"
   4736 "x1"
   5077 "ds5"
   5222 "ww2"
   5223 "ds7"
   5300 "eta1"
   5376 "wild2"
   5464 "ds6"
   5497 "ds3"
   6107 "import"
   6933 "a1"
   8041 "->"
   9304 "ipv1"
  11782 "="
  19156 "lvl1"
  38996 "wild1"
  53193 "ww1"
 194703 "ds1"

Conclusion: It is not clear if memory usage will increase if the sharing is lost by removing the FastStringTable.

Edited by Matthew Pickering
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information