FastString Ponderings
I compiled Cabal with some instrumentation about FastStrings.
A FastString provides two benefits
- Fast equality comparison between strings
- A cached z-encoded string
The cost is that you can never deallocate a fast string once it has been placed in the FastString table, which is an issue for applications like ghcide as it results in a slow unavoidable leak.
So I set out to investigate the necessity of fast string comparision and the z-encoding caching.
Equality
Number | Description |
---|---|
102184 | The total number of strings in the FastString table |
6647 | FastStrings which are ever compared for equality to other FastStrings |
485367 | Total number of comparisions |
Only 6.5% of FastStrings in the FastString table are ever used for equality comparison, and there are not very many comparisons in total. The most common comparisons are:
7134 "libraries/Cabal/Cabal/src/Distribution/Simple/BuildTarget.hs"
8624 "libraries/Cabal/Cabal/src/Distribution/Simple/Utils.hs"
9068 "~R#"
10306 "libraries/Cabal/Cabal/src/Distribution/SPDX/LicenseId.hs"
10460 "libraries/Cabal/Cabal/src/Distribution/Simple/GHCJS.hs"
10521 "fromRational"
10521 "mkRationalBase10"
10521 "mkRationalBase2"
10529 "fromInteger"
10581 "fromString"
11767 "="
11794 "libraries/Cabal/Cabal/src/Distribution/Simple/Configure.hs"
12238 "libraries/Cabal/Cabal/src/Distribution/Simple/GHC.hs"
15378 "libraries/Cabal/Cabal/src/Distribution/Simple/Setup.hs"
15832 "libraries/Cabal/Cabal/src/Distribution/PackageDescription/Check.hs"
21492 "->"
42441 "."
Analysis: It might not be necessary to have a fast comparison operation if it is called so few times. Lexiographic comparison may be fast enough.
Z-Encoding
Number | Description |
---|---|
102184 | The total number of strings in the FastString table |
32% | The percentage of these strings where the z-encoding is used |
1077903 | Total number of Z-Encodings requested |
647248 | Hits for the top 200 cached strings |
781921 | Hits for the top 1000 cached strings |
Idea: Keep a LRU cache for z-encoding which only stores 1-2000 entries. This cache can be cleared, unlike the FastStringTable and doesn't even need to be created if we are not doing code generation.
Caching
Number | Description |
---|---|
102184 | The total number of strings in the FastString table |
1105472 | Number of attempted creation of faststrings |
The largest number of these come from tidying (tidyOccName
), which iterates through names trying to fix the next free one.
2680 "ds22"
2687 "ds21"
2698 "ds20"
2720 "ds25"
2771 "ww5"
2803 "$krep1"
2873 "ds18"
2887 "ds17"
2941 "ds16"
3008 "ds19"
3034 "ds15"
3153 "wild4"
3199 "ds12"
3217 "ds13"
3247 "ds2"
3282 "ds14"
3403 "ds11"
3450 "ww4"
3811 "::"
3838 "ds10"
3914 "wild3"
4070 "ww3"
4245 "ds9"
4472 "ds4"
4541 "dt1"
4569 "ds8"
4736 "x1"
5077 "ds5"
5222 "ww2"
5223 "ds7"
5300 "eta1"
5376 "wild2"
5464 "ds6"
5497 "ds3"
6107 "import"
6933 "a1"
8041 "->"
9304 "ipv1"
11782 "="
19156 "lvl1"
38996 "wild1"
53193 "ww1"
194703 "ds1"
Conclusion: It is not clear if memory usage will increase if the sharing is lost by removing the FastStringTable.