LitString, ByteString, FastString and ShortByteString with relevance to compacting

In the Literal data type, LitString is used to represent string literals. Originally these were only the string literals that a user wrote in the program, so they were usually quite short. These strings are currently represented by a ByteString.

Recently @hsyl20 modified Template Haskell (!141 (merged)) to construct a LitString directly from a ByteString. This means that you can now construct very big LitStrings by embedding whole files into a LitString.

Now that we are trying to compact the ModIface (#17097), we have to change the LitString representation to something compactable. For now I changed it to FastString, but this isn't ideal: @hsyl20's big files will now be loaded into memory rather than just existing as a pointer for the whole compilation pipeline. It is unlikely that any of his big strings would make it into an interface file anyway, because they are too big to include in an unfolding.

So the proposal is something like the following:

  • Back a LitString by a FastString, ShortByteString or Text so it can be compacted. Quite a lot of LitString manipulation happens, so I went with FastString for now. Perhaps Text would be a better choice.
  • Introduce a new form of literal, LitBytes, which is not part of the Literal data type and must exist only at the top level. This way, an unfolding containing a LitBytes can never be exposed. It can be backed by a ByteString. This also means a LitBytes is never copied into an interface file.
  • Use LitBytes rather than LitString as the target for the TH file embedding.

This will mean that we can compact a Literal, but without having to copy the big LitStrings created from embedding files into memory.
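
As a rough illustration of the split proposed above, here is a minimal standalone sketch. It is not GHC's actual code: the Literal type is heavily simplified, ShortByteString stands in for FastString (which is internal to GHC), and TopLevelRhs, mkLitString, mkEmbeddedBytes and ifaceLiteral are hypothetical names invented for this example. It only assumes the bytestring package.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Short as SBS

-- Simplified stand-in for GHC's Literal type (the real one has more
-- constructors). LitString is backed by an unpinned ShortByteString here
-- as a substitute for FastString, so the whole value could be compacted.
data Literal
  = LitChar Char
  | LitString SBS.ShortByteString
  deriving (Eq, Show)

-- The proposed new form of literal. It is deliberately NOT a constructor
-- of Literal and keeps the ByteString representation.
newtype LitBytes = LitBytes BS.ByteString
  deriving (Eq, Show)

-- Hypothetical model of a top-level right-hand side: either an ordinary
-- literal or an embedded blob that may only appear at the top level.
data TopLevelRhs
  = RhsLiteral Literal
  | RhsBytes LitBytes
  deriving (Eq, Show)

-- Ordinary string literals are copied into the compactable representation.
mkLitString :: BS.ByteString -> Literal
mkLitString = LitString . SBS.toShort

-- Embedded files keep their ByteString untouched (no copy).
mkEmbeddedBytes :: BS.ByteString -> TopLevelRhs
mkEmbeddedBytes = RhsBytes . LitBytes

-- Hypothetical model of what may be exposed in an interface file:
-- ordinary literals can be, LitBytes never is.
ifaceLiteral :: TopLevelRhs -> Maybe Literal
ifaceLiteral (RhsLiteral lit) = Just lit
ifaceLiteral (RhsBytes _)     = Nothing

main :: IO ()
main = do
  let short = RhsLiteral (mkLitString (BC.pack "hello"))
      blob  = mkEmbeddedBytes (BS.replicate 1000000 0)
  print (ifaceLiteral short)
  print (ifaceLiteral blob)
```

The point of keeping LitBytes outside Literal in this sketch is that anything handling a Literal, including code that writes interface files, cannot encounter a large embedded blob by construction.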

cc @hsyl20
