LitString, ByteString, FastString and ShortByteString with relevance to compacting
In the Literal data type, LitString is used to represent string literals. Originally these were only the string literals a user wrote in the program, so they were usually quite short. These strings are currently represented by a ByteString.
In recent times @hsyl20 has modified Template Haskell (!141 (merged)) in order to directly construct a LitString from a ByteString. This means that you can now construct very big LitStrings by embedding whole files into a LitString.
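For context, file embedding through Template Haskell might look like the sketch below. It assumes the API in question is the `bytesPrimL` / `mkBytes` pair exposed by the template-haskell library; `embedBytes` is a hypothetical helper name, not something that exists in GHC.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Internal as BSI
import Language.Haskell.TH

-- Hypothetical helper: read a file at compile time and turn its contents
-- into a bytes literal without going through a Haskell String.
embedBytes :: FilePath -> Q Exp
embedBytes path = do
  bs <- runIO (BS.readFile path)
  -- A ByteString is a foreign pointer plus an offset and a length.
  let (fptr, off, len) = BSI.toForeignPtr bs
  litE (bytesPrimL (mkBytes fptr (fromIntegral off) (fromIntegral len)))
```

At a splice site in another module (with TemplateHaskell and MagicHash enabled), `$(embedBytes "data.bin")` would have type `Addr#`. The point is that the literal is built straight from the buffer, so its size is bounded only by the file.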
Now that we are trying to compact the ModIface (#17097), we have to change the LitString representation to something compactable. For now I changed it to FastString, but this isn't ideal: @hsyl20's big files will now be loaded into memory rather than just existing as a pointer for the whole compilation pipeline. It is unlikely that any of his big strings would make it into an interface file anyway, because they are too big to include in an unfolding.
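To illustrate what "compactable" means here, the following is a minimal sketch using the ghc-compact library. `LitStr`, `mkLitStr` and `roundTrip` are hypothetical names, and ShortByteString is used as the backing type only because FastString lives inside GHC itself.

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Short as SBS
import GHC.Compact (compact, getCompact)

-- Hypothetical stand-in for a string literal whose payload is an
-- unpinned byte array (ShortByteString), which a compact region accepts.
newtype LitStr = LitStr SBS.ShortByteString
  deriving (Eq, Show)

-- Build a literal from an ordinary String (characters truncated to 8 bits).
mkLitStr :: String -> LitStr
mkLitStr = LitStr . SBS.toShort . BC.pack

-- Copy a literal into a compact region and read it back out.
roundTrip :: LitStr -> IO LitStr
roundTrip lit = getCompact <$> compact lit
```

Note that `compact` copies the whole payload into the region, which is exactly the cost that matters for very large embedded files.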
So the proposal is something like the following:
- Back a LitString by a FastString, ShortByteString or Text so it can be compacted. There is quite a lot of manipulation of LitString that happens, so I went with FastString for now. Perhaps Text would be a better choice.
- Introduce a new form of literal, LitBytes, which is not part of the Literal data type and must exist only at the top level. This way, the unfolding can never be exposed for something of LitBytes. This can be backed by a ByteString. This will also mean the LitBytes is not copied into an interface file.
- Use LitBytes as a target for the TH file embedding stuff rather than LitString.
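A rough sketch of the shape this proposal could take is below. All type and function names are hypothetical simplifications rather than GHC's real definitions, and ShortByteString stands in for whichever compactable representation is chosen.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Short as SBS

-- Hypothetical, simplified Literal: every payload is compactable.
data Literal
  = LitChar Char
  | LitString SBS.ShortByteString
  deriving (Eq, Show)

-- Hypothetical right-hand side of a top-level binding. LitBytes lives
-- here, outside Literal, so it can never appear nested in an expression.
data TopLevelRhs
  = RhsLiteral Literal
  | RhsLitBytes BS.ByteString -- pointer-backed, never copied
  deriving (Eq, Show)

-- Only ordinary literals are candidates for an exposed unfolding, so
-- LitBytes payloads never reach the interface file.
exposableUnfolding :: TopLevelRhs -> Maybe Literal
exposableUnfolding (RhsLiteral lit) = Just lit
exposableUnfolding (RhsLitBytes _) = Nothing
```

Keeping LitBytes out of Literal means the compaction code only ever sees compactable payloads, while the large buffer stays behind its original pointer.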
This will mean that we can compact a Literal, but without having to copy the big LitStrings created from embedding files into memory.
cc @hsyl20