LitString, ByteString, FastString and ShortByteString with relevance to compacting
In the `Literal` data type, `LitString` is used to represent string literals. In the old days these were just the string literals a user would write in the program, so usually quite short strings. These strings are currently represented by a `ByteString`.
In recent times @hsyl20 has modified Template Haskell (!141 (merged)) in order to directly construct a `LitString` from a `ByteString`. This means that you can now construct very big `LitString`s by embedding whole files into a `LitString`.
Now that we are trying to compact the `ModIface` (#17097), we have to change the `LitString` representation to something compactable. For now I changed it to `FastString`, but this isn't ideal, as @hsyl20's big files will now be loaded into memory rather than just existing as a pointer for the whole compilation pipeline. It is unlikely that any of his big strings would make it into an interface file anyway, because they are too big to include in an unfolding.
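To make the compactability concern concrete, here is a minimal sketch using the `ghc-compact` and `bytestring` libraries directly rather than GHC's internals. The helper `tryCompact` is a hypothetical name introduced only for this illustration. It attempts to copy a value into a compact region and returns any exception instead of crashing, so the behaviour of different string representations can be compared.

```haskell
import Control.Exception (SomeException, try)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Short as SBS
import GHC.Compact (compact, getCompact)

-- Hypothetical helper (not part of GHC): try to copy a value into a
-- compact region and hand back either the compacted copy or the
-- exception raised by the runtime.
tryCompact :: a -> IO (Either SomeException a)
tryCompact x = fmap (fmap getCompact) (try (compact x))

-- A ShortByteString lives in unpinned heap memory.
shortLit :: SBS.ShortByteString
shortLit = SBS.pack [104, 105]

-- A ByteString built with pack is backed by a pinned buffer, which is
-- the kind of object compact regions are documented to reject.
pinnedLit :: BS.ByteString
pinnedLit = BS.pack [104, 105]
```

Running `tryCompact shortLit` is expected to yield a `Right`, while `tryCompact pinnedLit` is expected to yield a `Left`, assuming the runtime rejects pinned objects as the `GHC.Compact` documentation describes.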
So the proposal is something like the following:
- Back a `LitString` by a `FastString`, `ShortByteString` or `Text` so it can be compacted. There is quite a lot of manipulation of `LitString` that happens, so I went with `FastString` for now. Perhaps `Text` would be a better choice.
- Introduce a new form of literal, `LitBytes`, which is not part of the `Literal` data type and must exist only at the top level. This way, the unfolding can never be exposed for something of `LitBytes`. This can be backed by a `ByteString`. This will also mean the `LitBytes` is not copied into an interface file.
- Use `LitBytes` as a target for the TH file embedding stuff rather than `LitString`.
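The shape of the proposal can be sketched roughly as follows. This is a simplified stand-in, not GHC's actual definitions: the cut-down `Literal` type, the `TopLevelBind` wrapper and the `ifaceLiterals` function are all hypothetical names chosen for illustration, and `ShortByteString` is picked here only as one of the compactable candidates listed above.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Short as SBS

-- Hypothetical, cut-down Literal: string literals are backed by an
-- unpinned ShortByteString so the whole type can be compacted.
data Literal
  = LitString SBS.ShortByteString
  | LitNumber Integer
  deriving (Eq, Show)

-- Hypothetical LitBytes: deliberately NOT a constructor of Literal.
-- It keeps the ByteString backing, so embedded files stay as a
-- pointer to their bytes instead of being copied.
newtype LitBytes = LitBytes BS.ByteString

-- Hypothetical top-level bindings: LitBytes can only appear here,
-- never nested inside an expression that could become an unfolding.
data TopLevelBind
  = TopLit String Literal
  | TopBytes String LitBytes

-- Only ordinary literals are candidates for the interface file;
-- LitBytes bindings are skipped, so their payload is never copied.
ifaceLiterals :: [TopLevelBind] -> [(String, Literal)]
ifaceLiterals binds = [(name, lit) | TopLit name lit <- binds]
```

Keeping `LitBytes` outside `Literal` means the type system, rather than a size check, is what prevents a large embedded file from reaching an unfolding or an interface file.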
This will mean that we can compact a `Literal`, but without having to copy the big `LitString`s created from embedding files into memory.
cc @hsyl20