For a long time we have known that Pretty is suboptimal for code generation, since we pay for its layout functionality (which is unnecessary when, e.g., producing assembler for as).
We should fix this.
One tricky issue is that of the strict length field in the Beside constructor. @mpickering has recently been working on shrinking the FastString representation and consequently removing its cached length. This would be fine except it means that codegen performance regresses since it must walk every FastString we print to compute its length when building a Beside node. This is wasted effort in the case of codegen.
@dexterleng, the "infinite ribbon case" refers to the rendering configuration where the ribbon width is infinite (that is, we have no desire to insert line breaks that weren't explicitly present in the document). This is the case used by GHC's code generator, which produces very large documents.
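To make the distinction concrete, here is a toy sketch of a renderer with an optional ribbon width. It is not pretty's actual algorithm; `Atom`, `SoftBreak`, and `render` are hypothetical names invented for this illustration. With `Nothing` as the width (the infinite ribbon), only explicit newlines produce line breaks and the column counter is never inspected.

```haskell
-- Toy model only: these types and names are assumptions for illustration,
-- not part of the pretty library or GHC's Pretty module.
data Atom
  = Text String   -- literal text
  | Newline       -- a line break explicitly present in the document
  | SoftBreak     -- a place where the renderer MAY break the line

-- | Render with an optional ribbon width. 'Nothing' means "infinite ribbon".
render :: Maybe Int -> [Atom] -> String
render ribbon = go 0
  where
    go :: Int -> [Atom] -> String
    go _   []               = ""
    go _   (Newline : rest) = '\n' : go 0 rest
    go col (Text s : rest)  = s ++ go (col + length s) rest
    go col (SoftBreak : rest)
      -- Only a finite ribbon ever looks at the column and the text widths.
      | Just w <- ribbon, col + 1 + nextWidth rest > w = '\n' : go 0 rest
      | otherwise                                      = ' ' : go (col + 1) rest

    -- Width of the text immediately following a soft break (0 if none).
    nextWidth :: [Atom] -> Int
    nextWidth (Text s : _) = length s
    nextWidth _            = 0
```

In this model `render Nothing` never needs any length, because laziness leaves the column thunk unevaluated. A `Doc` type that stores lengths in strict fields gives up exactly that advantage.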
It shouldn't be a problem to work on the pretty library on GitHub. We should be able to easily port any changes made there to GHC.
Could you point to the part of the code where the length computation takes place?
I'm not sure I follow this question. Which length are you referring to?
I'm referring to this:
This would be fine except it means that codegen performance regresses since it must walk every FastString we print to compute its length when building a Beside node. This is wasted effort in the case of codegen.
EDIT: Looking at compiler/cmm it appears functions like ftext are being used, which use the length function.
@dexterleng, the code generator uses pretty to emit all of the assembler that we produce. pretty only uses the length of the FastString to inform the layout algorithm. However, when producing assembler there is no reason to perform layout (since the result is going to be consumed by as, not a human).
Looking at compiler/cmm it appears functions like ftext are being used, which use the length function.
Precisely. The Doc type's Beside node always includes its (strictly-evaluated) length.
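As a rough illustration of why this hurts, here is a simplified model of a document type with a strict length field. This is not the real definition from GHC's Pretty module: the constructor shape and the helpers `textBeside`, `renderNoLayout`, and `width` are assumptions made for the sketch, and a plain `String` (whose `length` is O(n)) stands in for a FastString without a cached length.

```haskell
-- Simplified, hypothetical model; not the actual Doc type from Pretty.
data Doc
  = Empty
  | Beside String !Int Doc  -- text, its strictly evaluated length, the rest

-- | Smart constructor in the spirit of ftext. Because the Int field is
-- strict, 'length s' is forced (a full walk over the string) as soon as
-- the node is evaluated, whether or not anything later uses it.
textBeside :: String -> Doc -> Doc
textBeside s rest = Beside s (length s) rest

-- | Layout-free rendering, as wanted when the consumer is an assembler:
-- the stored length is ignored entirely.
renderNoLayout :: Doc -> String
renderNoLayout Empty             = ""
renderNoLayout (Beside s _ rest) = s ++ renderNoLayout rest

-- | Only a layout algorithm needs the stored lengths.
width :: Doc -> Int
width Empty             = 0
width (Beside _ n rest) = n + width rest
```

In this model `renderNoLayout` pays for every `length` call made by `textBeside` without ever reading the result, which is the wasted effort described above.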