Avoid unnecessary code duplication from String Literals.
When we compile a function like this:
foo :: Int -> String
foo 1 = "1_String"
foo 2 = "2_String"
foo 3 = "3_String"
foo 4 = "4_String"
foo 5 = "5_String"
foo _ = "unknown_string"
We get a few things.
Obviously there are the actual strings:
-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
Foo.foo10 :: GHC.Prim.Addr#
Foo.foo10 = "1_String"#
There is no way around these so that's fine.
Then we end up with a function to unpack the actual string.
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
Foo.foo9 :: [Char]
[GblId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=True,
WorkFree=False, Expandable=True, Guidance=IF_ARGS [] 20 0}]
Foo.foo9 = GHC.CString.unpackCString# Foo.foo10
We want to have a closure for each of these. Otherwise we end up constructing a new string each time, which would be a waste. So on the surface this is fine. However we end up with one of these for each string.
Looking at the Cmm/Asm the functions are not THAT small.
[Foo.foo9_entry() // [R1]
{ info_tbls: [(c2hj,
label: Foo.foo9_info
rep: HeapRep static { Thunk }
srt: Nothing)]
stack_info: arg_space: 8 updfr_space: Just 8
}
{offset
c2hj: // global
if ((Sp + -16) < SpLim) (likely: False) goto c2hk; else goto c2hl;
c2hk: // global
R1 = R1;
call (stg_gc_enter_1)(R1) args: 8, res: 0, upd: 8;
c2hl: // global
(_c2hg::I64) = call "ccall" arg hints: [PtrHint,
PtrHint] result hints: [PtrHint] newCAF(BaseReg, R1);
if (_c2hg::I64 == 0) goto c2hi; else goto c2hh;
c2hi: // global
call (I64[R1])() args: 8, res: 0, upd: 8;
c2hh: // global
I64[Sp - 16] = stg_bh_upd_frame_info;
I64[Sp - 8] = _c2hg::I64;
R2 = Foo.foo10_bytes;
Sp = Sp - 16;
call GHC.CString.unpackCString#_info(R2) args: 24, res: 0, upd: 24;
}
}
In actual code size (HEAD/-O2) the code + info table is about 100Byte.
To give an idea in my current ghc-stage2 I have 12,409 calls to unpackCString, and I assume 99% of them are from actual strings. So that comes out to about one megabyte. Out of a binary size of 84MB, so surprisingly significant actually!
Now what can we do about this!
I think the right way to go here is to define a unpackString
function which does the actual work.
Then we jump into that from a small per-string wrapper.
So we would end up with code like:
[Foo.foo9_entry() // [R1]
c2hj:
// R1 = Closure pointer
R2 = Foo.foo10_bytes;
call (unpackString)(R1,R2) args: 8, res: 0, upd: 8;
}
Which should be well under 50 bytes. It does cause an extra indirection when unpacking the String, but if the performance of that matters using String is likely the wrong choice to begin with.
How to actually get GHC to generate code like that I'm not sure yet.
Trac metadata
Trac field | Value |
---|---|
Version | 8.6.2 |
Type | Task |
TypeOfFailure | OtherFailure |
Priority | normal |
Resolution | Unresolved |
Component | Compiler |
Test case | |
Differential revisions | |
BlockedBy | |
Related | |
Blocking | |
CC | |
Operating system | |
Architecture |