Reasoning on unicode static data compression at GHC-side
Summary
We (many thanks to @bgamari for the core implementation and to @mpickering for reviewing my part) eliminated the dependency on the Unicode data for trivial hello-world code. This reduced the javascript bundle size from ~3.2mb (after google closure compiler) to 0.4kb (after google closure compiler), as was predicted here. This is not the end, of course: the next large target is to keep the size small even when Unicode is actually needed.
This issue opens a discussion on how to deal with the large size of the Unicode data on supported platforms. A separate discussion, on how to de-bundle Unicode support on feature-rich targets such as javascript/wasm where the platform already provides Unicode, will be started a bit later.
Disposition
Take a look at GeneralCategory.hs. Be careful: it is huge to load in a browser, so it is better to wget it.
Its size in uncompressed form is ~3.2mb, but you will notice that it contains numerous repetitions of the same symbols. Even a trivial compression that replaces a run such as \28\28\28\28\28\28\28\28\28\28\28\28\28\28\28\28\28\28 with something like \28{18} would reduce its size considerably. We could ship it in compressed form and decompress it in memory at launch.
Proposed behavior
- Rewrite ucd2haskell to produce a compressed form of GeneralCategory in which each run of repeated bytes is replaced by the byte and a counter (run-length encoding). This is a very trivial compression.
- Add to the generated code the ability to decompress this bytestring at launch.
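The run-length scheme proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names rleEncode and rleDecode are hypothetical, the encoding (a count byte followed by the value byte, with runs longer than 255 split) is one possible choice rather than what ucd2haskell does or would do, and plain [Word8] lists stand in for the actual generated literal.

```haskell
import Data.List (group)
import Data.Word (Word8)

-- Hypothetical encoder: each run of equal bytes becomes a (count, byte) pair.
-- Runs longer than 255 are split so that every count fits into one byte.
rleEncode :: [Word8] -> [Word8]
rleEncode = concatMap encodeRun . group
  where
    encodeRun :: [Word8] -> [Word8]
    encodeRun []          = []
    encodeRun run@(b : _) = go (length run) b

    go :: Int -> Word8 -> [Word8]
    go n b
      | n <= 0    = []
      | n > 255   = 255 : b : go (n - 255) b
      | otherwise = [fromIntegral n, b]

-- Hypothetical decoder: expands (count, byte) pairs back into runs.
-- A trailing unpaired byte (malformed input) is silently ignored.
rleDecode :: [Word8] -> [Word8]
rleDecode (n : b : rest) = replicate (fromIntegral n) b ++ rleDecode rest
rleDecode _              = []
```

With this encoding, the 18-byte run of \28 from the example above becomes the two bytes 18 and 28, and rleDecode (rleEncode xs) == xs holds for any input. The cost is that isolated bytes double in size, so the benefit depends on how much of the table consists of long runs.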
Possible workarounds to avoid doing anything in the code :-)
- We know that for javascript it is possible to distribute bundles gzipped. The gzipped version of GeneralCategory takes only ~8kb. This is already available without any changes to our code.
- The same approach is possible for the wasm target: when wasm is distributed from an HTTP server, the same gzipping rules can be applied.
- MacOS apps can be delivered via .pkg, which applies some compression as well. It is about as effective as the gzipped versions of javascript or wasm.
- Android/iOS deliver apps in compressed form, so again it is not so important.
- Linux packages are distributed in compressed form.
- Windows 10 supports exe compression OOTB. For other versions it is possible to use tools like UPX.
Does this seem enough to justify not doing anything about it? All of these sound like resolvable issues of shipping on specific platforms and, in the end, not actually a GHC issue. But perhaps we could create or edit a wiki page describing best practices for shipping compiled bundles/binaries on various platforms. Putting effort there may be more useful than fighting the large size of GeneralCategory in cases where the code actually depends on it and its size can be effectively mitigated by distribution tooling.
Environment
- GHC commit hash: 2b1af08b94024c104b54eadd710855e9f8a90fb6
Optional:
- System Architecture: All targets