GHC allocates more than you would expect if you think about it

I am profiling a GHC API application which starts a GHC session and compiles a library which contains 10 modules.

One profile, one hundred questions.

Nearly 60 MB of ARR_WORDS are created. What are they for? That's a lot of bytestrings. These are probably from FastStrings, but why are there so many? Is the sharing broken?
The allocations of this program are dominated by strings. On the initial load, approximately 12 MB of list cons cells are allocated. Each cons cell is 3 words (24 bytes on a 64-bit machine), so that equates to about 500,000 cons cells! Almost all of these allocations appear to come from strings. It is quite surprising to allocate this much, and the strings don't appear to be garbage collected on subsequent loads. I didn't have much luck tracking down precisely why they are retained.
Why are there over 8 MB of IO actions?
Why is there so much duplication between Module and UnitId? There should only be a fixed number of these, perhaps 1000 at most.
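
As a sanity check on the cons-cell estimate above, here is a minimal sketch of the arithmetic. It assumes a 64-bit platform (8-byte words) and decimal megabytes; the names `wordSizeBytes`, `consCellWords` and `consCells` are made up for illustration and are not part of GHC.

```haskell
module Main where

-- Assumption: 64-bit platform, so one machine word is 8 bytes.
wordSizeBytes :: Int
wordSizeBytes = 8

-- A cons cell occupies 3 words: header, head pointer, tail pointer.
consCellWords :: Int
consCellWords = 3

-- Number of cons cells that fit in the given number of bytes.
consCells :: Int -> Int
consCells bytes = bytes `div` (consCellWords * wordSizeBytes)

main :: IO ()
main = print (consCells (12 * 1000 * 1000))  -- 12 MB of cons cells
```

With these assumptions, 12,000,000 bytes divided by 24 bytes per cell gives 500,000 cells, which matches the figure quoted above.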

Note that some of these allocations might come from h-i-e, but I find it hard to believe that ALL of the problems are in the library. There must be some improvements to be made in GHC itself.

Does anyone have any idea where all these allocations come from? The numbers are significant.

Here is a sample profile. The initial load (before the first marker) loads all 10 modules; the subsequent markers correspond to module reloads.

https://mpickering.github.io/hie.eventlog
