Improve perf in substitution
When looking at `perf/compiler/T12545`, I see entries like these at the top of the ticky allocation profile:
```
Entries      Alloc  Alloc'd  Non-void Arguments  STG Name
--------------------------------------------------------------------------------
3137531  386383024        0  2 LL                GHC.Utils.Misc.$wsplitAtList{v rg1m} (fun)
7490472  364368232        0  3 i.M               Data.IntMap.Internal.$winsert{v ruzA} (fun)
4785911  353204632        0  2 >L                GHC.Base.map{v 01X} (fun)
4418053  251655200        0  2 LL                GHC.List.zip{v 0x} (fun)
4976632   89297760        0  2 >L                GHC.Utils.Misc.strictMap{v r4ga} (fun)
1085849   60019872        0  2 SL                GHC.Tc.Utils.Zonk.zonkTcTypesToTypesX{v rn} (fun)
```
I think the `splitAtList` stuff comes from splitting argument lists in `Coercion`. And `zip` is, I think, from `zipTyEnv`, via `substTyWith`, which is also invoked from `Coercion`.
There is low-hanging fruit here:
- `splitAtList` generates extra allocation. In the common case, where there are no "leftover" arguments, all of it is wasted work.
- We zip together tyvars and types, but immediately take the resulting list apart again in `mkVarEnv`.
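A minimal sketch of the first point in plain Haskell, under stated assumptions: `splitAtListModel` is a hypothetical stand-in assumed to behave like `splitAtList` (split the second list at the length of the first), not GHC's actual definition, and `noLeftover` and `splitAtListFast` are likewise illustrative names. The fast path checks for leftovers without allocating and, when there are none, returns the input list shared rather than copied.

```haskell
-- Hypothetical model of a splitAtList-style function (not GHC's source):
-- split the second list at the length of the first. Each step builds a
-- fresh cons cell for the prefix and a result pair, plus thunks for the
-- lazy `where` binding.
splitAtListModel :: [a] -> [b] -> ([b], [b])
splitAtListModel []     ys     = ([], ys)
splitAtListModel _      []     = ([], [])
splitAtListModel (_:xs) (y:ys) = (y : prefix, rest)
  where
    (prefix, rest) = splitAtListModel xs ys

-- True when ys is no longer than xs, i.e. there are no leftover elements.
-- It only walks the two spines, so it allocates nothing.
noLeftover :: [a] -> [b] -> Bool
noLeftover _      []     = True
noLeftover []     (_:_)  = False
noLeftover (_:xs) (_:ys) = noLeftover xs ys

-- Fast path: in the common no-leftover case, return ys itself (shared,
-- not copied) instead of rebuilding it cell by cell.
splitAtListFast :: [a] -> [b] -> ([b], [b])
splitAtListFast xs ys
  | noLeftover xs ys = (ys, [])
  | otherwise        = splitAtListModel xs ys
```

The cost of this design is the extra traversal: when there are leftovers, the lists are walked twice. Whether that pays off would need measuring on T12545.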
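The second point can be sketched the same way. `TyVar`, `Type`, `tvEnvViaZip` and `tvEnvDirect` are hypothetical names, not GHC's; the environment is a `Data.IntMap.Strict` map keyed by an `Int` unique, loosely modelled on GHC's `VarEnv`, which is a `UniqFM` over an `IntMap`. The idea is to fold the two lists straight into the map instead of materialising `zip tvs tys` first.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.IntMap.Strict as IM

-- Hypothetical stand-ins: a type variable identified by an Int unique,
-- and an opaque placeholder for types.
data TyVar = TyVar { tvUnique :: !Int, tvName :: String }
  deriving (Eq, Show)

newtype Type = Type String
  deriving (Eq, Show)

type TvEnv = IM.IntMap Type

-- Current shape: zip builds a list of pairs that fromList consumes at once.
tvEnvViaZip :: [TyVar] -> [Type] -> TvEnv
tvEnvViaZip tvs tys = IM.fromList [ (tvUnique tv, ty) | (tv, ty) <- zip tvs tys ]

-- Zip-free shape: walk both lists together and insert directly, so no
-- intermediate pair list is built. Like zip, it stops at the shorter list,
-- and a later binding for the same unique overwrites an earlier one,
-- matching fromList.
tvEnvDirect :: [TyVar] -> [Type] -> TvEnv
tvEnvDirect = go IM.empty
  where
    go !acc (tv:tvs) (ty:tys) = go (IM.insert (tvUnique tv) ty acc) tvs tys
    go !acc _        _        = acc
```

Both functions should build equal maps; the difference is only the intermediate list allocated along the way.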
See also #18541, which identifies other low-hanging fruit.
Some attention here might pay dividends.