Low-hanging(?) optimization fruit
While trying to optimize the linear branch, I decided to look at a ticky output of compiling a file. (The precise details are not relevant.) I took these results and sorted by the number of entries. The idea was to get a sense of what functions are actually important to pay attention to. The results are shocking. Here is the head:
| Entries | Alloc | Alloc'd | Non-void | Arguments | Name | 
|---|---|---|---|---|---|
| 882132 | 37,662,656 | 0 | 3 | i.M | containers-0.6.2.1:Data.IntMap.Internal.$winsert{v rAxS} ( | 
| 457987 | 4,142,400 | 0 | 2 | iM | containers-0.6.2.1:Data.IntMap.Internal.$wlookup{v rAxo} ( | 
| 401502 | 18,270,560 | 0 | 2 | >L | base:GHC.Base.map{v 01X} (fun) | 
| 321081 | 0 | 0 | 2 | Li | base:GHC.List.$wlenAcc{v r4jQ} (fun) | 
| 318167 | 4,218,520 | 0 | 2 | iM | containers-0.6.2.1:Data.IntMap.Internal.$wdelete{v rAA7} ( | 
| 289716 | 6,013,168 | 274400 | 1 | M | go{v sbLg} (ghc:GHC.Core.TyCo.Subst) (fun) in r1IP | 
| 268765 | 1,479,648 | 0 | 2 | ML | ghc:GHC.Core.TyCon.expandSynTyCon_maybe{v r4Ls} (fun) | 
| 221796 | 5,240,200 | 0 | 2 | LL | base:GHC.Base.++{v 03} (fun) | 
| 212527 | 3,147,768 | 0 | 2 | LL | base:GHC.List.reverse1{v r4jZ} (fun) | 
| 183321 | 0 | 0 | 2 | L. | ghc:Util.seqList{v r1O6} (fun) | 
| 162516 | 0 | 0 | 1 | L | ghc:GHC.Core.Type.seqTypes{v r3um} (fun) | 
| 159092 | 0 | 0 | 1 | M | ghc:GHC.Core.Type.seqType{v r3ul} (fun) | 
| 157883 | 0 | 0 | 2 | SM | ghc:GHC.Types.Var.Env.lookupVarEnv{v r1As} (fun) | 
| 141645 | 0 | 0 | 2 | MS | ghc:GHC.Types.Var.Set.elemVarSet{v r1n2} (fun) | 
| 138562 | 336,384 | 0 | 3 | MiM | ghc:GHC.Types.Module.$wpoly_go1{v rjEr} (fun) | 
| 138506 | 0 | 0 | 2 | iM | containers-0.6.2.1:Data.IntSet.Internal.$wmember{v rhS3} ( | 
| 120471 | 0 | 0 | 1 | + | ghc-prim:GHC.Classes.=={v 02H} (fun) | 
| 116857 | 0 | 0 | 3 | SMS | ghc:GHC.Core.TyCo.FVs.deepTcvFolder4{v r7fN} (fun) | 
| 113934 | 0 | 0 | 3 | +.L | base:GHC.List.elem{v raa} (fun) | 
| 96460 | 0 | 0 | 3 | SLS | ghc:GHC.Core.TyCo.FVs.tyCoVarsOfTypes1{v r7g7} (fun) | 
| 96378 | 4,124,992 | 0 | 3 | i.M | containers-0.6.2.1:Data.IntMap.Strict.Internal.$winsert{v | 
There are a plethora of curiosities in here.
- We spend a lot of energy inserting and looking in maps. Perhaps that's expected. But these are lazy maps. Is that good? Is it possible that (gasp) a mutable hashtable would be better?
- We spend a lot of energy dealing with lists. Lists! Lists are generally poor for anything other than iteration, where they should be fused away and never heard from. Yet this list contains entries for map,length(!),++(!),reverse, andelem(!!!!!!). What on earth are we usingelem(linear lookup in a list) for? And can we please find some better idea than(++), which copies one of its inputs.
- We spend a lot of calls forcing things: seqList,seqTypes,seqType. Maybe we just need stricter data structures.
- We see (==). Any use of==on a concrete type should be blasted away by now. So this must be==on polymorphic types. Yet it persists. Maybe clever uses ofSPECIALISEpragmas will kill this.
It is possible that tracking these down will find easy ways to make a big difference to compilation times. Note that there is nothing particular about linear types here (though implementing a mutable hashtable using linear types is tempting).
Although performance tuning Haskell code is hard, this might be a nice ticket for an experienced Haskeller (but not necessarily an experienced GHC hacker) to tackle. Happy to offer advice!