While trying to optimize the linear branch, I decided to look at a ticky output of compiling a file. (The precise details are not relevant.) I took these results and sorted by the number of entries. The idea was to get a sense of what functions are actually important to pay attention to. The results are shocking. Here is the head:
We spend a lot of energy inserting into and looking up in maps. Perhaps that's expected. But these are lazy maps. Is that good? Is it possible that (gasp) a mutable hashtable would be better?
We spend a lot of energy dealing with lists. Lists! Lists are generally poor for anything other than iteration, where they should be fused away and never heard from again. Yet this profile contains entries for map, length(!), ++(!), reverse, and elem(!!!!!!). What on earth are we using elem (linear lookup in a list) for? And can we please find some better idea than (++), which copies its entire first argument?
We spend a lot of calls on forcing things: seqList, seqTypes, seqType. Maybe we just need stricter data structures.
We see (==). Any use of == on a concrete type should be blasted away by now. So this must be == on polymorphic types. Yet it persists. Maybe clever uses of SPECIALISE pragmas will kill this.
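To make the SPECIALISE idea concrete, here is a minimal sketch (memberBy is a hypothetical example, not GHC code): the pragma makes GHC emit a monomorphic copy in which the Eq dictionary is resolved, so (==) becomes a known call that can inline.

```haskell
-- Hypothetical example: a polymorphic function whose (==) is an
-- unknown dictionary call at every polymorphic use site.
memberBy :: Eq a => a -> [a] -> Bool
memberBy x = any (== x)

-- Ask GHC for a monomorphic copy for the hot Int case; there (==)
-- resolves to eqInt and can be inlined.
{-# SPECIALISE memberBy :: Int -> [Int] -> Bool #-}
```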
It is possible that tracking these down will find easy ways to make a big difference to compilation times. Note that there is nothing particular about linear types here (though implementing a mutable hashtable using linear types is tempting).
Although performance tuning Haskell code is hard, this might be a nice ticket for an experienced Haskeller (but not necessarily an experienced GHC hacker) to tackle. Happy to offer advice!
It would be interesting to know total allocations for this report to put the numbers in perspective.
Is IntMap the best map for our use case?
Based on this benchmark we should at least investigate switching to a different map type, at least as an option in places where IO-backed maps are reasonable.
The downside is of course that it's possible that IO/ST will creep into a lot of code, negating parts of the performance benefit as well as making the code harder to work with.
Pure HashMaps on the other hand don't seem to be worth it.
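As a rough sketch of what the IO/ST route looks like (assuming the hashtables package; lookupAll is a made-up example, not GHC code):

```haskell
import           Control.Monad.ST (runST)
import qualified Data.HashTable.ST.Basic as H

-- Build a mutable table and query it, all inside runST: the
-- mutation stays local, so the function's API remains pure.
lookupAll :: [(Int, String)] -> [Int] -> [Maybe String]
lookupAll kvs ks = runST $ do
  tbl <- H.new
  mapM_ (uncurry (H.insert tbl)) kvs
  mapM (H.lookup tbl) ks
```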
(==)
There is #17759, which means almost any expression of the sort `xs == ys`, where xs has a concrete type `[Foo]`, still can't be specialized.
I suspect this is the reason why == shows up so often.
Lists
List fusion can be unreliable; it can fail in practice for various reasons, the most recent example being elem: #18034 (closed). GHC also often uses lists explicitly as data structures, in places where the compiler can't eliminate them via list fusion. So `map` showing up is not at all surprising.
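For example, one common way fusion fails in practice is sharing (good/bad are illustrative names):

```haskell
-- Fuses: map is consumed directly by foldr, so the intermediate
-- list is never built.
good :: [Int] -> Int
good = foldr (+) 0 . map (* 2)

-- Does not fuse: ys is shared by two consumers, so the intermediate
-- list must be materialised.
bad :: [Int] -> (Int, Int)
bad xs = let ys = map (* 2) xs in (foldr (+) 0 ys, length ys)
```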
What on earth are we using elem (linear lookup in a list) for?
Things like tc `elem` [ eqPrimTyCon, eqReprPrimTyCon, heqTyCon ] are perfectly sane. The register allocator also uses elem and even nub(!) in places where it's guaranteed that the given lists will be very small. Sometimes it's just more overhead to build a set than to just use elem on the list directly.
For (++) we should usually use OrdList to avoid the problem; it's already used heavily in the backend exactly to avoid (++).
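For reference, a minimal sketch of the idea behind OrdList (GHC's actual OrdList has more constructors and operations):

```haskell
-- An append tree: (<>) is O(1) and never copies; the list is only
-- materialised once, by a single flattening pass at the end.
data OrdList a
  = None
  | One a
  | Two (OrdList a) (OrdList a)

instance Semigroup (OrdList a) where
  None <> ys = ys
  xs <> None = xs
  xs <> ys   = Two xs ys

instance Monoid (OrdList a) where
  mempty = None

fromOL :: OrdList a -> [a]
fromOL ol = go ol []
  where
    go None      acc = acc
    go (One x)   acc = x : acc
    go (Two l r) acc = go l (go r acc)
```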
Based on the allocation counts it seems likely there are some bad uses here. But we might just have a lot of sane uses which in their sum amount to a lot of work. That is to say if we replaced all of them we might not see any benefit in terms of runtime.
Using more array-based structures to pass around things like bindings would still be good. But it's a lot of work, and there are things we can do with lists which are currently not possible with arrays.
I tried going in that direction in the past, but incremental changes made things worse as they introduced a lot of conversions from/to lists. In the end I wasn't willing to spend the time to refactor enough code to say for sure if it would pay off.
Strictness
Parts of the compiler could benefit from using stricter types, including maps; in others we would loop or crash. So this is not easily explored. I tried making some maps and types strict in the past, and often it either made no difference or caused correctness issues.
We probably can do a bit of a better job there. But it heavily depends on the part of GHC you are looking at and none of the wins will be all that low hanging I fear.
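To illustrate what is at stake, a small standalone example (not GHC code); Data.Map.Strict and Data.Map.Lazy share the same Map type and differ only in whether values are forced as they are inserted:

```haskell
import           Data.List (foldl')
import qualified Data.Map.Lazy   as ML
import qualified Data.Map.Strict as MS

-- Both build the same Map type, but the lazy API leaves a chain of
-- (+) thunks behind each key, while the strict API forces each
-- combined value on insertion.
lazyCount, strictCount :: [String] -> ML.Map String Int
lazyCount   = foldl' (\m k -> ML.insertWith (+) k 1 m) ML.empty
strictCount = foldl' (\m k -> MS.insertWith (+) k 1 m) MS.empty
```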
From #20222 (closed) I narrowed down the loops caused by strictness in FM.hs to just 3 functions. Then I swapped all IntMaps and Maps to be strict in this branch.
The CI showed that allocations increased by up to 10% in some cases. I didn't dig any further with perf though.
It's often the case that when you make some things strict, the unnecessary long-lived thunks just get pushed elsewhere rather than disappearing. The new thunks and their associated memory leaks, if any, can easily be bigger, because more things are created during evaluation but not necessarily anything gets released.
What I'm saying is that the 10% may not come from pessimisation, but from there still not being enough strictness.
I've made a patch to avoid allocations in IntMap.lookup.
Patch
```diff
--- containers/src/Data/IntMap/Internal.hs
+++ containers/src/Data/IntMap/Internal.hs
@@ -584,31 +584,28 @@ notMember :: Key -> IntMap a -> Bool
 notMember k m = not $ member k m
 
 -- | /O(min(n,W))/. Lookup the value at a key in the map. See also 'Data.Map.lookup'.
 --
 -- See Note: Local 'go' functions and capturing]
 lookup :: Key -> IntMap a -> Maybe a
-lookup !k = go
-  where
-    go (Bin p m l r) | nomatch k p m = Nothing
-                     | zero k m = go l
-                     | otherwise = go r
-    go (Tip kx x) | k == kx = Just x
-                  | otherwise = Nothing
-    go Nil = Nothing
+lookup = go
+  where
+    go !k (Bin p m l r) | nomatch k p m = Nothing
+                        | zero k m = go k l
+                        | otherwise = go k r
+    go !k (Tip kx x) | k == kx = Just x
+                     | otherwise = Nothing
+    go _ Nil = Nothing
 
--- See Note: Local 'go' functions and capturing]
 find :: Key -> IntMap a -> a
-find !k = go
+find = go
   where
-    go (Bin p m l r) | nomatch k p m = not_found
-                     | zero k m = go l
-                     | otherwise = go r
-    go (Tip kx x) | k == kx = x
-                  | otherwise = not_found
-    go Nil = not_found
-
-    not_found = error ("IntMap.!: key " ++ show k ++ " is not an element of the map")
+    go !k (Bin p m l r) | nomatch k p m = not_found k
+                        | zero k m = go k l
+                        | otherwise = go k r
+    go !k (Tip kx x) | k == kx = x
+                     | otherwise = not_found k
+    go !k Nil = not_found k
+
+    not_found !k = error ("IntMap.!: key " ++ show k ++ " is not an element of the map")
```
It doesn't seem to have much impact on containers' intmap-benchmark. Could someone try to reproduce ticky results with GHC?
Things like tc `elem` [ eqPrimTyCon, eqReprPrimTyCon, heqTyCon ] are perfectly sane.
Because it's not obvious, I want to point out that this code used to generate a call to elem but no longer does, and that this has been a fairly recent change, as I remember it from before @rae did this analysis.
Based on commit acf537f9 ("Make splitAtList strict in its arguments")
By invoking:
'E:\ghc_head\_ticky\stage1\bin\ghc.exe' 'nofib/spectral/simple/Main.hs' '-O' '-fforce-recomp' +RTS '-s' '-r'
RTS stats:
```
$ _ticky/stage1/bin/ghc.exe nofib/spectral/simple/Main.hs -O -fforce-recomp +RTS -s -r
[1 of 1] Compiling Main             ( nofib\spectral\simple\Main.hs, nofib\spectral\simple\Main.o )
Linking nofib\spectral\simple\Main.exe ...
   4,384,197,400 bytes allocated in the heap
     925,177,096 bytes copied during GC
      75,695,032 bytes maximum residency (14 sample(s))
         515,144 bytes maximum slop
             205 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       431 colls,     0 par    2.531s   2.584s     0.0060s    0.0565s
  Gen  1        14 colls,     0 par    1.484s   1.535s     0.1096s    0.2819s

  TASKS: 5 (1 bound, 4 peak workers (4 total), using -N1)
  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.001s elapsed)
  MUT     time    3.203s  (  5.742s elapsed)
  GC      time    4.016s  (  4.119s elapsed)
  EXIT    time    0.000s  (  0.001s elapsed)
  Total   time    7.219s  (  9.862s elapsed)

  Alloc rate    1,368,725,041 bytes per MUT second

  Productivity  44.4% of total user, 58.2% of total elapsed
```
With no other context, I will just use the number of inserts as a baseline to compare the reports.
Things that stood out to me:
In the original ticket we have 1-2 calls to seqList per insert call. In the ticky profile I created it's one call to seqList per ~ 200 inserts. Other seq calls are also rarer.
There are about half as many calls to elem, which might be a result of the improved rules I mentioned. Whether the remaining calls are warranted or would be better served by map lookups is still unclear, however.
A lot of other changes, but none that stand out all that much.
Each application of the mapping function should result in an unknown call. So 7M unknown calls almost seems a bit low with >2M calls to map. But I suppose it's possible.
Maybe it's worthwhile to look into allowing map to inline, so that the mapping function becomes a known call. (E.g. use a local recursive go function)
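A minimal sketch of that shape (mapInline is a hypothetical stand-in, not the library map):

```haskell
-- The recursion lives in a local go, so the wrapper itself is
-- non-recursive and can inline at call sites, turning the unknown
-- call to the mapping function into a known (often inlinable) call.
mapInline :: (a -> b) -> [a] -> [b]
mapInline f = go
  where
    go []     = []
    go (x:xs) = f x : go xs
{-# INLINE mapInline #-}
```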
In the original ticket we have 1-2 calls to seqList per insert call. In the ticky profile I created it's one call to seqList per ~ 200 inserts. Other seq calls are also rarer.
That sounds like an improvement. But I've lost track of what you are comparing. GHC baseline vs (GHC baseline + splitAt patch)? Or what? How are we getting so many fewer calls to seqList?
Before inlining map (and thereby spreading its allocations around) I wonder if we could find out where all those calls to map are coming from, and how long the lists are. But I agree about your point about unknown calls.
That sounds like an improvement. But I've lost track of what you are comparing. GHC baseline vs (GHC baseline + splitAt patch)? Or what? How are we getting so many fewer calls to seqList?
Hard to say! I'm comparing my profile with the one from Richard, using the number of inserts into IntMap as a measure of work done by the compiler.
This is a pretty bad measure. But without knowing what commit his profile was based on, what file GHC was compiling, and which options were used to compile it, I feel it's the best I could do.
All it really shows is that in some unknown case we call seqList a lot. In another case (which I found historically to be pretty representative) we don't. So before we spend time to get rid of these, we should first establish that they are a problem in the common case.
Before inlining map (and thereby spreading its allocations around) I wonder if we could find out where all those calls to map are coming from, and how long the lists are. But I agree about your point about unknown calls.
For what it's worth, I tried rewriting map with local recursion, and while we got fewer unknown calls it wasn't clear whether it was an overall improvement. So likely no low-hanging fruit there. We should definitely try to find out where these calls (and reverse, ++) come from. Maybe there are a few hotspots which can be refactored to make them better.
I think we should use lists much less. They are always the wrong data structure, and only really useful when fused as a control structure. In the latter case, one could argue we should have newtype FB a = FB { foldr_ :: forall r. (a -> r -> r) -> r -> r } rather than all these rewrite rules, which effectively rewrite to this type and back. To make matters worse, the compiler often can't decide whether to share such a list (in which case fusion fails) or not.
If we only had FB, then we would just need a conversion function fuse :: Seq a -> FB a to enable explicit fusion. And probably a better, more cache-friendly version of Seq, based on RRB trees. Bonus: no need for Bag, OrdList, etc.; RRB trees should be vastly superior in all use cases. Also no messing about with seqList.
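A minimal sketch of the FB idea (with [] standing in for the hypothetical Seq):

```haskell
{-# LANGUAGE RankNTypes #-}

-- A Church-encoded list: consumers are handed the fold directly,
-- so no intermediate cons cells are ever built.
newtype FB a = FB { foldr_ :: forall r. (a -> r -> r) -> r -> r }

-- Explicit fusion entry point (the source would be Seq in the
-- proposal; [] stands in here).
fuse :: [a] -> FB a
fuse xs = FB (\c n -> foldr c n xs)

mapFB :: (a -> b) -> FB a -> FB b
mapFB f (FB g) = FB (\c n -> g (c . f) n)

filterFB :: (a -> Bool) -> FB a -> FB a
filterFB p (FB g) = FB (\c n -> g (\x r -> if p x then c x r else r) n)

unfuse :: FB a -> [a]
unfuse (FB g) = g (:) []
```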
I see that refactoring all uses of lists within GHC is daunting. But we could do this incrementally!
If there are hints of missed specialization, it may be worth it to compile with -fexpose-all-unfoldings -fspecialise-aggressively -fsimpl-tick-factor=200 -Wall-missed-specialisations and compare the results. These options are likely to improve performance of any code written polymorphically with constraints in place of monad transformer stacks, which is not the case here, but it doesn't hurt to check just in case.
For what it's worth, the "bottom-up" profiling mechanism implemented in !3871 (closed) is quite handy for tracking down things like mysterious (++) and reverse uses. Concretely, after building that branch with the relevant cost-centre instrumentation enabled, one gets a profile containing cost-centers for all call-sites of map and reverse. One can then easily extract a sorted list of cost-centers with bgamari/bottom-up-analysis>.
In the case of reverse (excerpted above) we see that the overwhelming majority of uses come from various occurrences of the "accumulator" pattern. For instance, in collectArgsTicks we have:
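```haskell
-- (slightly abbreviated) The classic accumulator pattern: ticks are
-- gathered in reverse and put back in order with a final reverse;
-- arguments come out in the right order as the App spine unwinds.
collectArgsTicks :: (Tickish Id -> Bool) -> Expr b
                 -> (Expr b, [Arg b], [Tickish Id])
collectArgsTicks skipTick expr = go expr [] []
  where
    go (App f a)  as ts = go f (a:as) ts
    go (Tick t e) as ts
      | skipTick t      = go e as (t:ts)
    go e          as ts = (e, as, reverse ts)
```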
It's quite unclear to me whether this can be improved. DList is of course an option, but I suspect it will end up very similar to the accumulator pattern in its performance characteristics. A structure like Data.Sequence is of course another option, but it comes with far larger constant factors, which I suspect won't work out well with the small lists we are typically working with.
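For reference, DList in miniature, which shows why it ends up so close to the accumulator pattern: building is O(1) per element and the single full traversal happens at conversion time.

```haskell
-- Appending is function composition; the list is materialised once,
-- when the composed function is finally applied to [].
newtype DList a = DList ([a] -> [a])

emptyD :: DList a
emptyD = DList id

snocD :: DList a -> a -> DList a
snocD (DList f) x = DList (f . (x:))

toListD :: DList a -> [a]
toListD (DList f) = f []
```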
It might be interesting to try the "keep the list on the stack" version. It would look like this:
```haskell
go (App f a) = let !(e', as, ts) = go f in (e', a:as, ts)
go (Tick t e) | skipTick t = let !(e', as, ts) = go e in (e', as, t:ts)
go e = (e, [], [])
```
There's a lot of building a triple and taking it apart again, but I think that CPR analysis would nail that. So it might be faster. Perhaps worth profiling a micro-benchmark.
This is a good point; in this case the model of collectArgsTicks in my benchmark is actually incorrect (or rather, the names in the source are slightly misleading).
One tricky consideration here is that we usually don't have any ticks in a typical program. Consequently, the reverse call will be very cheap indeed. This raises the question of whether this is the right place to be focusing optimisation effort.
I quickly hacked together a microbenchmark (bgamari/append-benchmark>) examining the performance of Sequence, DList, and reverse in the "list rebuild" basic usage found in the accumulator pattern above. The results are somewhat interesting:
- for long lists (above 20 elements or so), Seq is slower than both DList and reverse. This is the opposite of what my intuition predicted.
- for short lists (5 or fewer elements), Seq is slightly faster
I also studied a model of the collectArgsTicks case. I'm working on summarizing the results of this.
I would not expect Data.Sequence to make a particularly good list-building monoid; it does better when you actually use it as a sequence. A catenable list with fat nodes might work: something like Data.Sequence, but filled with larger chunks of elements. You don't need splitting and such, I assume, so having chunks of varying size is okay.
Edit: Also note that they admit efficient transient (as opposed to persistent) semantics that we could safely expose through an ST-based or even linear type-based approach. Ed Kmett wrote an unfinished implementation a while ago.