While debugging alleged regressions in !1866 (closed), !4163 (closed) and !4902 (closed), I ultimately came to the conclusion that a +13% increase in calls to IntMap.$winsert correlates with a change in -dunique-increment numbers. That makes the ±1% acceptance window inappropriate for performance testing: with the very same compiler binary, allocations varied by ±3% depending on the increment.
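For context on why the increment matters at all: GHC hands out Uniques from a counter that is bumped by the -dunique-increment value after each allocation, and those Uniques (as Ints) end up as the keys of the IntMap-backed UniqFM environments. Here is a toy model of that supply (a sketch of the idea, not GHC's actual UniqSupply code), just to show that the flag changes nothing but the bit patterns of the keys:

```haskell
-- Toy model of a unique supply (NOT GHC's actual UniqSupply): the only
-- effect of the increment is to change which Int keys the compiler later
-- feeds into its IntMap-backed environments.
import Data.IORef

newtype UniqSupply = UniqSupply (IORef Int)

mkSupply :: IO UniqSupply
mkSupply = UniqSupply <$> newIORef 0

-- Models GHC bumping its unique counter by the configured increment.
takeUnique :: Int -> UniqSupply -> IO Int
takeUnique increment (UniqSupply ref) = do
  u <- readIORef ref
  writeIORef ref (u + increment)
  pure u

main :: IO ()
main = do
  s <- mkSupply
  us <- mapM (const (takeUnique 777337 s)) [1 :: Int .. 5]
  print us  -- the same program now uses very different IntMap keys
```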
It appears that the previous rebase of !4902 (closed) in https://gitlab.haskell.org/ghc/ghc/-/jobs/589469 actually proves me right: there we see a 1% decrease in T12545, presumably because some previous job accepted an increase. Here's the history of that test: hsyl20.fr:4222/chart/x86_64-linux-deb9-hadrian. It keeps flip-flopping.
In !5814 (comment 361852), T12545 increased by 3.6% across a rebase. I wrote a script, T12545.measure.sh, that is quite helpful in diagnosing whether T12545 actually regresses, so I don't think we need to increase the acceptance threshold to 5% (I measured a spread of 4.8%) just yet; after all, that might hide some real regressions.
But maybe it would be helpful to actually make T12545.measure.sh the real T12545, i.e., run T12545 multiple times with different -dunique-increment values.
Do we understand why this one test is so sensitive to -dunique-increment changes? It's very discouraging to patch authors if their patch frequently triggers regressions in T12545 that they can't understand. We could just remove it from the test suite, I suppose.
I did a bit of ticky profiling while I had a ticky validate build on my hands today. Here's the result of comparing min.ticky against max.ticky for HEAD. That is, I fixed the compiler binary and varied only -dunique-increment, storing the ticky profiles; min.ticky is from the run with the lowest number of bytes allocated, max.ticky from the run with the highest. Here are the results of compare-ticks min.ticky max.ticky:
I have a hunch that, depending on the particular Unique distribution, we choose the lazy or the strict version of $winsert/$winsertWithKey more often. But the difference is much larger than a simple shift from strict to lazy would suggest (the first column is the number of entries):
We probably need Ben's bottom-up profiling technique from here on... I wasn't able to pin it on one particular call site.
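For reference, the lazy and strict flavours referred to above are the two public inserts of containers ($winsert is just the name of the worker that GHC's worker/wrapper transformation generates for insert). A minimal illustration of the behavioural difference, using only the public containers API:

```haskell
-- Only Data.IntMap.Strict.insert forces the value; the lazy insert stores
-- a thunk, which is extra allocation that is paid for (or not) depending
-- on whether the value is ever demanded.
import qualified Data.IntMap.Lazy as L
import qualified Data.IntMap.Strict as S

main :: IO ()
main = do
  let lazyMap   = L.insert 1 (sum [1 .. 1000 :: Int]) L.empty  -- value left as a thunk
      strictMap = S.insert 1 (sum [1 .. 1000 :: Int]) S.empty  -- value forced on insert
  print (L.lookup 1 lazyMap)   -- the thunk is only forced here
  print (S.lookup 1 strictMap)
```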
FTR, this is the adjusted version of T12545.measure.sh I used to produce the min and max ticky reports. You can probably use it to generate min and max profiling reports instead:
```sh
#!/usr/bin/env bash
# bash rather than sh: the script relies on $RANDOM.
# https://stackoverflow.com/a/4774063/388010
TOP="$(cd -- "$(dirname "$0")/../" >/dev/null 2>&1; pwd -P)"
GHC=${GHC:-$TOP/_validate/stage1/bin/ghc}
echo "Using GHC=$GHC. Feel free to override via env var"

measure() {
  rm -f T12545*.hi
  $GHC -fforce-recomp -v0 -dunique-increment=$1 T12545.hs +RTS -t -rT12545.ticky 2>&1 \
    | tail -n1 | cut -f1 -d',' | grep -o -P '\d+'
}

min=999999999999
max=-999999999999
while true; do
  inc=$((1 + $RANDOM % 1000000))
  n=$(measure $inc)
  any_change=false
  if [ $n -lt $min ]; then
    min=$n
    any_change=true
    mv T12545.ticky min.ticky
    echo "New min: $min (on $inc)"
  fi
  if [ $n -gt $max ]; then
    max=$n
    any_change=true
    mv T12545.ticky max.ticky
    echo "New max: $max (on $inc)"
  fi
  if [ "$any_change" = true ]; then
    echo "New ratio: $(($max * 1000 / $min - 1000)) per mille"
  fi
done
```
Yes, but we "choose" a code path based on the generated Uniques. Maybe some union operation somewhere has to merge unbalanced trees? Or we lazily insert into an unbalanced tree somewhere? Something like that.
I tried to replace the lazy insert in addToUFM with a strict one, but then we run into infinite loops, presumably due to some knot-tying in codegen. Sigh.
It seems we need a profiling build to track down what calls lazy insert so much.
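To illustrate the failure mode (a constructed example, not the actual addToUFM call site): with knot-tying, the inserted value may refer back to the very map being built. The lazy insert tolerates that, because the map's spine doesn't depend on the value; a strict insert forces the value while the map is still being defined and diverges.

```haskell
import qualified Data.IntMap.Lazy as L

-- Constructed knot-tying example (not the actual codegen site): the value
-- stored at key 1 looks up the final map itself.
knot :: L.IntMap Int
knot = m
  where
    m = L.insert 1 (L.findWithDefault 0 2 m) (L.fromList [(2, 42)])

main :: IO ()
main = print (L.lookup 1 knot)  -- Just 42
-- Swapping in Data.IntMap.Strict.insert would force the value while m is
-- still being defined, so evaluating the map would diverge with <<loop>>.
```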
@sgraf812 I am not sure I follow either: why would we pick a different code path depending on which uniques were generated? The type of the map doesn't depend on the value of the uniques at all.
Are you just wondering why Data.IntMap.Internal.$winsert is called more often in one profile than another?
> Are you just wondering why Data.IntMap.Internal.$winsert is called more often in one profile than another?
Yes. And my working hypothesis is that this is because the IntMaps we insert into are unbalanced in one -dunique-increment configuration (where we measure the maximum) but balanced in another (where we measure the minimum). We need more recursive insert calls when we insert into an unbalanced tree in the worst case, and that allocates more.
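To make that hypothesis concrete, here is a small step-counting model of the big-endian Patricia trie underlying Data.IntMap (a sketch after Okasaki and Gill; not the library's actual code, and it ignores details like GHC packing tag bits into uniques). Sequential keys share long prefixes and give a shallow, dense tree, while keys spread by a large increment differ in high bits, so inserts traverse more Branch nodes; more traversed nodes means more recursive $winsert calls and more allocation. With the parameters below, increment 1 keeps 100000 keys within a 17-bit range, while increment 777337 spreads them over roughly 37 bits.

```haskell
{-# LANGUAGE BangPatterns #-}
-- Step-counting model of Data.IntMap's big-endian Patricia trie (after
-- Okasaki & Gill); NOT the library's actual code. It counts how many
-- Branch nodes an insert traverses, as a proxy for recursive $winsert
-- calls and their allocation.
import Data.Bits
import Data.List (foldl')

data Trie = Nil | Leaf !Int | Branch !Int !Int Trie Trie  -- prefix, mask, left, right

zero :: Int -> Int -> Bool
zero k m = k .&. m == 0

-- Keep only the bits strictly above the mask bit.
mask :: Int -> Int -> Int
mask k m = k .&. (negate m `xor` m)

-- Highest bit in which the two prefixes differ.
branchMask :: Int -> Int -> Int
branchMask p1 p2 = let d = p1 `xor` p2
                   in shiftL 1 (finiteBitSize d - 1 - countLeadingZeros d)

link :: Int -> Trie -> Int -> Trie -> Trie
link p1 t1 p2 t2
  | zero p1 m = Branch p m t1 t2
  | otherwise = Branch p m t2 t1
  where m = branchMask p1 p2
        p = mask p1 m

-- Insert k, returning the new trie and the number of branches traversed.
insertSteps :: Int -> Trie -> (Trie, Int)
insertSteps k = go 0
  where
    go !n Nil = (Leaf k, n)
    go !n t@(Leaf j)
      | j == k    = (t, n)
      | otherwise = (link k (Leaf k) j t, n)
    go !n t@(Branch p m l r)
      | mask k m /= p = (link k (Leaf k) p t, n)
      | zero k m      = let (l', n') = go (n + 1) l in (Branch p m l' r, n')
      | otherwise     = let (r', n') = go (n + 1) r in (Branch p m l r', n')

totalSteps :: [Int] -> Int
totalSteps = snd . foldl' (\(t, !n) k -> let (t', d) = insertSteps k t
                                         in (t', n + d)) (Nil, 0)

main :: IO ()
main = do
  let n = 100000
      keys inc = take n (iterate (+ inc) 0)
  -- increment 1 vs. the kind of large increments the measure script tries
  mapM_ (\inc -> putStrLn (show inc ++ ": " ++ show (totalSteps (keys inc))))
        [1, 64, 777337]
```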
The default is an increment of 1. Your script tries many different “random large” increments. I expect that “sequential ids” behaves quite differently from “large increment”, while two different randomly large increments probably behave similarly.
IntMaps should be able to handle sequential uniques quite well; in a way, that’s the optimal case. Do you only see worse values with non-sequential uniques, or also improvements?
Maybe with “bad” increments we are getting collisions, and uniqAway has to try more? But that would explain more calls to lookup, not to insert.
> Do you only see worse values with non-sequential uniques, or also improvements?
I see regressions and improvements. See testsuite/tests/perf/compiler/T12545.measure.sh, a script that varies -dunique-increment randomly and echoes the current min and max candidates. You can also use it with GHC=$(which ghc) ./T12545.measure.sh, I guess.
Last time I tried, I measured a spread of 4.8%. Neither the min nor the max was achieved with an increment of 1. And if I make just a tiny, unrelated change and recompile GHC, I can get completely different performance for increment 1, while min and max stay about the same; hence the need for this script to confirm that a ghc/alloc regression in T12545 is not actually a regression.