While debugging alleged regressions in !1866 (closed), !4163 (closed) and !4902 (closed), I ultimately came to the conclusion that a +13% increase in calls to IntMap.$winsert correlates with a change in -dunique-increment numbers. That makes the ±1% acceptance window inappropriate for performance testing: with the very same compiler binary, allocations varied by ±3% depending on the increment.
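For context on why the increment matters at all: GHC hands out Uniques from a counter that is bumped by the -dunique-increment value after each allocation, and those Uniques (as Ints) end up as the keys of the IntMap-backed UniqFM environments. Here is a toy model of that supply (a sketch of the idea, not GHC's actual UniqSupply code), just to show that the flag changes nothing but the bit patterns of the keys:

```haskell
-- Toy model of a unique supply (NOT GHC's actual UniqSupply): the only
-- effect of the increment is to change which Int keys the compiler later
-- feeds into its IntMap-backed environments.
import Data.IORef

newtype UniqSupply = UniqSupply (IORef Int)

mkSupply :: IO UniqSupply
mkSupply = UniqSupply <$> newIORef 0

-- Models GHC bumping its unique counter by the configured increment.
takeUnique :: Int -> UniqSupply -> IO Int
takeUnique increment (UniqSupply ref) = do
  u <- readIORef ref
  writeIORef ref (u + increment)
  pure u

main :: IO ()
main = do
  s <- mkSupply
  us <- mapM (const (takeUnique 777337 s)) [1 :: Int .. 5]
  print us  -- the same program now uses very different IntMap keys
```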
It appears that the previous rebase of !4902 (closed) in https://gitlab.haskell.org/ghc/ghc/-/jobs/589469 actually proves me right: there we see a 1% decrease in T12545, presumably because some previous job accepted an increase. Here's the history of that test: hsyl20.fr:4222/chart/x86_64-linux-deb9-hadrian. It keeps flip-flopping.
In !5814 (comment 361852), T12545 increased by 3.6% across a rebase. I wrote a script, T12545.measure.sh, that is quite helpful in diagnosing whether T12545 actually regresses, so I don't think we need to increase the acceptance threshold to 5% (I measured a spread of 4.8%) just yet; after all, that might hide some real regressions.
But maybe it would be helpful to actually make T12545.measure.sh the real T12545, i.e., run T12545 multiple times with different -dunique-increment values.
Do we understand why this one test is so sensitive to -dunique-increment changes? It's very discouraging to patch authors if their patch frequently triggers regressions in T12545 that they can't understand. We could just remove it from the test suite, I suppose.
I did a bit of ticky profiling while I had a ticky validate build on my hands today. Here's the result of comparing min.ticky against max.ticky for HEAD. That is, I fixed the compiler binary and varied only -dunique-increment, storing the ticky profiles; min.ticky is from the run with the lowest number of bytes allocated, max.ticky from the run with the highest. Here are the results of compare-ticks min.ticky max.ticky:
I have a hunch that, depending on the particular Unique distribution, we choose the lazy or the strict version of $winsert/$winsertWithKey more often. But the difference is much larger than a simple shift from strict to lazy would suggest (the first column is the number of entries):
We probably need Ben's bottom-up profiling technique from here on... I wasn't able to pin it on one particular call site.
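For reference, the lazy and strict flavours referred to above are the two public inserts of containers ($winsert is just the name of the worker that GHC's worker/wrapper transformation generates for insert). A minimal illustration of the behavioural difference, using only the public containers API:

```haskell
-- Only Data.IntMap.Strict.insert forces the value; the lazy insert stores
-- a thunk, which is extra allocation that is paid for (or not) depending
-- on whether the value is ever demanded.
import qualified Data.IntMap.Lazy as L
import qualified Data.IntMap.Strict as S

main :: IO ()
main = do
  let lazyMap   = L.insert 1 (sum [1 .. 1000 :: Int]) L.empty  -- value left as a thunk
      strictMap = S.insert 1 (sum [1 .. 1000 :: Int]) S.empty  -- value forced on insert
  print (L.lookup 1 lazyMap)   -- the thunk is only forced here
  print (S.lookup 1 strictMap)
```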
FTR, this is the adjusted version of T12545.measure.sh I used to produce the min and max ticky reports. You can probably use it to generate min and max profiling reports instead:
```sh
#!/usr/bin/env bash
# bash rather than sh: the script relies on $RANDOM.
# https://stackoverflow.com/a/4774063/388010
TOP="$(cd -- "$(dirname "$0")/../" >/dev/null 2>&1; pwd -P)"
GHC=${GHC:-$TOP/_validate/stage1/bin/ghc}
echo "Using GHC=$GHC. Feel free to override via env var"

measure() {
  rm -f T12545*.hi
  $GHC -fforce-recomp -v0 -dunique-increment=$1 T12545.hs +RTS -t -rT12545.ticky 2>&1 \
    | tail -n1 | cut -f1 -d',' | grep -o -P '\d+'
}

min=999999999999
max=-999999999999
while true; do
  inc=$((1 + $RANDOM % 1000000))
  n=$(measure $inc)
  any_change=false
  if [ $n -lt $min ]; then
    min=$n
    any_change=true
    mv T12545.ticky min.ticky
    echo "New min: $min (on $inc)"
  fi
  if [ $n -gt $max ]; then
    max=$n
    any_change=true
    mv T12545.ticky max.ticky
    echo "New max: $max (on $inc)"
  fi
  if [ "$any_change" = true ]; then
    echo "New ratio: $(($max * 1000 / $min - 1000)) per mille"
  fi
done
```
Yes, but we "choose" a code path based on the generated Uniques. Maybe some union operation somewhere has to merge unbalanced trees? Or we lazily insert into an unbalanced tree somewhere? Something like that.
I tried to replace the lazy insert in addToUFM with a strict one, but then we run into infinite loops, presumably due to some knot-tying in codegen. Sigh.
It seems we need a profiling build to track down what calls lazy insert so much.
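To illustrate the failure mode (a constructed example, not the actual addToUFM call site): with knot-tying, the inserted value may refer back to the very map being built. The lazy insert tolerates that, because the map's spine doesn't depend on the value; a strict insert forces the value while the map is still being defined and diverges.

```haskell
import qualified Data.IntMap.Lazy as L

-- Constructed knot-tying example (not the actual codegen site): the value
-- stored at key 1 looks up the final map itself.
knot :: L.IntMap Int
knot = m
  where
    m = L.insert 1 (L.findWithDefault 0 2 m) (L.fromList [(2, 42)])

main :: IO ()
main = print (L.lookup 1 knot)  -- Just 42
-- Swapping in Data.IntMap.Strict.insert would force the value while m is
-- still being defined, so evaluating the map would diverge with <<loop>>.
```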
@sgraf812 I am not sure I follow either: why would we pick a different code path depending on which uniques were generated? The type of the map doesn't depend on the value of the uniques at all.
Are you just wondering why Data.IntMap.Internal.$winsert is called more often in one profile than another?
> Are you just wondering why Data.IntMap.Internal.$winsert is called more often in one profile than another?
Yes. And my working hypothesis is that this is because the IntMaps we insert into are unbalanced in one -dunique-increment configuration (where we measure the maximum) but balanced in another (where we measure the minimum). We need more recursive insert calls when we insert into an unbalanced tree in the worst case, and that allocates more.
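To make that hypothesis concrete, here is a small step-counting model of the big-endian Patricia trie underlying Data.IntMap (a sketch after Okasaki and Gill; not the library's actual code, and it ignores details like GHC packing tag bits into uniques). Sequential keys share long prefixes and give a shallow, dense tree, while keys spread by a large increment differ in high bits, so inserts traverse more Branch nodes; more traversed nodes means more recursive $winsert calls and more allocation. With the parameters below, increment 1 keeps 100000 keys within a 17-bit range, while increment 777337 spreads them over roughly 37 bits.

```haskell
{-# LANGUAGE BangPatterns #-}
-- Step-counting model of Data.IntMap's big-endian Patricia trie (after
-- Okasaki & Gill); NOT the library's actual code. It counts how many
-- Branch nodes an insert traverses, as a proxy for recursive $winsert
-- calls and their allocation.
import Data.Bits
import Data.List (foldl')

data Trie = Nil | Leaf !Int | Branch !Int !Int Trie Trie  -- prefix, mask, left, right

zero :: Int -> Int -> Bool
zero k m = k .&. m == 0

-- Keep only the bits strictly above the mask bit.
mask :: Int -> Int -> Int
mask k m = k .&. (negate m `xor` m)

-- Highest bit in which the two prefixes differ.
branchMask :: Int -> Int -> Int
branchMask p1 p2 = let d = p1 `xor` p2
                   in shiftL 1 (finiteBitSize d - 1 - countLeadingZeros d)

link :: Int -> Trie -> Int -> Trie -> Trie
link p1 t1 p2 t2
  | zero p1 m = Branch p m t1 t2
  | otherwise = Branch p m t2 t1
  where m = branchMask p1 p2
        p = mask p1 m

-- Insert k, returning the new trie and the number of branches traversed.
insertSteps :: Int -> Trie -> (Trie, Int)
insertSteps k = go 0
  where
    go !n Nil = (Leaf k, n)
    go !n t@(Leaf j)
      | j == k    = (t, n)
      | otherwise = (link k (Leaf k) j t, n)
    go !n t@(Branch p m l r)
      | mask k m /= p = (link k (Leaf k) p t, n)
      | zero k m      = let (l', n') = go (n + 1) l in (Branch p m l' r, n')
      | otherwise     = let (r', n') = go (n + 1) r in (Branch p m l r', n')

totalSteps :: [Int] -> Int
totalSteps = snd . foldl' (\(t, !n) k -> let (t', d) = insertSteps k t
                                         in (t', n + d)) (Nil, 0)

main :: IO ()
main = do
  let n = 100000
      keys inc = take n (iterate (+ inc) 0)
  -- increment 1 vs. the kind of large increments the measure script tries
  mapM_ (\inc -> putStrLn (show inc ++ ": " ++ show (totalSteps (keys inc))))
        [1, 64, 777337]
```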
The default is an increment of 1. Your script tries many different “random large” increments. I expect that “sequential ids” behaves quite differently from “large increment”, while two different randomly large increments probably behave similarly.
IntMaps should be able to handle sequential uniques quite well; in a way, that’s the optimal case. Do you only see worse values with non-sequential uniques, or also improvements?
Maybe with “bad” increments we are getting collisions, and uniqAway has to try more? But that would explain more calls to lookup, not to insert.
> Do you only see worse values with non-sequential uniques, or also improvements?
I see regressions and improvements. See testsuite/tests/perf/compiler/T12545.measure.sh, a script that varies -dunique-increment randomly and echoes the current min and max candidates. You can also use it with GHC=$(which ghc) ./T12545.measure.sh, I guess.
Last time I tried, I measured a spread of 4.8%. Neither the min nor the max was achieved with an increment of 1. And if I make just a tiny, unrelated change and recompile GHC, I can get completely different performance for increment 1, while min and max stay about the same; hence the need for this script to confirm that a ghc/alloc regression in T12545 is not actually a regression.