Rethink Choice of Baseline Commit for Performance Tests

Intro

Currently we always use the previous commit as the baseline when running performance tests. This works well in CI, where we fully test each commit in sequence (and hence always have test results for the previous commit). Remember, test results are stored in git notes and are not shared between repositories by default (i.e. your local repo will only have performance results for tests that were run locally on your machine). This is by design: we want to avoid comparing results from different machines.

Unfortunately, this is not so effective when testing locally. The programmer may have run only a subset of the performance tests on the previous commit, and often has not run the tests at all (this is notably true after a rebase, since the previous commit has changed). We need to rethink how we pick a baseline commit.

Proposed Solution

Search for a baseline per test/test_env/metric/way
Start at HEAD^ and use local metrics. If none exist, use CI metrics. If none exist either, continue the search at the parent.
Stop after a constant number of commits (failing to find a baseline).
Stop the search if the child commit has expected changes for this test/metric/way (failing to find a baseline).
It's possible that there are multiple runs of the test (e.g. if the test was run many times locally). In that case take the average.
If no baseline is found, show a warning and let the test pass.
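
The search described above can be sketched as follows. This is a minimal illustration, not the test driver's actual code: `find_baseline`, the callback parameters, and the default depth of 10 are all hypothetical. It also assumes one reading of "child commit": a candidate's own expected change stops the search from moving on to its parent, so expected changes declared on HEAD itself do not rule out HEAD^.

```python
from statistics import mean
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical key: one (test, test_env, metric, way) combination.
Key = Tuple[str, str, str, str]


def find_baseline(
    key: Key,
    head: str,
    parent_of: Callable[[str], Optional[str]],
    local_metrics: Callable[[str, Key], Sequence[float]],
    ci_metrics: Callable[[str, Key], Sequence[float]],
    has_expected_change: Callable[[str, Key], bool],
    max_depth: int = 10,  # assumed constant; the text only says "a constant number"
) -> Optional[Tuple[str, float]]:
    """Return (commit, averaged metric) for the nearest usable baseline, or None."""
    commit = parent_of(head)  # the search starts at HEAD^
    for _ in range(max_depth):
        if commit is None:
            return None  # ran out of history
        # Prefer local results; fall back to CI results for the same commit.
        for source in (local_metrics, ci_metrics):
            values = source(commit, key)
            if values:
                return commit, mean(values)  # average over multiple runs
        # An expected change here makes every older commit an invalid baseline.
        if has_expected_change(commit, key):
            return None
        commit = parent_of(commit)
    return None  # depth limit reached without finding a baseline
```

A caller that receives `None` would show the warning and let the test pass, as described in the last step above.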

Handling Expected changes

If there are expected changes between HEAD and a potential baseline commit, then that baseline cannot be used. We make no attempt to approximate a baseline. A warning will be issued telling the user to run the tests on the previous commit or to try fetching CI results.

Issues

The programmer is responsible for the final commit message having the correct expected changes. This is particularly important when merging via GitLab with a squash (this can change the commit message).
We do not distinguish between full and partial performance results being available for the baseline commit: that would require checking out the baseline commit and extracting the full list of tests (this seems fragile and far too expensive).

When to automatically fetch CI results?

Do not fetch CI results. Allow the user to do this, but give the exact git command so they can just copy and paste.

Alternative Solution

Ultimately this alternative was deemed too complicated. It assumes that commits will be squashed and merged into master (not always true).

When running performance tests, results will be compared to a baseline commit that is the merge base with master (the most recent commit shared with master). If HEAD is already in master, then the previous commit is used instead.
If any locally generated performance results exist, they are used exclusively for the baseline.
Else if any CI generated performance results exist (and have been fetched), they are used exclusively for the baseline.
Else performance tests trivially pass, and a warning is given to the user.
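
The three-way fallback above could look roughly like this. It is only a sketch: `select_baseline_results` and its arguments are hypothetical names, and the results are assumed to have already been read for the baseline commit.

```python
import warnings
from typing import List, Optional, Sequence, Tuple


def select_baseline_results(
    local: Sequence[float], ci: Sequence[float]
) -> Optional[Tuple[str, List[float]]]:
    """Pick the result set used for the baseline; sources are never mixed."""
    if local:
        return "local", list(local)  # local results are used exclusively
    if ci:
        return "ci", list(ci)  # fetched CI results are the fallback
    # No results at all: the tests trivially pass, but the user is warned.
    warnings.warn("no baseline performance results; tests trivially pass")
    return None
```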

To find the baseline commit:

mergeBase = git merge-base master HEAD
baselineCommit = if mergeBase == HEAD
             then HEAD^
             else mergeBase
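
As a rough, runnable counterpart to the pseudocode above, the following sketch shells out to git. The helper names `run_git` and `baseline_commit` are hypothetical; `git merge-base` and `git rev-parse` are standard git commands. The `git` parameter exists only so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable


def run_git(*args: str) -> str:
    """Run git in the current repository and return its trimmed stdout."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def baseline_commit(git: Callable[..., str] = run_git, branch: str = "master") -> str:
    """Return the hash of the baseline commit for the current HEAD."""
    head = git("rev-parse", "HEAD")
    merge_base = git("merge-base", branch, "HEAD")
    if merge_base == head:
        # HEAD is already in master: compare against the previous commit.
        return git("rev-parse", "HEAD^")
    # Otherwise compare against the commit we branched from.
    return merge_base
```

Comparing resolved hashes rather than ref names avoids treating `HEAD` and an equal commit hash as different.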

Reasoning

We want each commit in master not to introduce a significant change in performance: hence we compare commits in master to the previous commit.
If not on master (1 or more commits ahead of and 0 or more commits behind master), we assume that the intention is to create a patch where all new commits will ultimately be squashed and placed on top of master as a single commit. On the other hand, we don't want to consider changes made in master after we branched. Instead of using master HEAD as the baseline, we use the commit at which we branched from master (i.e. the merge base). In other words, we are concerned only with the change in performance introduced by the newly created commits.

Edited Mar 10, 2019 by davide
