Baseline of performance tests should be the last metric increase/decrease rather than HEAD^ to counter drift

changed weight to 8

Ideally we would want both: a smallish threshold on individual commits, and a slightly larger one relative to the last metric change.

Having a way to prevent drift would allow us to be less restrictive on consecutive commits which, hopefully, results in less noise.
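To make the two-threshold idea concrete, here is a minimal sketch. The function names and the tolerance values (1% per commit, 3% against the last metric change) are made up for illustration and are not taken from the GHC test driver.

```python
from typing import List


def relative_change(baseline: float, value: float) -> float:
    """Signed relative change of value with respect to baseline."""
    return (value - baseline) / baseline


def check_metric(parent: float, anchor: float, current: float,
                 per_commit_tol: float = 0.01,   # hypothetical: 1% against HEAD^
                 drift_tol: float = 0.03) -> List[str]:  # hypothetical: 3% against the anchor
    """Return the names of the violated checks (empty list means the commit passes).

    parent  -- metric value at HEAD^
    anchor  -- metric value at the last accepted metric increase/decrease
    current -- metric value at the commit under test
    """
    failures = []
    if abs(relative_change(parent, current)) > per_commit_tol:
        failures.append("per-commit")
    if abs(relative_change(anchor, current)) > drift_tol:
        failures.append("drift")
    return failures
```

With these assumed tolerances, a commit that moves a metric from 103.0 to 103.5 passes the per-commit check but fails the drift check if the anchor value is 100.0.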

performance tests should use the commit with the last metric increase for a test as a baseline (ultimately defaulting to the last GHC release)

Oh, I think there is a big difference between these two:

Defaulting to the last release would mean we would still be susceptible to performance drift, just at a coarser granularity (between releases rather than between commits).

When comparing to the last metric increase/decrease, we eliminate any possibility of undocumented performance drift.
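A rough sketch of how such a baseline could be chosen, assuming each commit carries a flag saying whether it documented a metric increase/decrease. The `Commit` type, the `accepted_change` field, and `pick_baseline` are hypothetical names; the flag stands in for however an accepted change is actually recorded.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Commit:
    sha: str
    metric: float          # measured value of the metric at this commit
    accepted_change: bool  # hypothetical flag: commit documented a metric increase/decrease


def pick_baseline(history: Sequence[Commit], release_metric: float) -> float:
    """Pick the baseline value for one test metric.

    history is ordered oldest to newest and excludes the commit under test.
    Walk backwards to the most recent documented metric change; if there is
    none, fall back to the value measured at the last release.
    """
    for commit in reversed(history):
        if commit.accepted_change:
            return commit.metric
    return release_metric
```

The fallback to the release value only applies when no commit since the release has documented a change, so every difference from the baseline is either below the threshold or explicitly accepted.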

Looking at !1983 (closed), it seems this system could be a result of limitations of the current infrastructure:

The "new" performance testing infrastructure resets the baseline after every test so it's easy to miss gradual performance regressions over time.
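A small worked example of the effect described in that quote, with made-up numbers: ten consecutive commits each increase a metric by 1%. With an assumed 2% tolerance, every commit passes when compared to its parent, but the same series fails when compared to a fixed starting point.

```python
def passes_against_parent(values, tol):
    """Each value is compared to the one before it (baseline resets every commit)."""
    return all(abs(b - a) / a <= tol for a, b in zip(values, values[1:]))


def passes_against_anchor(values, tol):
    """Each value is compared to the first one (baseline stays fixed)."""
    anchor = values[0]
    return all(abs(v - anchor) / anchor <= tol for v in values[1:])


# Hypothetical series: a starting value of 100 followed by ten commits, +1% each.
values = [100.0 * 1.01 ** n for n in range(11)]

print(passes_against_parent(values, 0.02))  # every single step is within 2%
print(passes_against_anchor(values, 0.02))  # the accumulated change is not
print(values[-1] / values[0])               # total growth factor over the series
```

The total growth is 1.01 to the tenth power, a bit over 10%, even though no single commit exceeded the tolerance.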

mentioned in merge request !1869 (closed)

added perf tests label

mentioned in issue #18842

mentioned in issue #18839

marked this issue as related to #18842

mentioned in merge request !4283 (closed)

marked this issue as related to #18839

mentioned in commit a01f59af

mentioned in commit 795908dc
