Making performance test failures less severe
Currently we always treat performance test failures in CI as fatal. However, this has been difficult to manage since the environment tends to affect performance test results. For instance, the new Darwin builders seem to produce significantly higher allocation figures (e.g. +5%) than the old builders. This may be due to path dependence (since various tools are located in the Nix store in the new environment) or perhaps the OS version (since the new boxen run Big Sur, see #19025).
One way to avoid this is to focus our performance testing on x86-64/Linux and make perf test failures in all other environments non-fatal (e.g. see https://gitlab.haskell.org/ghc/ghc/-/jobs/589093). This would ease the contributor workflow (since you wouldn't need to accumulate the list of test deltas to accept from multiple jobs) and make it easier to run perf tests in a controlled environment. On the other hand, in doing so we run the risk of missing platform-dependent performance shifts.
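A rough sketch of the proposed policy follows. It is not GHC's actual testsuite driver code. It assumes that each perf metric is compared against a recorded baseline using a percentage acceptance window, and that failures are fatal only on x86-64/Linux. `PerfResult`, `FATAL_PLATFORMS`, `is_fatal` and `job_exit_code` are hypothetical names. The 2% window and the +5% shift only mirror the Darwin example above.

```python
from dataclasses import dataclass

# Hypothetical: the only (arch, os) pair whose perf-test failures fail the job.
FATAL_PLATFORMS = {("x86_64", "linux")}


@dataclass
class PerfResult:
    test: str
    metric: str
    baseline: float
    actual: float
    tolerance_pct: float  # hypothetical acceptance window, e.g. 2.0 means +/-2%

    def deviation_pct(self) -> float:
        # Relative change of the measured value against the baseline, in percent.
        return (self.actual - self.baseline) / self.baseline * 100.0

    def passed(self) -> bool:
        return abs(self.deviation_pct()) <= self.tolerance_pct


def is_fatal(result: PerfResult, arch: str, os_name: str) -> bool:
    """A perf failure only counts as fatal on the designated platform(s)."""
    return not result.passed() and (arch, os_name) in FATAL_PLATFORMS


def job_exit_code(results, arch: str, os_name: str) -> int:
    """Report every failure, but return non-zero only for fatal ones."""
    failures = [r for r in results if not r.passed()]
    for r in failures:
        level = "FATAL" if is_fatal(r, arch, os_name) else "warning"
        print(f"{level}: {r.test} {r.metric} {r.deviation_pct():+.1f}% "
              f"(window +/-{r.tolerance_pct}%)")
    return 1 if any(is_fatal(r, arch, os_name) for r in failures) else 0


if __name__ == "__main__":
    r = PerfResult("hypothetical_test", "bytes allocated", 1_000_000, 1_050_000, 2.0)
    print(job_exit_code([r], "x86_64", "darwin"))  # warning only, exit code 0
    print(job_exit_code([r], "x86_64", "linux"))   # fatal, exit code 1
```

In this sketch, a +5% allocation shift on a Darwin job is still printed, but only as a warning, and the job passes. The same shift on x86-64/Linux fails the job. Keeping the non-fatal warnings visible is one way to partly offset the risk of missing platform-dependent shifts.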