Performance metrics cause too many merge failures
Over the last week we have had numerous @marge-bot jobs fail on performance metric checks. What's more, the changes in question are improvements! Surely our CI system shouldn't punish us for improving performance.
This happens when improvements from multiple MRs, each of which individually falls inside the acceptance threshold, accumulate in a batch and result in an overall change that exceeds the threshold.
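A minimal sketch of the compounding effect (the 5% threshold and metric values here are illustrative assumptions, not the actual testsuite configuration):

```python
# Hypothetical numbers: not taken from the real testsuite configuration.
THRESHOLD = 0.05  # accept changes within +/-5% of the recorded baseline

def within_threshold(baseline, new):
    """Does `new` fall inside the acceptance window around `baseline`?"""
    return abs(new - baseline) / baseline <= THRESHOLD

baseline = 1000.0  # e.g. allocations, arbitrary units

# Three MRs, each a 4% improvement: individually each one is acceptable...
per_mr_factors = [0.96, 0.96, 0.96]
assert all(within_threshold(1.0, f) for f in per_mr_factors)

# ...but batched together they compound to ~11.5% below the baseline,
# which the check rejects even though every change is an improvement.
batched = baseline
for f in per_mr_factors:
    batched *= f

print(batched)                               # 884.736
print(within_threshold(baseline, batched))   # False
```

So a batch of individually-passing improvements fails the batched CI run, which is exactly the behaviour we want to stop.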
After a discussion on ghc-devs we have come to the following conclusion:
- @marge-bot jobs should accept performance improvements.
- We should possibly widen the acceptance windows of known-problematic tests further.