GHC issue #17658 (closed)
Created Jan 10, 2020 by Sebastian Graf (@sgraf812, Developer)

Baseline of performance tests should be the last metric in/decrease rather than HEAD^ to counter drift

The recent drama with performance test flukes (!2192 (comment 245745) and d46a72e1) started when we decreased the threshold of a couple of performance tests (!1983 (closed)) from mostly 10% to mostly 1%. While the goal is honorable, I find the change problematic for two reasons:

  1. We get the CI flukes mentioned above, which fuel distrust in CI performance tests. At best this leads to a lot of Metric Increase/Metric Decrease noise like d46a72e1; at worst it leads to accepting performance regressions that were actually caused by a bug in the code.
  2. It doesn't counter performance drift, it just lessens the effects. Put plainly: 40 consecutive commits with a 0.5% increase each stay under the 1% threshold, yet still compound to a silent >20% drift.
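The arithmetic behind point 2 can be checked with a few lines of Python. This is only an illustration of compounding; the function name `compounded_drift` is made up here and is not part of the GHC test driver.

```python
def compounded_drift(per_commit_increase: float, commits: int) -> float:
    """Total relative increase after `commits` commits that each
    increase the metric by `per_commit_increase` (0.005 means 0.5%)."""
    return (1.0 + per_commit_increase) ** commits - 1.0


# 40 commits at +0.5% each: every single step is below a 1% threshold,
# but the total compounds to roughly 22%.
drift = compounded_drift(0.005, 40)
print(f"{drift:.1%}")
```

Each step is compared only against its direct parent, so none of the 40 commits would fail a 1% check.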

While I was weeping on the shoulder of @matheus23 today, telling him about my experience, he came up with a great idea: to counter drift, performance tests should use as their baseline the commit with the last metric increase or decrease for that test, rather than HEAD^ (ultimately defaulting to the last GHC release). This is much more effective, even if we pick a threshold that isn't susceptible to the CI flukes from (1.), like 5%. Even with a 10% threshold, the 20% drift scenario can be ruled out, because the accumulated change is measured against a fixed baseline and fails the test as soon as it exceeds 10%.
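To make the difference between the two baseline schemes concrete, here is a minimal simulation sketch. It assumes a metric history given as a plain list of numbers; the function `first_failure` and the `sticky_baseline` flag are hypothetical names for illustration and do not reflect how the GHC test driver is actually implemented.

```python
from typing import List, Optional


def first_failure(metrics: List[float], threshold: float,
                  sticky_baseline: bool) -> Optional[int]:
    """Index of the first commit whose metric deviates from its baseline
    by more than `threshold`, or None if no commit fails.

    sticky_baseline=False: baseline is always the previous commit (HEAD^).
    sticky_baseline=True:  baseline stays at the last commit with an
                           acknowledged metric change (here: commit 0).
    """
    baseline = metrics[0]
    for i in range(1, len(metrics)):
        change = (metrics[i] - baseline) / baseline
        if abs(change) > threshold:
            return i
        if not sticky_baseline:
            baseline = metrics[i]  # HEAD^ scheme: baseline moves every commit
    return None


# Hypothetical history: the metric grows by 0.5% on each of 40 commits.
history = [100.0 * 1.005 ** n for n in range(41)]

print(first_failure(history, 0.01, sticky_baseline=False))  # None: drift goes unnoticed
print(first_failure(history, 0.05, sticky_baseline=True))   # 10
print(first_failure(history, 0.10, sticky_baseline=True))   # 20
```

In this made-up history, the HEAD^ scheme never fails even at 1%, while the fixed-baseline scheme fails at commit 10 with a 5% threshold and at commit 20 with a 10% threshold, long before the drift reaches 20%.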

CC'ing @mpickering, @AndreasK, @bgamari, @alp, @osa1
