GitLab disk space outage?

Yesterday on IRC, Ben mentioned that we ran out of disk space. To be more systematic in handling incidents, I'm opening this ticket with a few questions I hope Ben can quickly answer. Having these written down somewhere will help us deal with similar incidents faster in the future and help spread the burden of knowledge (knowledge → power → responsibility).

The Outage

  1. What ran out of disk space? (gitlab.haskell.org? Which partition?)
  2. What symptoms did this cause?
  3. What factors contributed to the outage?

Response

  1. How did Ben become aware of the problem?
  2. What mitigation was used?

Future

  1. What things could be done to reduce the occurrence of disk space outages? (Big things! Little things!)

Optional bonus question: We can't achieve a rate of 0 incidents per unit time, and getting ever closer to 0 takes ever more effort. What incident rate would be a good goal?
