GitLab disk space outage?
Yesterday on IRC, Ben mentioned that we ran out of disk space. To be more systematic in handling incidents, I'm opening this ticket with a few questions I hope Ben can quickly answer. Having these written down somewhere will help us deal with similar incidents faster in the future and help spread the burden of knowledge (knowledge → power → responsibility).
The Outage
- What ran out of disk space? (gitlab.haskell.org? Which partition?)
- What symptoms did this cause?
- What factors contributed to the outage?
Response
- How did Ben become aware of the problem?
- What mitigation was used?
Future
- What things could be done to reduce the occurrence of disk space outages? (Big things! Little things!)
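
As one possible "little thing", here is a minimal sketch of a disk usage check that could run periodically and warn before a partition fills up. It assumes Python is available on the host. The path `/`, the 90% threshold, and the function names `usage_fraction` and `check_partition` are placeholders made up for illustration, not a description of how the server is set up today.

```python
import shutil


def usage_fraction(total: int, used: int) -> float:
    """Return the fraction of a partition in use, between 0.0 and 1.0."""
    if total <= 0:
        # Some pseudo-filesystems report a size of zero; treat them as empty.
        return 0.0
    return used / total


def check_partition(path: str, threshold: float = 0.9) -> bool:
    """Return True if the partition holding `path` is at or above `threshold`."""
    usage = shutil.disk_usage(path)  # named tuple of total, used, free (bytes)
    return usage_fraction(usage.total, usage.used) >= threshold


if __name__ == "__main__":
    # "/" is a placeholder; the partition that actually filled up is one of
    # the open questions in this ticket.
    if check_partition("/"):
        print("WARNING: disk usage is at or above the threshold")
```

Something like this could be run from cron or a systemd timer and hooked up to whatever alerting channel we already use. A fixed threshold is only a starting point, since it says nothing about how quickly the disk is filling.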
Optional bonus question: We can't achieve a rate of 0 incidents per unit time, and getting ever closer to 0 takes ever more effort. What incident rate would be a good goal?