Episode 13
July 19, 2017
incident management
Sergio
Sergio returns to discuss a presentation about incident management that he gave at the Tokyo Rubyist Meetup. (Apologies for the poor audio quality.)
Show Notes
- Slides from Sergio’s presentation ‘Do Not Panic!’
- Tokyo Rubyist Meetup
- Chapter 14, “Managing Incidents”, of Google’s excellent Site Reliability Engineering book.
- The Joel Test: 12 Steps to Better Code. Step 1 is “Do you use source control?”
- Scaling your API with rate limiters on the Stripe Blog
- “The number one priority if you lose communication with your team is to reestablish communication with your team.” - Joel Spolsky at around 9:00 on the excellent, excellent Episode 36 of the Stack Exchange Podcast ‘We Got Hit by a Hurricane’
- Reddit: Accidentally destroyed production database on first day of a job...
- Discussion of the above on Hacker News