I made 20,000 API calls a minute and broke production for a day. Here's why the system was to blame, not me.
#1 (about 6 minutes)
A personal story of breaking production at scale
The speaker recounts causing a major production outage by running a backfill script that overwhelmed the Facebook API and halted data updates.
#2 (about 2 minutes)
Judging intentions versus actions during incidents
We tend to judge others by their actions but ourselves by our intentions, so we should assume good intent from colleagues during incidents.
#3 (about 2 minutes)
Why individual blame is a counterproductive response
When a production issue occurs, it is a system failure rather than one individual's fault, because responsibility is shared across the developers, reviewers, and processes involved.
#4 (about 3 minutes)
How to build a psychologically safe blameless culture
Shifting to a blameless culture requires fostering trust, understanding intentions, practicing self-awareness, and owning your mistakes without displacing your frustration onto others.
#5 (about 2 minutes)
Using blameless postmortems for system-level learning
Blameless postmortems, a practice that originated in aviation and healthcare, investigate root causes in order to strengthen systems rather than assign individual blame.
#6 (about 3 minutes)
The power of positive feedback in code reviews
Aiming for a five-to-one ratio of positive to negative interactions can improve team dynamics; code reviews are a natural place to apply it by adding positive comments alongside critical ones.
#7 (about 2 minutes)
Using pre-mortems to proactively prevent failures
Pre-mortems are a proactive exercise in which the team imagines that a project has already failed, so it can identify potential risks and edge cases beforehand.
#8 (about 3 minutes)
Incident resolution and key cultural takeaways
The incident took 20 hours to fully resolve but was a valuable learning experience that exposed system flaws and reinforced a healthy team culture.
#9 (about 2 minutes)
Q&A on customer impact and worst production breaks
The speaker answers audience questions about how customers reacted to the outage and shares the story of his worst production break, which involved a failed form.