Remove references to "root cause(s)", as incidents should focus on all "contributing factors", including the trigger, that lead to the incident #95

@theckman

One change we are seeing in our industry is wider adoption of the belief that being able to distill an incident down to a single root cause is a myth[1][2]. As the complexity of our systems grows, so does the complexity of our incidents, and trying to isolate an incident to one item doesn't produce the kinds of learnings we need to come out of those incidents.

The truth is that each incident is unique because of the multiple factors that contributed to it, and if any one of those factors had been different it would have been a completely different incident. Unless we give each of those factors the same care, we miss the opportunity to solve for those different parts.

While pluralizing "root cause" to "root causes" can get you a good part of the way there, in my experience the change in verbiage from "root causes" to "contributing factors" makes a much bigger difference in how people think about incidents, and it drives the learnings in the way we want. I was initially skeptical that such a minimal language change would make a difference, but I can happily admit I was wrong.

At Netflix we've started to change our internal language around it, and have found a much richer set of learnings from teams after an incident. Since I was a responder at PagerDuty when we started to form these practices, and at the inception of this documentation, I feel it would be a miss if we didn't iterate on these documents to keep up with what our industry has learned.

We, and others, have started to talk about Contributing Factors instead. We still identify what was traditionally called the "root cause", but we list it as one of the factors (often called out as the trigger).

What are your thoughts on updating the verbiage of this documentation to align with our industry shifting its way of thinking?

[1] https://medium.com/@jpaulreed/dev-ops-and-determinism-966a57e3a5cc
[2] https://en.wikipedia.org/wiki/Fallacy_of_the_single_cause
