Professional What-If’ing

A quirk learned after years of professional “What-if”ing

Zach Wolfe
2 min read · Jan 3, 2022

Several years ago in a blog post by James Hamilton (Distinguished Engineer at AWS), I read a phrase that has really stuck with me:

“At scale, rare events aren’t rare.”

Consider operating a service that has a nice-sounding 99.99% success rate. It fails “rarely”, only 0.01% of the time. If the service is used millions of times per day, it fails hundreds of times every single day. That isn’t rare at all; that’s more like once every ten minutes!
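
To make that arithmetic concrete, here is a minimal sketch in Python. The figure of 1.5 million requests per day is my own assumption, picked only to stand in for "millions"; the function names are hypothetical too.

```python
MINUTES_PER_DAY = 24 * 60


def failures_per_day(requests_per_day: int, failure_rate: float) -> float:
    """Expected number of failed requests in one day."""
    return requests_per_day * failure_rate


def minutes_between_failures(requests_per_day: int, failure_rate: float) -> float:
    """Average gap between failures, assuming they are spread evenly over the day."""
    return MINUTES_PER_DAY / failures_per_day(requests_per_day, failure_rate)


# Assumed traffic: 1.5 million requests per day (illustrative, not a real service).
requests = 1_500_000
# 99.99% success rate means a 0.01% failure rate.
rate = 0.0001

print(f"{failures_per_day(requests, rate):.0f} failures per day")
print(f"one failure every {minutes_between_failures(requests, rate):.1f} minutes")
```

With those assumed numbers, this works out to about 150 failures per day, or one roughly every 9.6 minutes. Real traffic is rarely spread evenly, so treat the interval as an average rather than a schedule.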

The fact that systems at scale are always failing is normal, and baked into the design and implementation process. Software engineers on large-scale systems build mechanisms to fail as gracefully as possible.

I’m Not “At Scale”

Working through the design and implementation of always-failing systems every day at work has somewhat entrenched in my mind this mantra of “rare” events not being rare at all. At work, the appropriate response when I discover a possible failure mode with 0.01% probability is to take some action against it: implement a failure handler, a rollback strategy, etc., because it will happen to real people every day.

However, in life, there are horrible things that can happen with a relatively low probability that don’t deserve much thought, and I have found that it is sometimes hard to separate the two situations. For a few years, this type of ‘learned anxiety’ would get in my way at home, where I would expect adverse events that will probably never happen.

For example, the probability of any given American dying in a car crash in 2019 was 1 in 8,393, which is also roughly 0.01%. Think back to the pretend service with the 0.01% failure rate: used millions of times per day, it fails roughly once every ten minutes. Does that mean that if I drive my car for more than ten minutes, I should expect to die in that time? Of course not! That 0.01% covers one person over a whole year, not millions of requests per day. But after working for years on software where the “rare” failure is not rare, I find myself driving a little slower anyway.

Zach Wolfe

Software Engineering, Identity, Machine Learning, Sustainability Technology