Monday, April 12, 2021

The razor's edge

The actor and comedian (and noted Chicago Cubs fan) Bill Murray starred in his first dramatic role in the 1984 film The Razor's Edge, based on the 1944 novel of the same name by W. Somerset Maugham.  There's a famous scene in that movie, where Murray's character has joined a Buddhist monastery and starts to feel like he has finally found inner peace.  One of the Buddhist monks tells him, "The path to salvation is narrow and as difficult to walk as a razor's edge" (which is paraphrased from a verse in the Katha Upanishad).  I can't say that the movie is one of my favorites (because it's not), but I love the quote.

I will shamelessly paraphrase this great quote and suggest that the line separating order from chaos is as thin as a razor's edge.  I've written a number of posts in the past on high reliability organizations, organizations that have succeeded in avoiding catastrophic accidents in environments where these accidents would otherwise be expected to occur with high frequency.  There is a related concept that I've spent less time talking about, known as Normal Accident Theory.  "Normal accidents" (the term was coined by Charles Perrow in the early 1980s) occur in industries characterized by complex and tightly coupled systems.  In other words, these accidents are expected to occur - hence, they are "normal."  High reliability organizations appear to avoid these so-called "normal accidents," which is why I find them so interesting to study.

The phrase "tight coupling" deserves some discussion here as well.  The parts of a tightly coupled system are mutually interdependent, such that even a small error in one part of the system can easily compound and propagate to the point where the whole system is adversely impacted.  Tightly coupled systems exist on the imaginary boundary - that razor's edge, if you will - between order and chaos.

"Tight coupling" is an important concept to understand, so let me illustrate with an example.  The commercial aviation industry pushes right up against that razor-thin border between order and chaos all the time.  Just think about it - airlines regularly overbook flights with the expectation that a certain number of passengers will cancel or miss their scheduled flight (a full plane is a lot more profitable than a half-empty one).  This kind of set-up usually works to the airlines' advantage, but every once in a while something happens that creates chaos throughout the system.  Aircraft and flight crew rotation schedules are set up in such a way that everything seems to work until it doesn't.  Again, one event can create chaos across the system.

Just this past Sunday, there was a major thunderstorm in the southeastern part of the United States, which caused widespread flight delays across the country.  Over a hundred flights were canceled at Fort Myers' Southwest Florida International Airport (RSW), and passengers faced delays of 14 hours or more in some cases (those were the lucky ones - many of the passengers were stranded - including my wife and me, who were on our way back home).  Even though the weather in Fort Myers was close to perfect, flights coming out of RSW were delayed because they were waiting on planes coming in from airports north of the thunderstorm.  In some cases, inbound aircraft coming from Atlanta and other cities were forced to wait on the tarmac for as long as five hours before they were cleared for take-off.  A few incoming flights were re-routed to other airports instead.  Unfortunately, the thunderstorm eventually moved into Southwest Florida, which only caused further delays.

Passengers inside the RSW terminal were stranded and left to wait, often with little communication from the airlines.  Eventually, a number of flights were either canceled or delayed until Monday morning, as a number of the crews "timed out" (they reached the FAA's limit on duty hours).  Chaos ensued, and not just in the "friendly skies."  As more and more passengers became stranded, the facilities at RSW were unable to handle the excess load.  Restaurants and convenience stores ran out of food and drinks.  Rental cars, Uber and Lyft drivers, and even hotel accommodations grew more and more scarce as the day turned into night.

My own firsthand "eyewitness" impression was that things quickly got out of control.  Patience grew thin and tempers began to rage.  It wasn't pretty.  It was chaos.  Luckily, my wife and I were able to get on a flight the next morning.  However, not everyone was as lucky - a number of passengers were told that they would have to wait one or two additional days before they would be able to catch a flight out of Fort Myers.

The commercial aviation system is tightly coupled - one event in one part of the system can compound and propagate to adversely impact the entire system.  The imaginary border between order and chaos is truly as thin as a razor's edge.  

High reliability organizations seem to have figured this out.  They are by no means immune to these potentially catastrophic events.  However, they have learned to develop the kind of resilience that prevents these potentially catastrophic events from becoming catastrophic.  They do that through:


There are likely other important lessons from this example that also illustrate how High Reliability Organizations minimize the impact of these potentially catastrophic events.  Communication and transparency are clearly important.  The airlines could have saved themselves a lot of trouble by just being more open and honest with the passengers.  At one point, after waiting several hours to leave, passengers were just about ready to board a plane when they were told that the crew had "timed out."  I do agree with the need to restrict duty hours, but I do find it hard to believe that the airlines didn't know that the flight crews were approaching that duty hour limit.  Why tempt the passengers with the possibility of an actual departure?

High Reliability Organizations, similar to commercial aviation, push the boundary between order and chaos.  Where High Reliability Organizations are different, however, is that they continuously thrive at the boundary zone between order and chaos, and for this reason, they are not subject to Perrow's "normal accidents."
