Tuesday, July 23, 2024

Handle With Care

There's a great song by the rock-n-roll supergroup The Traveling Wilburys called "Handle With Care". I remember when "Handle With Care" was stamped across cardboard shipping boxes to alert everyone that the contents of the box were fragile ("It must be Italian!"). I've been thinking a lot about the concept of "fragility" lately, particularly in the context of systems thinking. The writer Nassim Taleb wrote a book called Antifragile, in which he defines "antifragility" as a property of systems that increase their capability to thrive as a result of stressors, shocks, mistakes, failures, or disorder. Importantly, Taleb distinguishes "antifragility" from robustness (the ability to resist failure) and resilience (the ability to recover from failure). I've not read any of Taleb's books (see my post "Anti-Library" for an explanation), and while I don't think he explicitly defines "fragility" as a property of systems, I suppose that he would implicitly define a fragile system as one that performs worse as a result of stressors, shocks, mistakes, or failures.

With that in mind, I would describe our nation's commercial aviation infrastructure as "fragile". That is particularly painful for me to say, as commercial aviation is one of the prototypical High Reliability Organizations, at least when it comes to aviation safety. Consider this. Literally millions of computers running Microsoft's Windows operating system crashed this past Friday (July 19th) when the cybersecurity company CrowdStrike pushed out a faulty software update, causing widespread disruptions at airlines, banks, hospitals, and hotels. Over the next four days, thousands of commercial flights into, within, or out of the United States were canceled. Many of the flights that weren't canceled experienced significant delays.

I have firsthand knowledge of these issues, as our family's flight from Seattle to Chicago on Saturday was canceled at the last minute (we were literally getting ready to board). The airline representative told us that we would likely not be able to get on another flight until Wednesday of this week. Thankfully, we were able to get the last seats on a flight to St. Louis (thank you, Alaska Airlines!), at which point we rented a car and drove to Chicago. Our bags are still somewhere in Seattle, and who knows when we will see them again. The Seattle airport was an absolute mess!

Hopefully we will learn from this incident. As I reflected this weekend, it seemed to me that flights are canceled more frequently now than they were in the past. Even The Traveling Wilburys know about flight cancellations and delays (it says so right in their song "Handle With Care" - "Been stuck in airports..." they say). However, when I reviewed the data, the number of cancellations has actually decreased over the past twenty years. It may be that there are just more flights now than in the past, or maybe I fly more frequently now than I did in the past. Who knows? But as I think about this issue from a systems perspective, one thing is clear. As the different subcomponents in a system become more tightly coupled, the system as a whole becomes less resilient and more fragile. There's no question that today's commercial aviation industry is highly complex, interconnected, and tightly coupled. Under these conditions, the safety researcher Charles Perrow would suggest that accidents and disruptions are not only more common but inevitable - in essence, they are normal (see my post, "The Razor's Edge", for an explanation of Perrow's Normal Accident Theory).

Fragility is an interesting concept and relatively easy to understand. From a systems perspective, however, it is far more difficult to address. The difficulties that our family (and many, many others) personally encountered in the aftermath of the CrowdStrike outage are just the latest example.
