
In the aftermath of the 1986 Space Shuttle Challenger disaster, a Presidential Commission was established to determine what went wrong. The most unusual member of the panel was almost certainly the physicist Richard Feynman, some of whose books I have reviewed. Ultimately, his contribution proved controversial and was relegated to an appendix of the official report. To me, it seems like a remarkably clear-sighted piece of analysis, with wide-ranging importance for complex organizations in which important things might go wrong.
The full text is available online: Appendix F – Personal observations on the reliability of the Shuttle
He makes some important points about dealing with models and statistics, as well as about the bureaucratic pressures that exist in large organizations. For instance, he repeatedly stresses that the fact that something didn’t fail last time isn’t necessarily good evidence that it won’t fail next time. He makes this point specifically with reference to the eroded O-ring that was determined to be the cause of the fatal accident:
But erosion and blow-by are not what the design expected. They are warnings that something is wrong. The equipment is not operating as expected, and therefore there is a danger that it can operate with even wider deviations in this unexpected and not thoroughly understood way. The fact that this danger did not lead to a catastrophe before is no guarantee that it will not the next time, unless it is completely understood. When playing Russian roulette the fact that the first shot got off safely is little comfort for the next. The origin and consequences of the erosion and blow-by were not understood. They did not occur equally on all flights and all joints; sometimes more, and sometimes less. Why not sometime, when whatever conditions determined it were right, still more leading to catastrophe?
In his overall analysis, Feynman certainly doesn’t pull his punches, saying:
Since 1 part in 100,000 would imply that one could put a Shuttle up each day for 300 years expecting to lose only one, we could properly ask “What is the cause of management’s fantastic faith in the machinery?”
and:
It would appear that, for whatever purpose, be it for internal or external consumption, the management of NASA exaggerates the reliability of its product, to the point of fantasy.
It certainly seems plausible that similar exaggerations have been made by the managers in charge of other complex systems, on the basis of similarly dubious analysis.
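
Feynman’s framing of that 1-in-100,000 figure is easy to verify with back-of-the-envelope arithmetic. The snippet below is just my own sketch of that calculation, not anything taken from the report:

```python
# Back-of-the-envelope check of the 1-in-100,000 figure (my own sketch,
# not part of Feynman's appendix): at that failure rate, launching one
# Shuttle per day for 300 years should lose roughly one vehicle.
p_failure = 1 / 100_000      # management's claimed per-flight loss probability
flights = 300 * 365          # one launch per day for 300 years
expected_losses = flights * p_failure

print(f"{flights:,} flights -> about {expected_losses:.1f} expected losses")
# 109,500 flights -> about 1.1 expected losses
```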
Feynman also singles out one thing NASA was doing especially well – computer hardware and software design and testing – to highlight the differences between a cautious approach where objectives are set within capabilities and a reckless one where capabilities are stretched to try to reach over-ambitious cost or time goals.
Of course, the fact that the Space Shuttle was more dangerous than advertised doesn’t mean the launches weren’t worth the risk. Surely, astronauts were especially well equipped to understand and accept the risks they were facing. Still, if NASA had had a few people like Feynman in positions of influence within the organization, the Shuttle and the program surrounding it would probably have carried fewer major risks.