
Sometimes, in spite of our best efforts, things go wrong: equipment breaks, components crack, machinery grinds to a halt. If your asset fails to meet expectations or fails altogether, you need to fix it and make sure the failure doesn’t happen again. To do this, you need an explanation of what went wrong.
There is a wide variety of methods and procedures for analyzing the causes of failure, including the widely used Fishbone or Ishikawa diagram, the versatile Fault Tree Analysis and the appealingly simple Five Whys. Each method has its particular advantages and drawbacks. Many were developed for some particular sector or application and, while they work very well on their home territory, they are not all as universally applicable as their advocates sometimes hope.
All these methods are essentially about mapping causes: identifying the immediate causes of a failure as well as the causes of those causes and so on. To borrow an example from the philosopher David Lewis: we have the bald tyre, the drunk driver, the blind corner, the approaching car. Together these cause the crash. Each of these causes has its causes, which are also causes of the crash. So in turn are their causes and so on.
The different methods emphasize different aspects of causal mapping: the fishbone diagram provides a useful categorization; fault tree analysis attempts to formalize the interaction between deterministic causes; Bayesian networks do the same statistically; and the five whys method focuses attention down causal chains, to the cause of the cause of the cause and so on (five times).
These methods are useful for establishing causal relations. They help to identify "root" causes, i.e. causes that lie at the root of several chains leading to the final failure, for example the driver's reckless disposition, which causes his drinking and the neglect that leads to the bald tyre. They help us to manage complicated interdependencies, especially when those interdependencies are statistical rather than deterministic.
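The idea of a "root" cause can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any of the named methods: the causal map below is a hypothetical encoding of the car-crash example, the function names `count_chains` and `root_causes` are invented here, and the map is assumed to contain no causal loops. A cause is flagged as a root candidate when it has no recorded cause of its own and starts two or more chains leading to the failure.

```python
from collections import defaultdict

# Hypothetical causal map for the car-crash example: effect -> direct causes.
CAUSES = {
    "crash": ["bald tyre", "drunk driver", "blind corner", "approaching car"],
    "bald tyre": ["neglect"],
    "drunk driver": ["drinking"],
    "neglect": ["reckless disposition"],
    "drinking": ["reckless disposition"],
}


def count_chains(causes, failure):
    """Count the distinct causal chains leading from each cause to the failure.

    Assumes the map is acyclic; a loop would make the recursion run forever.
    """
    chains = defaultdict(int)

    def walk(effect):
        for cause in causes.get(effect, []):
            chains[cause] += 1  # one more chain passes through this cause
            walk(cause)

    walk(failure)
    return dict(chains)


def root_causes(causes, failure):
    """Causes with no recorded cause of their own that start two or more chains."""
    chains = count_chains(causes, failure)
    return sorted(c for c, n in chains.items() if n >= 2 and c not in causes)


print(root_causes(CAUSES, "crash"))  # prints ['reckless disposition']
```

The blind corner and the approaching car also have no recorded causes, but each starts only one chain, so they are not flagged. Where the map stops is still a matter of judgement: adding a cause for the reckless disposition would move the root further back.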
None of these methods is very clear about what a cause is, though our intuitive notion of a cause tends to serve us very well. What is more problematic is that causal chains are "dense". The blind corner and the oncoming car weren't immediate causes of the crash. They caused a swerve. The swerve, together with the bald tyre and the icy road, caused a skid. The skid and the driver's drunkenness caused him to brake. And so on. We can always drill down into any given cause and break it into smaller causes. When to stop is left as a question of judgement.
Nor is it always clear how far back in a causal chain it is useful to go (though the five whys has a pretty big clue in its title). The driver's abusive childhood is undoubtedly a cause and one that might be of interest to a forensic psychologist investigating the case, though of little interest to the road safety expert doing the same.
The general problem is relevance. Whether a cause is relevant or not depends on the context. The road safety expert has different interests from the forensic psychologist, who has different interests from the insurance lawyer, who is only really interested in the drinking. When investigating machinery failure, the relevant causes are the ones that give you a solution. The question remains, though, how to choose how far and how deep you need to go to find them.
A much bigger problem with these methods is that very often the causes of a failure are not known. Before we can settle down to understand the relationships between the various causes, we have to discover what those causes are from an unwieldy mess of observation, data and contradictory testimony. Before we can analyze the causes of a failure, we need to find them.
The great British philosopher John Stuart Mill, in his 1843 book A System of Logic, gave five methods for discovering causes. Of these, the method of difference has proved the most fruitful for practical applications. Faced with a failure, say a damaged water pump, rather than asking "Why did this pump fail?" you ask "Why did this particular pump fail and not this nearly identical pump next to it?" or "Why did this pump fail today and not yesterday?" or "Why did this pump fail and not the identical pump operating elsewhere?" The idea is to look at the difference between the failure case and a case as similar to it as possible but in which the failure did not occur. The cause of the failure must be found in the difference between the two cases. By restricting attention to the differences between the two cases, you essentially ignore everything they have in common and you dramatically reduce the amount of material and the number of possible causes you need to consider.
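As a rough sketch of the method of difference, the two cases can be written down as sets of recorded facts and compared. Everything in this example is an assumption made for illustration: the attribute names, the pump data and the helper `differences` are hypothetical, and a real investigation would record far more than five attributes per case.

```python
def differences(failed, healthy):
    """Return the attributes whose values differ between the two cases.

    Each result maps an attribute to (value in failed case, value in healthy case).
    Attributes recorded for only one case show up with None on the other side.
    """
    keys = failed.keys() | healthy.keys()
    return {
        key: (failed.get(key), healthy.get(key))
        for key in sorted(keys)
        if failed.get(key) != healthy.get(key)
    }


# Hypothetical records for the failed pump and the nearly identical pump next to it.
failed_pump = {
    "model": "P-100",
    "fluid": "seawater",
    "hours_run": 4000,
    "seal_batch": "B",
    "inlet_filter": "missing",
}
healthy_pump = {
    "model": "P-100",
    "fluid": "seawater",
    "hours_run": 4000,
    "seal_batch": "A",
    "inlet_filter": "fitted",
}

candidates = differences(failed_pump, healthy_pump)
print(candidates)
# prints {'inlet_filter': ('missing', 'fitted'), 'seal_batch': ('B', 'A')}
```

Three of the five recorded attributes drop out because the cases share them, which is the narrowing the method promises. The comparison only proposes candidates; it does not say which difference, if any, actually caused the failure.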
By switching through different similar cases, we can generate a large number of hypothetical causes and causal scenarios. Not all these will be true, but there are well defined criteria for evaluating causal theories and choosing between them. And it's far better to have to choose between too many than to miss the right one or not to have any at all.
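Switching through several contrast cases can be sketched as below. The case descriptions are hypothetical illustration data and `candidate_causes` is a name invented for this example. Each contrast case yields its own list of facts that are true of the failure case but not of the contrast, and each list is a starting point for a causal hypothesis that still has to be evaluated.

```python
# All case descriptions below are hypothetical illustration data.
failure = {
    "seal_batch": "B",
    "inlet_filter": "missing",
    "duty": "continuous",
    "site": "north",
}

contrasts = {
    "neighbouring pump": {
        "seal_batch": "A",
        "inlet_filter": "missing",
        "duty": "continuous",
        "site": "north",
    },
    "same pump yesterday": {
        "seal_batch": "B",
        "inlet_filter": "fitted",
        "duty": "continuous",
        "site": "north",
    },
    "identical pump elsewhere": {
        "seal_batch": "B",
        "inlet_filter": "fitted",
        "duty": "intermittent",
        "site": "south",
    },
}


def candidate_causes(failure, contrasts):
    """Map each contrast case to the failure-case facts it does not share."""
    failed_facts = set(failure.items())
    return {
        name: sorted(failed_facts - set(case.items()))
        for name, case in contrasts.items()
    }


hypotheses = candidate_causes(failure, contrasts)
for name, facts in hypotheses.items():
    print(name, "->", facts)
```

With this data, the neighbouring pump points at the seal batch, yesterday's case points at the missing inlet filter, and the pump elsewhere adds duty and site to the list. The closer the contrast, the shorter its list of candidates.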
Not only does Mill's difference method root out causes, it also ensures that the causes it discovers are relevant. Returning to our investigation of the car crash, our forensic psychologist asks "Why this man and not any other man?" and instantly focuses on the drinking and the driver's reckless disposition. The road safety expert asks "Why this corner and not the one before?" By using different contrasting cases, we highlight different patterns in the causal history and emphasize different interests. By making our contrasts as close to our failure case and as realistic as possible, we ensure that the causes the difference method reveals are relevant in the sense that they can motivate a solution to the problem at hand.
There is much more to fruitful causal analysis than Mill's difference method, which is just one of several paradigms Lloyd's Register ODS has developed over the last fifteen years for approaching mechanical failure analysis. We specialize in finding out what went wrong when machinery, components or structures fail, and then using that information to fix it as quickly as possible and to ensure that it never happens again.