Introduction to Common Cause Failure

Andrew O'Connor
Relken Engineering

[Also posted on the CRE Preparation website on 29 Jul 2012]

Common Cause Failure (CCF) is one of the reasons why a classical reliability model of your system may dangerously underestimate the risk of failure. It directly attacks the redundancy in your system to create a single point of failure. In fact, studies have shown that CCF events may contribute between 20% and 80% of the unavailability of safety systems in nuclear reactors [Werner 1994]. This post provides an introduction to what CCFs are and why they have such a significant impact on your system. [CRE BOK III.A.4]

An example of CCF causing unexpected system failure is Eastern Air Lines Flight 855, where all three engines failed because they lost oil through missing O-ring seals, one set missing from each engine. The NTSB identified the probable cause as “failure of mechanics to follow the established and proper procedures for the installation of master chip detectors in the engine lubrication system, the repeated failure of supervisory personnel … and the failure of Eastern Air Lines management…” [NTSB 1983]. The engine failures cannot be treated as independent, because the engines shared the same maintenance crews, supervisors and management system.

Worked Example

To best understand the impact of CCF, we will consider a system consisting of two backup generators in parallel. Only one generator is required to power the safety-critical item; the second is redundant in case the first fails. Each generator has a failure-to-start probability of $P(A) = P(B) = 0.0049$ per demand [Vesely et al. 1994].

Assuming the two generators fail independently, the probability of system failure (both generators failing to start) is calculated below to be 2.4E-5 per demand.

[Figure: Fault tree for redundant generators without consideration of common cause failure]

$P(S) = P(A)P(B)$
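As a quick check of the arithmetic, the independent-failure calculation can be reproduced in a few lines of Python. This is a minimal sketch: the function name `p_parallel_independent` is hypothetical, and it simply multiplies the per-demand failure-to-start probabilities of the two generators, which is only valid under the independence assumption.

```python
def p_parallel_independent(p_a: float, p_b: float) -> float:
    """Probability that both redundant units fail on demand,
    assuming their failures are statistically independent."""
    return p_a * p_b

# Failure-to-start probability per demand for each generator [Vesely et al. 1994]
P_A = P_B = 0.0049

p_system = p_parallel_independent(P_A, P_B)
print(f"P(S) = {p_system:.2E} per demand")  # prints "P(S) = 2.40E-05 per demand"
```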

Definition

In simple terms, a CCF is the failure of multiple components from a shared cause that has been transmitted through a coupling factor. A formal definition is provided by NUREG/CR-5485 [Mosleh et al. 1998]:

A CCF event consists of component failures that meet four criteria:

  1. two or more individual components fail or are degraded, including failures during demand, in-service testing, or deficiencies that would have resulted in a failure if a demand signal had been received;
  2. components fail within a selected period of time such that success of the PRA mission would be uncertain;
  3. component failures result from a single shared cause and coupling mechanism; and
  4. a component failure occurs within the established component boundary.

It is generally accepted that Common Cause Failures do not include multiple component failures resulting from a functional dependency that would be modelled explicitly in a traditional fault tree or system reliability model. Instead, the concept recognises that, particularly in redundant systems, a dependency exists between components that were manufactured by the same company, maintained by the same person, or installed in the same location.

Using our example, a classical Common Cause Failure might occur if both generators are maintained by the same person, who follows an incorrect maintenance procedure. In that case, if the first generator fails because of this mistake, it is highly likely that the second will also fail. The system failure probability calculated above assumed that the failures of the two generators were independent. Unfortunately, our coupling factor (the maintenance person and the maintenance procedure) means that the failures of generators A and B are dependent. Other examples include a manufacturing defect at a parts supplier that results in defective air cleaners being installed on both generators, or a flood in the location shared by both generators.

Modelling CCF Events

Modelling CCF events is a complex topic that cannot be covered fully in this article; however, I will briefly show how CCF is treated in a fault tree to illustrate its magnitude. To account for the CCF dependency between generators A and B, each basic failure event can be divided into a common cause element $X_{AB}$** and independent elements $A_i$ and $B_i$.

**Note: there are numerous methods for estimating $P(X_{AB})$; see NUREG/CR-5485, or email me, for a more comprehensive list.

$P(X_{AB}) = 1.55E-4$ [Wierman et al. 2007, p.78]
$P(A_i) = P(B_i) = P(A) - P(X_{AB}) = 4.745E-3$
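The split of each generator's total failure probability into a common cause part and an independent part can be sketched as follows. This assumes, as the figures above imply, that the independent element is the total failure probability minus the CCF element (a rare-event approximation). The sketch also reports the equivalent beta factor, $\beta = P(X_{AB})/P(A)$, i.e. the fraction of a generator's failure probability attributed to the common cause. The beta factor model is one of the methods described in NUREG/CR-5485; it appears here only as another way of reading the same numbers, not as the method used to obtain $P(X_{AB})$. The function name `split_ccf` is hypothetical.

```python
def split_ccf(p_total: float, p_ccf: float) -> tuple[float, float]:
    """Split a component's total failure probability into an independent
    part and a common cause part, using the rare-event approximation
    P(total) ~= P(independent) + P(CCF). Also return the equivalent
    beta factor P(CCF) / P(total)."""
    if not 0.0 <= p_ccf <= p_total:
        raise ValueError("CCF probability must lie between 0 and the total")
    p_independent = p_total - p_ccf
    beta = p_ccf / p_total  # fraction of failures attributed to the common cause
    return p_independent, beta

P_TOTAL = 0.0049   # P(A) = P(B), failure to start per demand
P_XAB = 1.55e-4    # common cause element [Wierman et al. 2007, p.78]

p_ind, beta = split_ccf(P_TOTAL, P_XAB)
print(f"P(A_i) = P(B_i) = {p_ind:.4g}")     # prints "P(A_i) = P(B_i) = 0.004745"
print(f"equivalent beta factor = {beta:.3f}")
```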

[Figure: Fault tree for redundant generators with consideration of common cause failures]

$P(S) = P(X_{AB}) + P(A_i)P(B_i) - P(X_{AB})P(A_i)P(B_i)$
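The OR-gate expression above can be evaluated directly and compared with the independent-only estimate. This sketch uses the figures already given. The function name `p_system_with_ccf` is hypothetical; the formula is the inclusion-exclusion form of $P(X_{AB} \cup (A_i \cap B_i))$, which assumes that $X_{AB}$, $A_i$ and $B_i$ are mutually independent basic events.

```python
def p_system_with_ccf(p_ccf: float, p_a_ind: float, p_b_ind: float) -> float:
    """P(S) = P(X_AB) + P(A_i)P(B_i) - P(X_AB)P(A_i)P(B_i).
    The system fails if the common cause event occurs OR both
    independent failures occur (inclusion-exclusion for the OR gate)."""
    p_both_independent = p_a_ind * p_b_ind
    return p_ccf + p_both_independent - p_ccf * p_both_independent

P_XAB = 1.55e-4           # common cause element
P_AI = P_BI = 4.745e-3    # independent elements

p_with_ccf = p_system_with_ccf(P_XAB, P_AI, P_BI)
p_without_ccf = 0.0049 ** 2   # original estimate, independence assumed

print(f"P(S) with CCF    = {p_with_ccf:.3E}")     # prints 1.775E-04
print(f"P(S) without CCF = {p_without_ccf:.3E}")  # prints 2.401E-05
print(f"underestimation factor = {p_with_ccf / p_without_ccf:.1f}")  # prints 7.4
```

Most of the revised figure comes from the $P(X_{AB})$ term alone: the independent pair contributes only about 2.3E-5, roughly the same as the original estimate.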

Using these figures, the revised probability of system failure is calculated to be 1.78E-4 per demand. The original estimate therefore understated the probability of system failure by a factor of about 7.4.
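A Monte Carlo simulation offers an independent cross-check of the closed-form result: sample the three basic events on each simulated demand and count how often the top event of the fault tree occurs. This is a sketch only. The function name `simulate_demands`, the trial count and the seed are arbitrary choices, the basic events are assumed independent of each other (as in the fault tree), and at one million trials the estimate of $P(S)$ carries sampling noise of several percent.

```python
import random

def simulate_demands(p_ccf: float, p_a_ind: float, p_b_ind: float,
                     n_trials: int, seed: int = 1) -> tuple[float, float]:
    """Estimate P(system failure) and P(generator A fails) by sampling
    the basic events X_AB, A_i and B_i independently on each demand."""
    rng = random.Random(seed)
    system_failures = 0
    a_failures = 0
    for _ in range(n_trials):
        ccf = rng.random() < p_ccf        # shared cause fails both generators
        a_ind = rng.random() < p_a_ind    # independent failure of A
        b_ind = rng.random() < p_b_ind    # independent failure of B
        a_fails = ccf or a_ind
        b_fails = ccf or b_ind
        if a_fails:
            a_failures += 1
        if a_fails and b_fails:           # parallel system: both must fail
            system_failures += 1
    return system_failures / n_trials, a_failures / n_trials

p_sys, p_a = simulate_demands(1.55e-4, 4.745e-3, 4.745e-3, n_trials=1_000_000)
print(f"simulated P(S) = {p_sys:.2E}, simulated P(A) = {p_a:.2E}")
print(f"P(A)^2 (independence assumption) = {p_a ** 2:.2E}")
```

The simulated $P(S)$ should land close to the closed-form value, while squaring the simulated single-generator probability reproduces the much smaller independence-based estimate. The gap between the two is the effect of the common cause.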

So why is Common Cause Failure treated differently from any other dependency we would model in a fault tree? There are numerous reasons, including:

  • The set of events that could cause a common cause failure is so vast that it would be impossible to include each one as a discrete event in a reliability model, so these events are grouped together and modelled as a single type of event.
  • Common cause failure events are so infrequent that it is unlikely for exactly the same event to occur again, so to make estimates from historical events, the events are grouped into a single classification to create an empirical estimate of their effect.

It is important to recognise that CCF events are the ‘catch-all’ for dependencies that are not explicitly modelled in your reliability model.

Conclusion

In equipment that requires a high level of reliability (such as safety-critical systems), redundancy is often used to meet reliability targets. Common cause failures directly attack the effectiveness of redundancy and can seriously undermine the reliability of the system. In reliability predictions and modelling, CCFs must be carefully considered to ensure the probability of failure is not severely understated.

Bibliography

Mosleh, A., Rasmuson, D. & Marshall, F., 1998. Guidelines on Modeling Common Cause Failures in Probabilistic Risk Assessments, Washington DC: U.S. Nuclear Regulatory Commission. NUREG/CR-5485

NTSB, 1983. Aircraft Accident Report: Eastern Air Lines, Inc., Lockheed L-1011, N334EA, Miami International Airport, Miami, Florida, May 5, 1983, Washington DC: National Transportation Safety Board.

Vesely, W.E., Uryasev, S.P. & Samanta, P.K., 1994. Failure of emergency diesel generators: a population analysis using empirical Bayes methods. Reliability Engineering & System Safety, 46(3), 221-229.

Werner, W., 1994. Results of recent risk studies in France, Germany, Japan, Sweden and the United States, Paris: OECD Nuclear Energy Agency. NEA/CSNI/R(1994)10