What is an FTA (Fault Tree Analysis)?

16/10/2020

Fault tree analysis (FTA) is a top-down approach to failure analysis. It starts from a possible undesirable event (e.g. an accident), called the TOP EVENT or TOP HAZARD, and determines in what way, and under what conditions, that TOP EVENT can happen, whether through individual events or through a combination of different events.

FTA is one of the most widely used techniques for causal risk analysis, as well as for studying the reliability of products, systems, and installations. Because of its structure of branches and ramifications (as we will see in this article), an FTA ends up with the shape of an inverted tree, which gives the technique its name.

In short, regarding the FTA:

  • It identifies all the possible causes of an unwanted event.
  • It is a structured deductive analysis that always runs from top to bottom, that is, from less detail (the system level) to more detail (the component level).
  • It provides valuable knowledge of the system under analysis. Defects can occur or be introduced through design and operational procedures, as well as through insufficient maintenance, for instance. In other words, the analysis can also cover systematic failures of operational origin.

What are the main steps in developing a Fault Tree Analysis (FTA)?

  • Step 1. Definition of the system, the TOP event (the possible accident) and the environmental conditions.
  • Step 2. Creation of the fault tree.
  • Step 3. Identification of the combinations of failures that trigger the TOP event. Such a combination is called a cut set. A cut set is minimal when, if any one of its events or faults is eliminated, the remaining events or faults together no longer generate the TOP event.
  • Step 4. Qualitative analysis of the fault tree.
  • Step 5. Quantitative analysis of the fault tree.
  • Step 6. Report of results, conclusions and opening of an action plan, if appropriate.
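
As a rough illustration of Step 3, the sketch below enumerates the cut sets of a small tree and keeps only the minimal ones. The tuple-based tree encoding, the function names and the events A, B and C are assumptions made for this example, not part of any particular FTA tool. The brute-force expansion is only practical for small trees.

```python
from itertools import product

# Hypothetical encoding: a basic event is a string,
# a gate is a tuple ("AND" | "OR", [children]).

def cut_sets(node):
    """Return every cut set of the (sub)tree as a list of frozensets."""
    if isinstance(node, str):
        return [frozenset([node])]
    kind, children = node
    child_sets = [cut_sets(child) for child in children]
    if kind == "OR":
        # Any single input is enough: collect the cut sets of all inputs.
        return [cs for sets in child_sets for cs in sets]
    if kind == "AND":
        # All inputs are needed: merge one cut set from each input.
        return [frozenset().union(*combo) for combo in product(*child_sets)]
    raise ValueError(f"unknown gate type: {kind}")

def minimal_cut_sets(node):
    """Drop every cut set that strictly contains another cut set."""
    sets = set(cut_sets(node))
    minimal = [s for s in sets if not any(other < s for other in sets)]
    return sorted(minimal, key=lambda s: (len(s), sorted(s)))

# Illustrative tree: TOP = A OR (B AND C) OR (A AND B)
top = ("OR", ["A", ("AND", ["B", "C"]), ("AND", ["A", "B"])])
result = minimal_cut_sets(top)
```

Here the cut set {A, B} is discarded because {A} alone already triggers the TOP event, which leaves {A} and {B, C} as the minimal cut sets.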

How do we do the FTA?

In our experience, the two best starting points for developing an FTA are an FMECA and a system block diagram of the product, system, or installation. It is also important to consider the different operating modes of the system, as well as the environment in which it will operate. The objective of this preparation is to have the necessary tools to identify and understand the cause-effect relationships that lead to the TOP event.

Although it seems obvious, it is also very important to define the physical limits and external stress conditions of the analysis, and to communicate them to all the parties involved. For example: should the analysis include vandalism or a lightning strike, or are we protected against this type of event by equipment external to ours? Different stakeholders commonly assume different limits, so parts of a system can be left unanalysed along the way, and this confusion can cause a systematic failure affecting safety.

Construction of the FTA

As mentioned above, the first step is to define the TOP event in a clear and unambiguous way. In most cases, the TOP event is obtained by negating one or more safety functions of our system. For instance:

  • Safety function of a track circuit (simplified): detect, in all cases, the presence of a train on the track section defined by the track circuit.
  • TOP event or TOP Hazard: Failure to correctly detect the presence of a train on the track section defined by the track circuit.

Once we have defined our TOP event from our safety functions, we must explore in our FMECA which immediate events and conditions cause the TOP event, either independently or together. Events that each cause the TOP event on their own are linked via an OR gate. Events that only generate the TOP event jointly (they must happen at the same time) are first joined with an AND gate.

Once this first "floor" or layer has been generated, the process is repeated with these new second-level events, creating a new "floor" that represents, by means of OR and AND gates, how these events are generated. The rule is the same: events that jointly generate the higher event are joined by AND gates, and events that generate it independently are joined by OR gates.

This process is repeated as many times as necessary, generating a number of "floors" that depends on the complexity of the system, until it reaches what are called basic events. These are typically component failures for which we have failure rate information (1/MTBF).
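
The layered structure described above can be sketched as a small data model. This is a minimal illustration, not the format of any specific FTA tool: the class names, the intermediate event, the component names and the MTBF figures are all hypothetical. Only the TOP event wording comes from the track circuit example.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class BasicEvent:
    """Leaf of the tree: a failure with a known MTBF (in hours)."""
    name: str
    mtbf_hours: float

    @property
    def failure_rate(self) -> float:
        # Failure rate per hour, taken as 1/MTBF as in the text.
        return 1.0 / self.mtbf_hours

@dataclass
class Gate:
    """Intermediate or TOP event produced by its inputs through AND/OR."""
    name: str
    kind: str  # "AND" or "OR"
    inputs: List[Union["Gate", BasicEvent]] = field(default_factory=list)

def floors(node) -> int:
    """Number of 'floors' from this node down to the deepest basic event."""
    if isinstance(node, BasicEvent):
        return 1
    return 1 + max(floors(child) for child in node.inputs)

def basic_events(node) -> List[BasicEvent]:
    """Collect the basic events of the tree, left to right."""
    if isinstance(node, BasicEvent):
        return [node]
    return [event for child in node.inputs for event in basic_events(child)]

# Hypothetical tree; component names and MTBF values are invented.
top = Gate("Failure to detect a train on the track section", "OR", [
    BasicEvent("Component A failure", mtbf_hours=1e6),
    Gate("Both redundant channels fail", "AND", [
        BasicEvent("Channel 1 failure", mtbf_hours=5e4),
        BasicEvent("Channel 2 failure", mtbf_hours=5e4),
    ]),
])

names = [event.name for event in basic_events(top)]
```

With this tree, `floors(top)` returns 3: the TOP event, one intermediate event, and the basic events.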

With the failure rates of the basic events, we go on to perform the quantitative bottom-up analysis to find the failure rate (and MTBF) of the TOP event. An OR gate adds the values of its inputs; an AND gate multiplies them. Strictly speaking, these operations apply to probabilities of occurrence over a given period (for small values, the probability is approximately the failure rate multiplied by the exposure time), and they assume that the events are independent. Since these probabilities are lower than 1, when two events have to occur at the same time to trigger an event (expressed with an AND gate), multiplying them gives a smaller value and, therefore, a lower probability of occurrence. For OR gates, where an event is generated if "this or that" occurs, the value increases because the two probabilities of occurrence are added together.
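
The gate arithmetic described above can be sketched as follows. The sketch works with probabilities of occurrence over a fixed exposure time rather than raw failure rates, and it assumes independent events and a constant failure rate. The function names and the numeric values are illustrative assumptions. For an OR gate, the plain sum is the usual rare-event approximation of the exact expression 1 - (1 - p1)(1 - p2)...

```python
import math

def prob_from_rate(rate_per_hour: float, hours: float) -> float:
    """Probability of at least one failure in `hours`, constant rate assumed."""
    return 1.0 - math.exp(-rate_per_hour * hours)

def and_gate(probs) -> float:
    """All inputs must occur: product of the input probabilities."""
    return math.prod(probs)

def or_gate(probs) -> float:
    """At least one input occurs: exact formula for independent events."""
    return 1.0 - math.prod(1.0 - p for p in probs)

def or_gate_rare(probs) -> float:
    """Rare-event approximation of the OR gate: simple sum."""
    return sum(probs)

# Illustrative input probabilities (not taken from a real system).
p1, p2 = 1e-3, 2e-3

p_and = and_gate([p1, p2])           # 2e-6: much smaller than either input
p_or_exact = or_gate([p1, p2])       # slightly below the plain sum
p_or_approx = or_gate_rare([p1, p2])  # 3e-3
```

The approximation is slightly conservative: the sum is never smaller than the exact OR value, and the gap is negligible when the input probabilities are small.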

In this way, we climb the floors of the tree, starting from the basic events and calculating the failure rate of each event floor by floor until we reach the TOP event. Finally, we obtain a failure rate associated with the TOP HAZARD or TOP event and, therefore, we know the probability that our safety functions will not be fulfilled, moving from a qualitative concept (the safety function) to a quantitative one. By associating this probability of occurrence with a SIL level, we can confirm whether our system reaches the required SIL level.

In most real projects, the FTA is so complex that it is extremely difficult to build a fault tree without a supporting software tool. For this purpose, Leedeo Engineering has specific software applications (used, for instance, for the fault tree above) to perform these tasks in RAM and Safety development projects.

In summary, the FTA allows us to convert a qualitative concept, such as a safety function and its negation (the TOP HAZARD), into a quantitative one: a failure rate or probability of occurrence. This makes it a useful tool for determining the SIL level of a system or, with regard to reliability, its failure rate.
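
As a hedged illustration of the last step, the sketch below maps an hourly rate of dangerous failures to the highest SIL whose quantitative target it meets. The bands are the commonly cited IEC 61508 targets for high-demand or continuous mode, used here as an assumption; the table and target applicable to a given project should be taken from the standard it follows. The function name is hypothetical, and meeting the numerical target is only one part of demonstrating a SIL.

```python
# Assumed upper bounds (dangerous failures per hour), highest SIL first.
# SIL 4: < 1e-8, SIL 3: < 1e-7, SIL 2: < 1e-6, SIL 1: < 1e-5.
SIL_UPPER_BOUNDS = [
    (4, 1e-8),
    (3, 1e-7),
    (2, 1e-6),
    (1, 1e-5),
]

def highest_sil_met(rate_per_hour: float):
    """Return the highest SIL whose quantitative target the rate meets,
    or None if the rate does not reach even the SIL 1 target."""
    if rate_per_hour < 0:
        raise ValueError("rate must be non-negative")
    for sil, upper in SIL_UPPER_BOUNDS:
        if rate_per_hour < upper:
            return sil
    return None

# Illustrative TOP event rate, not a real result.
top_event_rate = 3e-8
sil = highest_sil_met(top_event_rate)
```

With the illustrative rate of 3e-8 per hour, the function returns 3: the rate is below the assumed SIL 3 bound but not below the SIL 4 bound.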


Leedeo Engineering specialises in developing FTAs as part of its RAMS projects, supporting clients at whatever level they require for RAM and Safety tasks, for both infrastructure and on-board equipment.

