What is the MTBF (Mean Time Between Failures)?

10/10/2020

At Leedeo Engineering, when we carry out RAMS studies for our clients, we often see how difficult it is to tell apart the basic reliability, availability and maintainability parameters or objectives (RAM objectives) that characterize products, systems or facilities. These objectives are often set in the end customer's tenders or specifications. Among all of them, the most widely used is certainly MTBF.

Using a simple, graphical approach, we present here the most relevant values in RAM studies, together with optimisation and improvement strategies. The parameters that we define and explain in this article are the following:

Most important RAM indicators

MTBF: Mean Time Between Failures
MTTF: Mean Time to Failure
MDT: Mean Down Time
MUT: Mean Up Time
λ: Failure rate
MTTR: Mean Time to Repair

As seen in the previous graphic, we have a system that works correctly over time (marked in blue) until a failure occurs (the lightning bolt marked in red). When the failure occurs, the system is out of service for a while (marked in yellow). Some time later, once it has been repaired, it returns to service (marked in blue).

It then remains in the operating state (marked in blue) until, after a certain period of time, another failure occurs (lightning bolt marked in red).

What does the MTTF parameter mean?

MTTF stands for Mean Time to Failure. It is the mean time the system is in operation, fulfilling the functions for which it was designed. In repairable systems, where the system can go out of service and return to service once it has been repaired, MTTF and MUT (Mean Up Time) are the same.

MTTF and MTBF are often confused. As shown in the previous graphic, the difference between the two parameters is the MDT (Mean Down Time), marked in yellow, which is the time the system is out of service. We can therefore say that MTBF = MTTF + MDT. For items that cannot be repaired, MTBF and MTTF are the same.
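As a rough illustration of the relation MTBF = MTTF + MDT, the following sketch averages the up and down times of a made-up operating log. The `cycles` data and the `mean` helper are assumptions introduced only for this example; they do not come from a real system.

```python
# Hypothetical operating log: (hours in service, hours out of service) per cycle.
cycles = [(950.0, 50.0), (1100.0, 30.0), (1010.0, 40.0)]


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


mttf = mean(up for up, _ in cycles)      # mean up time per cycle (MTTF / MUT)
mdt = mean(down for _, down in cycles)   # mean down time per cycle (MDT)
mtbf = mttf + mdt                        # MTBF = MTTF + MDT

print(f"MTTF = {mttf:.1f} h, MDT = {mdt:.1f} h, MTBF = {mtbf:.1f} h")
```

With these invented figures the down time is small compared with the up time, so MTBF and MTTF are close to each other, which is one reason the two are so easily mixed up.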

Note that MTTF is the parameter that tells us how long we can expect the system to stay in service once it has been put into service, whether for the first time or after corrective maintenance.

What does the MTBF parameter mean?

In RAMS engineering, MTBF (Mean Time Between Failures) is the most widely used and requested parameter or objective. MTBF indicates how frequently the system is expected to fail. Sometimes, especially for rolling stock, it is useful to use MKBF (Mean Kilometres Between Failures) instead.

MTBF answers the following question: how many hours, on average, will pass before a failure appears in the system under review? MTBF is directly linked to the reliability of the equipment, the product, or the installation.

In exponential models, the most widespread in the sector because they combine simplicity with good results when failure rates are constant, reliability is defined as follows:

R(t) = exp(-λt)

Where R(t) is the reliability of a component or system at a given time t, and λ is the failure rate. The function takes values between 1 (at t = 0) and 0 (as t tends to infinity). The result should be interpreted as the probability that the component is still working at time t. In these models, MTBF is the inverse of the failure rate:

MTBF = 1 / λ
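The two formulas above can be tried out with a few lines of Python. This is only a minimal sketch: the function names and the failure rate value are hypothetical, chosen for illustration.

```python
import math


def reliability(t_hours, failure_rate):
    """R(t) = exp(-lambda * t), valid for a constant failure rate."""
    return math.exp(-failure_rate * t_hours)


def mtbf_from_rate(failure_rate):
    """MTBF = 1 / lambda under the exponential model."""
    return 1.0 / failure_rate


failure_rate = 1e-4                  # hypothetical value, in failures per hour
mtbf = mtbf_from_rate(failure_rate)  # in hours

# Evaluate the reliability at start-up, after 1000 h, and at t = MTBF.
for t in (0.0, 1000.0, mtbf):
    print(f"R({t:.0f} h) = {reliability(t, failure_rate):.3f}")
```

One consequence worth noticing is that at t = MTBF the reliability is exp(-1), roughly 0.37. Under this model, the MTBF is therefore not a guaranteed lifetime: only about 37% of units are expected to still be working when the operating time reaches the MTBF.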

MTBF values can be obtained, and justified, from IEC standards, reliability handbooks, or reliability databases.

What does the MDT parameter mean?

MDT (Mean Down Time) is the mean time during which a system does not work properly after a failure occurs. In more detail, MDT is the sum of the time it takes to detect the equipment failure (the point from which a repair can be ordered) and the time it takes to fix it, the so-called MTTR (Mean Time to Repair).

"Failure Detection Time" deserves a special mention. During that time the risk is considered extremely critical, because the system is not working properly yet is believed to be in service and functioning correctly. This time must therefore be kept to a minimum, and in many applications it is unacceptable for it to exceed a few seconds or even milliseconds.

What does the parameter MTTR mean?

The Mean Time to Repair (MTTR) is the average time taken to repair the equipment, product or installation under analysis. MTTR therefore indicates the repair capability, and the aim is to keep it as short as possible. As shown in the graph, the MTTR is divided into diagnosis of the particular failure, possible replacement of equipment, adjustment of the equipment or of the system itself when new equipment is introduced and, finally, putting it into service again. MTTR is directly linked to the maintainability of the equipment, the product, or the facility. That is, MTTR is the quantitative parameter of maintainability (the "M" in RAM).
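To make the breakdown concrete, the sketch below adds up the repair phases named above to obtain the MTTR, and then adds the failure detection time to obtain the MDT. All durations are hypothetical figures chosen for illustration, not measured data.

```python
# Hypothetical mean durations, in hours, for each repair phase named in the text.
repair_phases = {
    "diagnosis": 0.5,
    "replacement": 1.0,
    "adjustment": 0.25,
    "return_to_service": 0.25,
}
detection_time = 0.1  # hypothetical mean failure detection time, in hours

mttr = sum(repair_phases.values())  # MTTR = sum of the repair phases
mdt = detection_time + mttr         # MDT = detection time + MTTR

print(f"MTTR = {mttr:.2f} h, MDT = {mdt:.2f} h")
```

Laying the phases out this way shows where a maintainability improvement would have the most effect. In this invented example, replacement is the largest contributor.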

Why are we always talking about "Mean Time"?

As you can see, when talking about RAM objectives or values, we say "Mean Time" in every case. The reason is that these parameters are calculated by statistical analysis: they are projections or probabilities of what may happen in the future. In their simplest form, these projections are usually made with an exponential model, which assumes a constant failure rate throughout the service life of the product.
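The statistical nature of these parameters can be illustrated with a small simulation. The sketch below draws random times to failure from an exponential distribution using Python's standard `random.expovariate` function. The failure rate, the sample size and the seed are arbitrary assumptions made for the example.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

failure_rate = 1e-3   # hypothetical constant failure rate, in failures per hour
n_samples = 100_000   # number of simulated units

# Each draw is one simulated time to failure under the exponential model.
times_to_failure = [random.expovariate(failure_rate) for _ in range(n_samples)]

sample_mean = sum(times_to_failure) / n_samples
expected_mean = 1.0 / failure_rate  # the model's mean time, 1 / lambda

print(f"shortest simulated life: {min(times_to_failure):.2f} h")
print(f"longest simulated life:  {max(times_to_failure):.2f} h")
print(f"sample mean:             {sample_mean:.2f} h")
print(f"1 / lambda:              {expected_mean:.2f} h")
```

Individual simulated lives differ widely from one another, while their average settles close to 1/λ. That is why these indicators are stated as mean times and not as the life of any single unit.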