What is the RAMS bathtub curve reliability model?

16/10/2020

In this article, we present a reliability model that is widely used in RAMS engineering: the bathtub curve. You will see why it is called the bathtub curve, what its three stages are, and how the failure rate (and therefore reliability) behaves in each of them. You will also learn about interesting concepts such as infant mortality, the burn-in strategy, and vacuum tests, also called the trial phase. Leedeo Engineering, RAM engineering and Safety specialist.

Failures are part of the life of an asset and can occur at any time: during operation, while the asset is at rest, when it is requested to perform a certain function, or even while it is being dismantled.

Therefore, if we want to improve the reliability of an asset, we must know how much, how often, and how it fails. For this, it is essential to know the failure rate (denoted by lambda, λ), which is simply the number of failures detected during a given period of time:

$$\lambda = \frac{\text{number of failures detected}}{\text{period of time considered}}$$

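It can also be expressed as a percentage, relating the number of failures detected in a series of assets to the number of assets in that series:

$$\lambda\,(\%) = \frac{\text{number of failures detected}}{\text{number of assets in the series}} \times 100$$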
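As a minimal sketch, the two ways of expressing the failure rate described above can be computed as follows. The function names and the input figures are hypothetical, chosen only for illustration.

```python
def failure_rate_per_hour(failures: int, operating_hours: float) -> float:
    """Failure rate lambda: failures detected per hour of operation."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    return failures / operating_hours


def failure_percentage(failures: int, assets_in_series: int) -> float:
    """Failures detected in a series of assets, as a percentage of the series."""
    if assets_in_series <= 0:
        raise ValueError("assets_in_series must be positive")
    return 100.0 * failures / assets_in_series


# Hypothetical figures: 3 failures over 15,000 cumulative operating hours,
# and 12 failures detected in a series of 400 assets.
print(failure_rate_per_hour(3, 15_000))  # 0.0002 (failures per hour)
print(failure_percentage(12, 400))       # 3.0 (%)
```

The first form depends on how long the assets have been operating; the second only says how many assets failed, so it should always be quoted together with the period it covers.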

In general, from the conception of an asset to its removal there are three stages, well-differentiated in time, where we can locate the source of a failure:

STAGE 1: youth
STAGE 2: maturity
STAGE 3: aging

These three stages are normally characterised by a failure rate shaped like a bathtub, as shown in the image below. This is where the name bathtub curve comes from.
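One common way to reproduce the bathtub shape, assumed here rather than taken from the article, is to add three Weibull failure rates h(t) = (β/η)(t/η)^(β-1): one with β < 1 (decreasing, infant mortality), one with β = 1 (constant, maturity) and one with β > 1 (increasing, aging). The parameter values below are purely illustrative.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Weibull failure rate h(t) = (beta/eta) * (t/eta)**(beta - 1), for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def bathtub_hazard(t: float) -> float:
    """Illustrative bathtub failure rate (per hour) built as the sum of three
    competing failure modes; every parameter value here is hypothetical."""
    infant = weibull_hazard(t, beta=0.5, eta=50.0)           # decreasing: early defects
    useful_life = weibull_hazard(t, beta=1.0, eta=1_000.0)   # constant: random failures
    wear_out = weibull_hazard(t, beta=5.0, eta=800.0)        # increasing: wear and tear
    return infant + useful_life + wear_out


for hours in (1, 10, 100, 300, 600, 900):
    print(f"{hours:>4} h  lambda = {bathtub_hazard(hours):.5f} per hour")
```

Summing the three rates corresponds to treating the failure modes as independent competing risks, which is one modelling choice among several.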

STAGE 1: YOUTH

The first stage, youth or Infant Mortality, is considered critical because it is not known exactly how the asset will behave in operation, so a remarkably high failure rate is expected. The name Infant Mortality comes from the idea that the asset fails when it has only a few hours of operating life. Sometimes it never works at all.

For this reason, a thorough investigation of early failures is essential; otherwise, numerous faults would appear once the asset is in use. Assuming that a robust design and validation of the asset have been carried out, these faults usually stem from errors in the manufacturing, assembly, or installation processes. In practice, however, failures caused by design errors also tend to show up in the Infant Mortality stage.

In other words, the youth phase of a product's bathtub curve (the first hours after it is switched on) tells us that, in the earliest stage of its life, a product fails because of errors in its manufacture, assembly, or installation, and also in its design. To put numbers on it, imagine a production run of, for instance, 1,000 units. A high percentage (say 1%) may fail within a few hours of being switched on for the first time, for example during the first 72 hours.
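The arithmetic of this example can be checked in a few lines. Converting it into an average failure rate per unit-hour is an added simplification, not part of the original example: it assumes every unit accumulates the full 72 hours.

```python
units_produced = 1_000
early_failure_share = 0.01   # "a high percentage (like 1%)" from the example
window_hours = 72            # first 72 hours after first switch-on

early_failures = round(units_produced * early_failure_share)  # 10 units

# Simplification: assume every unit accumulates the full 72 h, although
# failed units actually stop operating earlier.
unit_hours = units_produced * window_hours                     # 72,000 unit-hours
lambda_early = early_failures / unit_hours
print(early_failures, f"{lambda_early:.2e}")  # 10 1.39e-04
```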

Obviously, it is desirable that this infant mortality does not reach the customer since, as we will see later, the customer has bought a product whose failure rate should be that of the maturity stage, not higher. As the bathtub curve shows, the failure rate during maturity is low and predictable (it is often modelled as constant). In addition, Infant Mortality failures normally occur within the warranty period of the asset, so the cost of replacing the equipment has to be borne by the manufacturer.

There are currently numerous quality control methodologies aimed at minimizing defects arising from manufacturing processes, among them Six Sigma (6σ) and Statistical Quality Control (SQC). Another widely used strategy is burn-in, an initial factory "burning" process.

Burn-in consists of operating the asset, under either nominal or accelerated conditions, for a number of hours in a controlled environment before the product leaves the manufacturer's premises, until it has accumulated enough operating hours to place it in the maturity phase.

Let us take an example. If we manufactured DC motors, we would have a test bench in our factory where each motor runs for, say, 24 hours before it is sent to the customer. If we manufactured pushbuttons, we would have tooling that presses each pushbutton 1,000 times before shipping. In this way, motors or pushbuttons with manufacturing defects would fail in the factory and be discarded, most likely by sending them to a repair and verification line, so that such equipment would never reach the customer.
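Why burn-in works can be sketched with a simple mixture model, which is an assumption of this example rather than something the article specifies: a small share of weak units fails at a high constant rate, while sound units fail at a low one. Running every unit for some hours before shipping removes most weak units, so the share of defective units reaching the customer falls. The rates and the 1% share below are hypothetical.

```python
import math


def defective_share_after_burn_in(p_defective: float, lambda_defective: float,
                                  lambda_good: float, burn_in_hours: float) -> float:
    """Share of the units that survive burn-in and still belong to the weak
    (defective) subpopulation, assuming exponential failures in each group."""
    weak_survivors = p_defective * math.exp(-lambda_defective * burn_in_hours)
    sound_survivors = (1 - p_defective) * math.exp(-lambda_good * burn_in_hours)
    return weak_survivors / (weak_survivors + sound_survivors)


# Hypothetical values: 1% weak units failing at 0.1 per hour,
# sound units failing at 1e-5 per hour.
for hours in (0, 24, 72):
    share = defective_share_after_burn_in(0.01, 0.1, 1e-5, hours)
    print(f"burn-in {hours:>2} h -> weak units among shipped: {share:.4%}")
```

The longer the burn-in, the fewer weak units are shipped, but each extra hour also consumes a little of the sound units' life and costs bench time, which is why the duration is a trade-off.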

When talking about installations, it is also quite common to find terms such as vacuum tests or trial phase. In very simplified terms, both consist of starting up and operating the installation without its normal load; in the case of a railway, for example, without passengers. Following the same strategy as burn-in, the facility accumulates operating hours without endangering passengers in the event of an unexpected failure, for as many hours as necessary to get through the youth stage and bring the system into the maturity stage.

STAGE 2: MATURITY

Once the youth stage is over, the expected stage of maturity, or useful life, of the asset begins. At this stage, failures may appear while the system is operating within its nominal conditions. This period is usually the longest and the most relevant one. It is always advisable to replace the system before it reaches the last stage of its life cycle, that is, before it enters the aging stage. It is important to stress that the maturity stage is where the equipment exhibits the failure rate it was designed for.

In the exponential models used in RAMS engineering, this stage is characterised by a constant failure rate. The exponential model is the simplest of the reliability models. It is usually accurate for electronic components, and its simplicity allows easy mathematical treatment which, together with its close approximation to reality, fully justifies its widespread acceptance.
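A minimal sketch of the exponential model, assuming a hypothetical constant failure rate λ: reliability is R(t) = e^(-λt). The memoryless property, where the probability of surviving the next interval does not depend on the age already accumulated, is a large part of what makes the model so easy to handle mathematically.

```python
import math


def reliability(t_hours: float, failure_rate: float) -> float:
    """Exponential model: R(t) = exp(-lambda * t) with constant lambda (per hour)."""
    return math.exp(-failure_rate * t_hours)


lam = 2e-5  # hypothetical constant failure rate, failures per hour
print(f"R(1,000 h)  = {reliability(1_000, lam):.4f}")   # 0.9802
print(f"R(10,000 h) = {reliability(10_000, lam):.4f}")  # 0.8187

# Memoryless property: the chance of surviving a further 1,000 h is the same
# for a unit that has already run 5,000 h as for a brand-new one.
conditional = reliability(6_000, lam) / reliability(5_000, lam)
print(math.isclose(conditional, reliability(1_000, lam)))  # True
```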

In general, the reliability of an asset can be measured by the mean time between two consecutive failures (MTBF, Mean Time Between Failures). When a constant failure rate is assumed, its calculation reduces to the following equation, in which the failure rate and the MTBF are inversely proportional:

$$\text{MTBF} = \frac{1}{\lambda}$$
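Under the same constant failure rate assumption, MTBF can be estimated from field data as cumulative operating hours divided by the number of failures, and λ follows as its inverse. The fleet figures below are hypothetical. The last line illustrates a well-known consequence of the exponential model: the probability of operating up to the MTBF without failure is only e^(-1), about 37%, so the MTBF is not a guaranteed failure-free life.

```python
import math


def mtbf_from_field_data(total_operating_hours: float, failures: int) -> float:
    """Point estimate of MTBF: cumulative operating hours divided by failures."""
    if failures <= 0:
        raise ValueError("at least one failure is needed for this point estimate")
    return total_operating_hours / failures


# Hypothetical fleet: 20 units, 5,000 h each, 4 failures recorded.
units, hours_per_unit, failures = 20, 5_000, 4
mtbf = mtbf_from_field_data(units * hours_per_unit, failures)
failure_rate = 1 / mtbf                       # lambda = 1 / MTBF
survive_to_mtbf = math.exp(-failure_rate * mtbf)
print(f"MTBF = {mtbf:,.0f} h, lambda = {failure_rate:.1e} per hour")  # MTBF = 25,000 h, lambda = 4.0e-05 per hour
print(f"P(no failure before MTBF) = {survive_to_mtbf:.3f}")            # 0.368
```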

This parameter is of great interest within methodologies aimed at predicting reliability when operating data are scarce, non-existent, or not very rigorous.

STAGE 3: AGING

Finally, the third and last stage is known as the aging stage, where failures resulting mainly from wear and tear appear as the asset naturally accumulates hours of operation. At this stage, the failure rate clearly increases, so reliability declines ever faster as time passes. Maintenance plans are the key to delaying the onset of this last stage as much as possible. It is important to correctly balance the costs associated with lack of reliability, downtime, and preventive and corrective maintenance activities. These costs escalate at this stage, to the point where replacing the asset may become the better option.
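One classic way to make this cost balance concrete, not described in the article, is the age-replacement model: the asset is replaced preventively at age T, or correctively if it fails first, and T is chosen to minimise the long-run cost per operating hour, C(T) = [c_p R(T) + c_f (1 - R(T))] / (integral of R(t) from 0 to T). The sketch below assumes a Weibull wear-out mode and hypothetical costs and parameters.

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """R(t) = exp(-(t/eta)**beta) for a wear-out failure mode (beta > 1)."""
    return math.exp(-((t / eta) ** beta))


def cost_rate(T: float, beta: float, eta: float, c_prev: float, c_fail: float,
              steps: int = 2_000) -> float:
    """Long-run cost per operating hour when the asset is replaced at age T
    or at failure, whichever comes first (age-replacement model)."""
    # Expected cycle length = integral of R(t) from 0 to T (trapezoidal rule).
    dt = T / steps
    area = sum(
        0.5 * (weibull_reliability(i * dt, beta, eta)
               + weibull_reliability((i + 1) * dt, beta, eta)) * dt
        for i in range(steps)
    )
    r_T = weibull_reliability(T, beta, eta)
    # Planned replacement if the asset survives to T, corrective otherwise.
    expected_cost = c_prev * r_T + c_fail * (1 - r_T)
    return expected_cost / area


# Hypothetical data: wear-out with beta = 3, eta = 10,000 h; a planned
# replacement costs 1 unit, an in-service failure costs 10 units.
candidates = range(1_000, 20_001, 500)
best_T = min(candidates, key=lambda T: cost_rate(T, 3.0, 10_000.0, 1.0, 10.0))
print(f"Replace preventively at about {best_T} h")
```

A coarse grid search is enough for a sketch; the optimum moves earlier as failures become more expensive relative to planned replacements.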

The figure above shows two bathtub curves. The red curve is an example of a real failure rate function, while the blue curve is a theoretical example in which the three phases of the life cycle described above are clearly delimited: the infant mortality stage between 0 h and 100 h, the useful life (maturity) stage between 100 h and 600 h, and finally the aging stage from 600 h to 900 h. In reality (red curve), by contrast, the transitions of the failure rate from one stage to the next are not so obvious.
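A curve like the red one would usually be estimated from observed failure times. A minimal sketch, assuming a life-table (actuarial) estimate and a hypothetical set of failure times, counts the failures in each time bin and divides them by the unit-hours at risk; with real, noisy data the boundaries between stages rarely stand out as cleanly as in the blue curve.

```python
def binned_failure_rate(failure_times, n_units, bin_hours, horizon):
    """Life-table estimate of the failure rate in consecutive time bins.

    failure_times: operating hours at which each failed unit failed
    n_units: units under observation at t = 0 (no other removals assumed)
    """
    rates = []
    at_risk = n_units
    start = 0.0
    while start < horizon:
        end = start + bin_hours
        failures = sum(1 for t in failure_times if start <= t < end)
        # Units failing inside the bin are assumed to be at risk for half of it.
        exposure = (at_risk - failures / 2) * bin_hours
        rates.append(failures / exposure if exposure > 0 else float("nan"))
        at_risk -= failures
        start = end
    return rates


# Hypothetical test: 100 units observed for 900 h, seven failures recorded.
times = [5, 20, 40, 350, 720, 780, 850]
for i, rate in enumerate(binned_failure_rate(times, 100, 100, 900)):
    print(f"{i * 100:>3}-{(i + 1) * 100:>3} h: {rate:.2e} per hour")
```

With so few failures the estimate is jagged; far more units and failures are needed before a stage boundary can be read off with any confidence.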

At Leedeo Engineering, we specialise in the development of RAMS railway projects, applying the CENELEC standards EN 50126, EN 50129, and EN 50128 and Commission Implementing Regulation (EU) No 402/2013 with the Common Safety Methods (CSM-RA), and we support RAM and Safety tasks at any level required in the development and certification of safety products and applications.