Reliability Centred Maintenance (RCM)
In this article, we introduce Reliability Centred Maintenance (RCM), that is, maintenance focused on the reliability of an installation and all its components. RCM is increasingly used for its benefits. Because reliability is the main driver of this strategy, it can draw on RAMS Engineering methodologies, which become decisive above all for analysing and designing the maintenance plan.
Developing an adequate maintenance strategy generally involves defining a set of processes and resources, in the form of maintenance plans, that build on the following four basic concepts:
The first is the theory and models of maintenance and maintainability. The state of the art, especially sector-specific maintenance knowledge, provides a common baseline that applies across each sector or industry and to all of its players.
The second is the company's or organisation's own experience. Beyond sector knowledge, each of the actors who carry out maintenance processes holds knowledge based mostly on their own experience.
The third is the cross-industry knowledge that comes from the manufacturers or integrators of equipment, products, or systems. By manufacturers we mean the companies that design and build equipment that will end up installed and in service within a system that must be maintained.
Finally, there is the legislation, directives, and standards that must be complied with in terms of maintenance and maintainability. These vary by industry and also by geographical or economic region. On this last point, Leedeo Engineering, for example, brings its background and experience in applying the railway CENELEC standard EN 50126. That standard specifies RAMS requirements, including maintainability, for railway systems in the European Union. Although the CENELEC railway standards are of European origin, they are widely used on all five continents.
What is Reliability Centred Maintenance or RCM?
Reliability Centred Maintenance (RCM) is a systematic approach to maintenance that analyses the ways in which systems can fail. It prioritises safety and economy, the latter from the point of view of the profitability of the asset under maintenance. This makes it possible to identify and classify applicable and effective preventive maintenance tasks. Its objective is therefore clear and direct: to reduce maintenance costs by focusing on the most important functions of the system and avoiding or eliminating maintenance actions that are not strictly necessary.
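Below are the five steps that we use at Leedeo Engineering to implement Reliability Centred Maintenance (RCM) on an asset or set of assets. The asset may be a railway, a goods production line, a train, an aircraft, or any large infrastructure. The steps are: preparation, functional failure analysis, classification of sub-systems, FMECA, and design of maintenance actions.
Preparation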
The first step of the preparation stage is to form the working group that will implement the new Reliability Centred Maintenance (RCM) system. The involvement of experienced, knowledgeable staff makes the difference between success and failure.
The second step of the preparation stage is to define and clarify the objectives and the scope of work. Defining the scope of the maintenance procedure is especially important. At the interfaces between assets, or between systems, parts are often left out of reach of every working group or initiative. For each such part, it must be ensured that another team or maintenance plan is responsible for managing it. The boundaries must therefore be well defined and communicated to all stakeholders. This leads us to define the limits of the analysis and to answer the question: how far will we go, and what will remain outside our reach?
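The scope check described above can be sketched as a simple coverage test. This is a minimal illustration: `check_scope`, the asset names, and the plan names are hypothetical and not part of any Leedeo tool. It flags assets that no team or plan covers, and assets claimed by more than one plan.

```python
from collections import defaultdict


def check_scope(assets, scopes):
    """Return (unassigned, overlapping) for a set of assets.

    assets: iterable of asset identifiers in the installation.
    scopes: mapping of team/plan name -> set of asset identifiers it covers.
    """
    owners = defaultdict(list)  # asset -> list of plans that claim it
    for plan, covered in scopes.items():
        for asset in covered:
            owners[asset].append(plan)

    # Assets nobody is responsible for (left "out of reach").
    unassigned = sorted(a for a in assets if not owners[a])
    # Assets claimed by several plans (unclear boundary).
    overlapping = {a: sorted(p) for a, p in owners.items() if len(p) > 1}
    return unassigned, overlapping


# Hypothetical example installation.
assets = {"track", "signalling", "power_supply", "interlocking_cabinet"}
scopes = {
    "RCM_team": {"track", "signalling"},
    "energy_plan": {"power_supply", "signalling"},
}
unassigned, overlapping = check_scope(assets, scopes)
print(unassigned)   # ['interlocking_cabinet']
print(overlapping)  # {'signalling': ['RCM_team', 'energy_plan']}
```

Both lists should be empty, or explicitly justified, before the analysis starts.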
Alongside the scope, we also define the objectives. Usually this is an availability objective, for instance an annual availability target for the whole installation, or for each critical point of the installation that affects business performance.
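An annual availability objective translates directly into a downtime budget. The sketch below assumes availability is measured as uptime divided by the total hours in the period (8,760 h for a non-leap year). The 99.5 % target is a hypothetical value used only for illustration.

```python
HOURS_PER_YEAR = 8760  # non-leap year


def allowed_downtime_hours(availability_target, period_hours=HOURS_PER_YEAR):
    """Maximum downtime (hours) compatible with an availability target."""
    if not 0.0 < availability_target <= 1.0:
        raise ValueError("availability_target must be in (0, 1]")
    return (1.0 - availability_target) * period_hours


def meets_target(downtime_hours, availability_target, period_hours=HOURS_PER_YEAR):
    """True if the observed downtime keeps availability at or above the target."""
    achieved = (period_hours - downtime_hours) / period_hours
    return achieved >= availability_target


# Hypothetical target of 99.5 % for one critical point of the installation.
print(round(allowed_downtime_hours(0.995), 1))  # 43.8
print(meets_target(40, 0.995))                  # True
print(meets_target(50, 0.995))                  # False
```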
The third and last step of the preparation stage is to collect documentation, schematics, and process diagrams, and, usually because they are lacking, to generate them. This gives the work a good starting point. The following phases analyse the installation and its behaviour in detail, so it is essential that it be documented. This step can be considered a documentary audit. In most cases, we will have to search for, find, and remedy both missing documentation and discrepancies between the documentation and the actual installation.
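The documentary audit can be partly supported by comparing the documented asset register against a field survey. This is a minimal sketch. `documentation_audit` and the asset records are hypothetical, and it assumes both sources use the same asset identifiers.

```python
def documentation_audit(documented, surveyed):
    """Compare documented assets against a field survey of the installation.

    documented: mapping asset id -> attributes recorded in the documentation.
    surveyed:   mapping asset id -> attributes observed on site.
    Returns (undocumented, not_found, mismatched) as sorted lists of asset ids.
    """
    undocumented = sorted(surveyed.keys() - documented.keys())  # on site, not in docs
    not_found = sorted(documented.keys() - surveyed.keys())     # in docs, not on site
    mismatched = sorted(                                        # in both, but different
        a for a in documented.keys() & surveyed.keys() if documented[a] != surveyed[a]
    )
    return undocumented, not_found, mismatched


# Hypothetical records.
documented = {"pump_1": {"model": "P100"}, "pump_2": {"model": "P100"}, "valve_7": {"model": "V20"}}
surveyed = {"pump_1": {"model": "P100"}, "pump_2": {"model": "P200"}, "sensor_3": {"model": "S5"}}
print(documentation_audit(documented, surveyed))
# (['sensor_3'], ['valve_7'], ['pump_2'])
```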
Development of a functional failure analysis.
Once the preparation stage has been completed, the first RAMS analysis stage in implementing RCM is a Functional Failure Analysis (FFA). An FFA has a threefold objective. First, it identifies and describes the functions required of each piece of equipment or element of the installation or system. This includes the input and output interfaces the equipment needs in order to function properly. Second, having identified the equipment and its interfaces, it identifies how the system can fail in its various operating modes. For each functional failure that the equipment may have, criticality is analysed using two factors, "impact on" and "severity", in the following terms:
Finally, we identify the frequency of occurrence of each failure, either qualitatively or, if the data is available, quantitatively. For a qualitative approach, the following table of probabilities of occurrence can be used:
As a result, we obtain an FFA table for each subsystem of the installation architecture documented in the preparation step. It gives a clear view of what each subsystem does, how it interfaces with the rest of the system, how it fails, and how much impact each failure has.
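One row of the FFA table can be represented as a small data structure. This is a sketch under stated assumptions. The severity scale (LOW/MEDIUM/HIGH) follows the levels mentioned in the classification below. The occurrence levels and the failures-per-year thresholds in `OCCURRENCE_THRESHOLDS` are hypothetical placeholders: the article's own occurrence table, or the one in the applicable standard, should replace them. The door-controller example is also hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):  # assumed three-level scale
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class Occurrence(Enum):  # assumed qualitative occurrence scale
    IMPROBABLE = 1
    REMOTE = 2
    OCCASIONAL = 3
    PROBABLE = 4
    FREQUENT = 5


# Hypothetical thresholds in failures per year, highest band first.
OCCURRENCE_THRESHOLDS = [
    (10.0, Occurrence.FREQUENT),
    (1.0, Occurrence.PROBABLE),
    (0.1, Occurrence.OCCASIONAL),
    (0.01, Occurrence.REMOTE),
]


def occurrence_from_rate(failures_per_year):
    """Map a quantitative failure rate onto the qualitative scale."""
    for threshold, level in OCCURRENCE_THRESHOLDS:
        if failures_per_year >= threshold:
            return level
    return Occurrence.IMPROBABLE


@dataclass
class FunctionalFailure:
    description: str
    severity_by_impact: dict[str, Severity]  # impact destination -> severity
    occurrence: Occurrence


@dataclass
class FFARow:
    subsystem: str
    function: str
    inputs: list[str]   # input interfaces
    outputs: list[str]  # output interfaces
    failures: list[FunctionalFailure] = field(default_factory=list)


# Hypothetical example row.
door = FFARow(
    subsystem="door_controller",
    function="Open and close passenger doors on command",
    inputs=["door command", "power supply"],
    outputs=["door closed and locked signal"],
)
door.failures.append(
    FunctionalFailure(
        description="Door fails to lock",
        severity_by_impact={"safety": Severity.HIGH, "service": Severity.MEDIUM},
        occurrence=occurrence_from_rate(0.5),
    )
)
print(door.failures[0].occurrence)  # Occurrence.OCCASIONAL
```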
Classification of sub-systems
The second stage of the RAMS analysis classifies each sub-system identified and analysed in the FFA into three categories. In practice, this means adding three columns to the initial table. For example, we mark with an "X" the category into which each failure falls:
- Category 1: We mark subsystems whose FFA failures have HIGH or MEDIUM consequences for any of the impact destinations.
- Category 2: We mark subsystems whose FFA failures are likely to occur frequently or probably. This category also covers failures characterised by high repair costs, low maintainability, long spare-part lead times, component obsolescence, or corrective maintenance that requires external personnel beyond our control.
- Category 3: We mark all remaining subsystems, that is, those whose FFA failures fall outside Categories 1 and 2.
The purpose of the classification is prioritisation. We focus our action and improvement plans on Category 1 and 2 sub-systems. Category 3 sub-systems are left for a second improvement loop of our maintenance organisation.
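The classification rules above can be sketched as follows. This is a minimal illustration under several assumptions. Severities and occurrences are plain strings. The Category 2 cost and maintainability criteria are represented by hypothetical flag names. Categories are aggregated per subsystem, so one subsystem can be marked in both Category 1 and Category 2.

```python
from dataclasses import dataclass, field

# Hypothetical flags standing for the Category 2 cost/maintainability criteria.
CATEGORY_2_FLAGS = {
    "high_repair_cost",
    "low_maintainability",
    "long_spare_lead_time",
    "obsolescence",
    "external_personnel",
}


@dataclass
class Failure:
    severity_by_impact: dict[str, str]  # e.g. {"safety": "HIGH"}
    occurrence: str                     # e.g. "FREQUENT", "PROBABLE", "REMOTE"
    flags: set[str] = field(default_factory=set)


def categories(failures):
    """Return the set of categories (1, 2 and/or 3) for one subsystem."""
    cats = set()
    for f in failures:
        if any(s in ("HIGH", "MEDIUM") for s in f.severity_by_impact.values()):
            cats.add(1)
        if f.occurrence in ("FREQUENT", "PROBABLE") or f.flags & CATEGORY_2_FLAGS:
            cats.add(2)
    return cats or {3}  # Category 3: everything outside Categories 1 and 2


def prioritise(subsystems):
    """Split subsystems into the first loop (Cat. 1/2) and the second loop (Cat. 3)."""
    first_loop, second_loop = [], []
    for name, failures in subsystems.items():
        (first_loop if categories(failures) & {1, 2} else second_loop).append(name)
    return sorted(first_loop), sorted(second_loop)


# Hypothetical subsystems.
subsystems = {
    "doors": [Failure({"safety": "HIGH"}, "OCCASIONAL")],
    "hvac": [Failure({"comfort": "LOW"}, "PROBABLE")],
    "lighting": [Failure({"comfort": "LOW"}, "REMOTE")],
    "pantograph": [Failure({"service": "LOW"}, "REMOTE", {"obsolescence"})],
}
print(prioritise(subsystems))
# (['doors', 'hvac', 'pantograph'], ['lighting'])
```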
Development of an FMECA (Failure Mode, Effects and Criticality Analysis).
Once the sub-systems have been classified by criticality, the third stage of the RAMS analysis evolves the FFA table into a complete FMECA by adding maintenance-oriented detail. That is, for each failure and each sub-system, the FFA table is enriched with the following columns:
- Effect of the malfunction.
- Failure mechanism, that is, how the failure occurs and what its cause or origin is.
- Maintenance action to be taken.
- Detectability of the failure.
- Inspection interval, to check whether the failure has appeared.
This phase has two objectives. The first is to determine whether the criticality of a failure, the effect it produces, and our ability to detect it force us to make changes to our assets:
- reduce the criticality of the failure or minimise its effect;
- check whether failure detection is adequate for the two previous parameters;
- where it is not, improve the detection of the failure and the alarm raised when it occurs.
The second objective is to establish the preventive maintenance actions that we have to carry out, so that these failures do not generate the undesired effect that we have identified.
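The FMECA columns and the first objective can be sketched as a row type plus an adequacy check. This is a sketch only. The article does not define when detection is "adequate". The three-level detectability scale and the acceptance rule in `MAX_DETECTABILITY` are hypothetical assumptions that a real project would replace with its own criteria. Rows the check returns are candidates for reducing criticality or effect, or for improving detection and alarms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FMECARow:
    subsystem: str
    failure: str
    criticality: str          # "HIGH", "MEDIUM" or "LOW", carried over from the FFA
    effect: str               # effect of the malfunction
    mechanism: str            # how the failure occurs / its cause
    maintenance_action: str   # action to be taken
    detectability: int        # assumed scale: 1 = alarmed, 2 = found on inspection, 3 = hidden
    inspection_interval_days: Optional[int] = None


# Hypothetical acceptance rule: worst detectability tolerated per criticality level.
MAX_DETECTABILITY = {"HIGH": 1, "MEDIUM": 2, "LOW": 3}


def design_changes_needed(rows):
    """Return rows whose detectability is not adequate for their criticality."""
    return [r for r in rows if r.detectability > MAX_DETECTABILITY[r.criticality]]


# Hypothetical rows.
rows = [
    FMECARow("doors", "Door fails to lock", "HIGH", "Train cannot depart",
             "Worn lock actuator", "Replace actuator",
             detectability=2, inspection_interval_days=30),
    FMECARow("hvac", "Reduced airflow", "LOW", "Passenger discomfort",
             "Clogged filter", "Clean filter",
             detectability=3, inspection_interval_days=90),
]
print([r.failure for r in design_changes_needed(rows)])  # ['Door fails to lock']
```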
Design of maintenance actions
The last step is to define our maintenance actions, based on the following five types of action:
- scheduled check-ups,
- scheduled replacement of components (Line Replaceable Units, LRUs),
- scheduled functionality tests,
- tasks scheduled when a given condition appears (often generated by a predictive maintenance process),
- and corrective action.
Four of the five types of action are preventive. The ultimate objective of RCM is for corrective actions to tend towards zero, so that all maintenance actions are preventive. Preventive activities are predictable and controlled, without the "chaos" caused by having to return equipment to service after a failure. Above all, they tend to be much more economical than corrective action.
Preventive maintenance, after all, should be seen as those activities that will allow us to avoid a failure, detect the start of a failure, or discover a hidden fault.
The process for determining each preventive maintenance activity follows fairly simple logic. It is important to apply it rigorously, since doing so helps to avoid unwelcome situations:
- Is it possible to have an indicator or alarm that signals the appearance of this fault?
- If the answer is yes, a task must be scheduled for when this condition appears.
- If the answer is no, we must ask whether this failure affects the overall failure rate of our system or installation. If it does, we schedule one of two activities: a check-up (if possible) or a replacement.
- If the failure does not affect the overall failure rate of the system, as is typically the case with redundant solutions, we schedule functionality tests.
Finally, corrective maintenance actions must be deliberate decisions, taken after concluding that preventive tasks are not possible or are less favourable. Corrective maintenance is, in general, often the most expensive option. It should therefore only be used on low-cost components that are easy and quick to replace and are not critical to the smooth running of the installation.
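The decision logic above, together with the criterion for corrective maintenance, can be sketched as a small function. This is one reading of the text, not a definitive implementation. `select_action` and its parameter names are hypothetical. The corrective check is placed first, as an assumption: it models the case where preventive work is less favourable because the component is low-cost, quick to replace, and non-critical. By default, components are treated as critical, so corrective action is never chosen unless all three conditions are stated explicitly.

```python
from enum import Enum


class Action(Enum):
    CONDITION_TASK = "task scheduled when the condition appears"
    SCHEDULED_CHECK = "scheduled check-up"
    SCHEDULED_REPLACEMENT = "scheduled replacement (LRU)"
    FUNCTION_TEST = "scheduled functionality test"
    CORRECTIVE = "corrective action (deliberate decision)"


def select_action(has_indicator, affects_failure_rate, check_possible,
                  low_cost=False, quick_to_replace=False, critical=True):
    """Choose a maintenance action for one failure following the RCM logic."""
    # Deliberate corrective choice: only low-cost, quick-to-replace, non-critical items.
    if low_cost and quick_to_replace and not critical:
        return Action.CORRECTIVE
    # An indicator or alarm exists: schedule a task when the condition appears.
    if has_indicator:
        return Action.CONDITION_TASK
    # No indicator, but the failure affects the overall failure rate.
    if affects_failure_rate:
        return Action.SCHEDULED_CHECK if check_possible else Action.SCHEDULED_REPLACEMENT
    # No indicator and no impact on the overall failure rate (e.g. redundancy):
    # test the function periodically to reveal hidden failures.
    return Action.FUNCTION_TEST


print(select_action(has_indicator=False, affects_failure_rate=False,
                    check_possible=False).value)  # scheduled functionality test
```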
Implementing Reliability Centred Maintenance (RCM) builds on RAMS Engineering to develop the maintenance plans of the installation or system under analysis. Industries known for the critical nature of their maintenance, such as aerospace, railways, nuclear, and oil and gas, have reported cost reductions in maintenance activities thanks to the RCM methodology. In most cases, they have also improved the reliability of their operations and therefore the overall availability of the system.
If we improve the reliability of the systems and greatly reduce the maintenance downtime that affects availability, availability clearly improves. On a larger or smaller scale, the RCM methodology can be applied to any facility or productive asset, with the same objective reported by these companies: reducing costs and improving the asset's performance indicators.
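The link between reliability, maintenance downtime, and availability can be illustrated with a simplified annual model. Under the assumptions of this sketch, the expected number of failures per year is approximated as 8,760 h divided by the MTBF (mean time between failures). Each failure costs one MTTR (mean time to repair) of downtime, and preventive work that takes the asset out of service adds its own downtime. All figures are hypothetical.

```python
HOURS_PER_YEAR = 8760


def annual_availability(mtbf_h, mttr_h, preventive_downtime_h):
    """Approximate annual availability.

    Expected corrective downtime = (failures per year) x MTTR, with failures per
    year approximated as HOURS_PER_YEAR / MTBF. Preventive downtime counts only
    the preventive work that takes the asset out of service.
    """
    failures_per_year = HOURS_PER_YEAR / mtbf_h
    downtime = failures_per_year * mttr_h + preventive_downtime_h
    return (HOURS_PER_YEAR - downtime) / HOURS_PER_YEAR


# Hypothetical before/after figures: RCM doubles MTBF and halves preventive downtime.
before = annual_availability(mtbf_h=1000, mttr_h=8, preventive_downtime_h=40)
after = annual_availability(mtbf_h=2000, mttr_h=8, preventive_downtime_h=20)
print(f"{before:.4f} -> {after:.4f}")  # 0.9874 -> 0.9937
```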
Our company has experience implementing this methodology thanks to the strong RAMS engineering background that supports our knowledge. Please do not hesitate to contact us if you require any further information.
At Leedeo Engineering, we specialise in RAMS railway projects. We apply the CENELEC standards EN 50126, EN 50128, and EN 50129, and Commission Implementing Regulation (EU) No 402/2013 on the Common Safety Method for Risk Evaluation and Assessment (CSM-RA). We support RAM and Safety tasks at any level required in the development and certification of safety products and applications.