ARP and ART Acronyms

CUA: Commonly used acronyms associated with Asset Reliability Practitioner® (ARP) and Asset Reliability Transformation® (ART).


The following is a list of commonly used acronyms and terms with their basic meanings. In this version, the definitions provided may differ from ISO standards and other references – they simply describe how they are used in ARP and ART courses, books, and other materials.









Asset Reliability Transformation

A detailed process that guides an organization through the establishment, definition, and implementation of a reliability and performance improvement program.



VALUE

Understand what is most important to the business so that every activity is aligned with the goals of the business. Assess the current state, evaluate the current strengths and weaknesses, establish (conservative) goals, establish a business case, and gain enthusiastic senior management support.


STRATEGY

Establish the team to lead the implementation. Develop the plan to implement the strategy. Develop the first pass of the maintenance strategy.


PEOPLE

Identify and develop leaders. Create awareness and enthusiasm in every person within the organization. Clarify roles and responsibilities. Educate, upskill, and certify. Encourage everyone to contribute and constantly communicate the achievements (and failures).


CONTROL

Stabilize and regain control of the maintenance department in order to establish the fundamentals and minimize reactive maintenance.


ACQUIRE

Ensure that every step between the original project concept and the acquisition and transportation of equipment into the plant guarantees high reliability and performance and thus the lowest total cost of ownership.


DISCIPLINE

Ensure there is one way to do everything, and everyone does it that one way: workflows, procedures, standards, and management of change.


CARE

Proactively care for all assets so they deliver optimum reliability and performance: clean, smooth, tight, and well lubricated.


ANALYTICS

Acquire and analyze data so that opportunities for improvement are identified and all decisions are aligned with the goals of the business: CBM, KPIs, OEE, economics, etc.


EOL

Learn from failures, poor performance, and near misses so that root causes are eliminated and failures are not repeated. (EOL=End of Life)


OPTIMIZE

Continually seek to improve every aspect of the reliability and performance improvement strategy.


Implementation Phase

The ART process is broken into 10 key phases. The VALUE, STRATEGY, and PEOPLE phases establish the fundamentals: set targets, develop a plan, and establish a culture of reliability. The CONTROL phase overcomes the reactive maintenance barrier. DISCIPLINE, CARE, ANALYTICS, and OPTIMIZE form the cycle of reliability: we do everything one way, care for our assets, make data-driven decisions, and constantly improve. With the addition of the ACQUIRE and EOL phases, we have the entire life cycle: from the acquisition of the asset to the disposal of the asset.


Step within a Phase

The phases are broken up into 65 major steps. While some steps are limited in scope, many define significant enterprises: establishing planning and scheduling, performing condition monitoring, and so on.


Recommended Practice within a Step

In order to provide definable tasks and greater resolution into what is required, we have defined multiple “recommended practices” within each step. The recommended practices provide the guidance necessary for you to be successful.


Business Review

The process of closely examining the goals of the organization so that it is clearly understood how a reliability improvement initiative can add value to the organization. Every task performed should help the organization achieve its goals – there should be a clear line of sight between a person’s role and the business goals.


Business Process Review

See BR.



Asset Reliability Practitioner

An accredited training and certification program developed by Mobius Institute for practitioners and leaders involved with reliability improvement.


ARP Reliability Advocate

A training and certification program intended for people who need detailed awareness of how to improve reliability of physical assets in an organization.


ARP Reliability Engineer

A training and certification program intended for reliability engineers who must deal with the technical aspects of reliability improvement, including reliability data analysis, maintenance strategy development, condition monitoring, and precision maintenance.


ARP Reliability Program Leader

A training and certification program intended for senior reliability engineers or managers who must develop the business case and then define the overall strategy and implement the program. They must deal with the economics, strategy, and organizational culture.



Asset Criticality Ranking

The process of developing a criticality score based on an asset’s consequence of failure, an asset’s likelihood of developing a fault condition, and the likelihood of detecting the onset of failure. The score is used to rank the assets from most critical to least critical so that activities can be prioritized.

Criticality analysis can help us set priorities, so it is first used in the VALUE phase.


Asset Opportunity Ranking

Whereas the asset criticality ranking generates a list of assets ranked from those that pose the greatest risk to those that pose the least risk, an asset opportunity ranking generates a list of assets ranked from those that present the greatest opportunity to produce a higher volume of quality product (or the provision of improved service) to those that present the least opportunity.

You will never read about the asset opportunity ranking in any other book. But the author believes that everyone involved with reliability and performance improvement must not only think about minimizing risk of failure; they must also think about maximizing the performance of the organization.

For example, you may have a piece of equipment that is quite reliable (i.e., it rarely fails), but unless it is adjusted correctly the quality of the product can be affected. By applying the appropriate brainpower, we may find ways to ensure that it always produces first-pass high quality and thus increases production output (reduces waste) and improves customer satisfaction.

Opportunity analysis can also help us set priorities, so it is first used in the VALUE phase.



Reliability Centered Maintenance

A detailed process that follows seven main steps in order to develop a detailed maintenance strategy. The strategy may comprise hidden failure finding tasks, condition-based maintenance tasks, interval-based maintenance tasks, or the decision not to take any proactive/preventive steps to prevent an asset from failing or a given failure mode from occurring. A proactive RCM process will also identify actions that can be taken to reduce the likelihood of failure.

RCM can be used to develop the maintenance strategy, which begins in the STRATEGY phase.


Failure Modes and Effects Analysis

FMEA is a process where each and every failure mode of a piece of equipment is analyzed in order to assess the risk of failure and determine what action must be taken. During the design process, FMEA is used to identify opportunities to improve the design. As part of a reliability improvement strategy, FMEA is used to design the maintenance strategy, and it should be used to identify proactive steps that can be taken to reduce the risks of failure.

FMEA can be used to develop the maintenance strategy, which begins in the STRATEGY phase.


Failure Modes, Effects and Criticality Analysis

For all intents and purposes, FMECA is the same as FMEA.


Risk Priority Number

The risk priority number is used in association with the FMECA process. It is like having an asset criticality ranking for each and every failure mode. For each failure mode we score the severity of the failure, the likelihood of the failure mode occurring, and the likelihood of detecting the onset of failure (scored so that a higher value means detection is less likely). Each element is given a score from 0 to 1 and the three scores are multiplied together. An RPN of 1 means that there will be terrible consequences if failure occurs, the failure mode is almost certain to occur, and it is highly unlikely that we will detect the onset of failure.
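The arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the failure mode names and scores are hypothetical, and the names `FailureMode` and `risk_priority_number` are assumptions made for this example rather than part of any ARP or ART material.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: float       # 0 = negligible consequence, 1 = terrible consequence
    occurrence: float     # 0 = very unlikely to occur, 1 = almost certain to occur
    non_detection: float  # 0 = onset almost always detected, 1 = very unlikely to be detected


def risk_priority_number(mode: FailureMode) -> float:
    """Multiply the three 0-to-1 scores together, as described in the text."""
    for score in (mode.severity, mode.occurrence, mode.non_detection):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score {score} for {mode.name!r} is outside 0..1")
    return mode.severity * mode.occurrence * mode.non_detection


# Hypothetical failure modes for a single pump (illustrative values only).
modes = [
    FailureMode("seal leak", 0.4, 0.7, 0.2),
    FailureMode("bearing wear", 0.8, 0.6, 0.5),
    FailureMode("coupling misalignment", 0.6, 0.3, 0.9),
]

# Rank from highest risk to lowest so that activities can be prioritized.
ranked = sorted(modes, key=risk_priority_number, reverse=True)
for mode in ranked:
    print(f"{mode.name}: RPN = {risk_priority_number(mode):.3f}")
```

Because the scores are multiplied, a failure mode only reaches a high RPN when all three elements are high; a single low score pulls the product down sharply.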


Planned Maintenance Optimization

A process of reviewing every existing PM in order to decide whether it should continue to be performed. A PM may be eliminated because it is found to be redundant, unnecessary, and/or harmful to the equipment. In a proactive environment, PMO can be used to identify the PMs that should be performed; however, at that point it overlaps with RCM.

PMO can be used to establish the maintenance strategy; however, it is also an effective tool when seeking to reduce the volume of reactive maintenance, and thus it is placed in the CONTROL phase.


Planned Maintenance task

A maintenance task that has (hopefully) been well defined, which is scheduled in order to repair or replace a component or piece of equipment, or to perform an inspection or condition monitoring test.


Interval-Based Maintenance

You may not often come across IBM; however, it is used in ARP and ART training and documentation to refer to interval-based maintenance: the repair or replacement tasks performed at a predefined interval. The interval may be defined by equipment running hours, number of days, distance traveled (for vehicles, trains, mining equipment, etc.), production cycles, or some other interval. The assumption is that it has been proven that the maintenance task will always be required after that interval, and it is typically also proven that the need for the maintenance task cannot be determined more cost-effectively with a condition monitoring test or inspection.

The term “preventive maintenance” is commonly used to mean the same thing; however, as discussed below, preventive maintenance can often include interval-based maintenance and condition-based maintenance.

We identify the need to perform interval-based maintenance tasks when the maintenance strategy is developed. That process begins in the STRATEGY phase.



Run-to-Fail

The strategy which recognizes that the costs associated with condition monitoring or interval-based maintenance cannot be justified because of the low criticality of the equipment or the low likelihood of a specific failure mode. Sometimes the most cost-effective strategy is to eliminate costly preventive/proactive tasks and accept that the equipment may fail without warning.

We identify the need to adopt Run-to-Fail as a strategy when the maintenance strategy is developed. That process begins in the STRATEGY phase.


Hidden Failure Finding Task

If it is possible that a component or piece of equipment may fail without detection (i.e., we do not know that failure has occurred), but that component or piece of equipment may be called upon to operate in the future (particularly during an emergency situation), then a test or inspection may be performed in order to detect the “hidden failure” before the consequences of that failure will be experienced.

We identify the hidden failure finding tasks when the maintenance strategy is developed. That process begins in the STRATEGY phase.


Standard Operating Procedure

Clearly documented procedures on exactly how to operate a piece of equipment: starting, running, and stopping. This ensures that the equipment is not overly stressed, that it achieves optimal performance, and that everyone operates it in the same way.

Standard Operating Procedures are established in the DISCIPLINE phase and utilized in the CARE phase.


Integrity Operating Window

Equipment should have upper and lower limits specified which define the range within which it should operate to ensure the safe and reliable operation of the equipment. These limits define the integrity operating window.

Integrity Operating Windows are a part of the ANALYTICS phase.
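As a minimal sketch of how upper and lower limits might be checked against live readings, consider the Python below. The parameter names, limit values, and the `outside_window` helper are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, NamedTuple


class OperatingWindow(NamedTuple):
    lower: float
    upper: float


def outside_window(readings: Dict[str, float],
                   windows: Dict[str, OperatingWindow]) -> List[str]:
    """Return the parameters whose reading falls outside its defined window."""
    violations = []
    for name, value in readings.items():
        window = windows.get(name)
        if window is not None and not (window.lower <= value <= window.upper):
            violations.append(name)
    return violations


# Hypothetical limits and readings for one asset (illustrative values only).
windows = {
    "bearing_temp_c": OperatingWindow(20.0, 85.0),
    "discharge_pressure_bar": OperatingWindow(4.0, 6.5),
}
readings = {"bearing_temp_c": 91.0, "discharge_pressure_bar": 5.2}

print(outside_window(readings, windows))
```

Parameters without a defined window are ignored in this sketch; a real system would more likely flag them for review.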


Bill of Materials

The bill of materials is a list of all of the components and parts required to build, maintain, or repair a piece of equipment.

A Bill of Materials is a core ingredient in the DISCIPLINE phase.


Management of Change

The phrase “management of change” (and “change management”) is often confused with “culture change.” In our context, it is the process of ensuring that if any procedures are changed, if the design of the equipment or system is changed, or if there are any other changes that other people involved with the purchase, engineering, maintenance, or operation of the equipment should know about, then the changes are documented. There should be a management system in place so that changes are approved, documented, and communicated.

Management of Change is a key part of the DISCIPLINE phase.


Operator Driven Reliability

Also known as “operator care,” the aim is to engage operators in basic maintenance tasks (cleaning, inspections, etc.) and basic condition monitoring tasks (vibration overall readings, ultrasound tests, temperature readings, etc.). The aims are to reduce the burden on uniquely skilled maintenance technicians, to provide a heads-up to the condition monitoring team, and to give the operators an opportunity to contribute to the reliability of the equipment they operate.

ODR is a part of the CARE phase.



Root Cause Analysis

The most basic definition of root cause analysis is the determination of the root cause(s) of an undesirable situation (more on this point in a moment). There are a number of techniques that can be used, such as “fishbone” (Ishikawa) diagrams, “5 whys,” and fault tree analysis.

The author prefers to expand the definition, although it may then overlap with FRACAS, discussed below.

The author prefers to think of RCA as the process of determining why an item (piece of equipment, component, etc.) either catastrophically failed, functionally failed, failed to perform at the desired level (including all forms of waste), or resulted in a safety or environmental incident or “near miss.” The RCA process should answer the following questions: what is the nature of the failure, how important is it that the failure is prevented from reoccurring, what is the root cause of the failure, and what proactive task(s) can be performed so that the failure is not repeated. The next steps should be economic analysis to determine the viability of implementing the task(s) and, if viable, the implementation of the task(s) and verification that the task was effective at eliminating future failures.

RCA is a key reliability improvement tool and is the main focus of the EOL phase.


Root Cause Failure Analysis

For the most part, RCFA is the same as RCA. In some instances, RCFA relates to the use of RCA after equipment failure occurs whereas RCA is applied whenever any undesirable situation (failure, safety or environmental incident, near miss, poor performance, waste in all its forms, etc.) must be understood and eliminated.


Failure Reporting, Analysis, and Corrective Action System

The RCA process described above could also be referred to as a FRACAS process. As the expansion of the acronym suggests, FRACAS begins with a failure reporting process, involves the analysis of the failure and determination of the corrective action, and then ensures that the corrective action has been taken.

FRACAS is a part of the EOL phase.


Reliability, Availability, Maintainability, and Safety analysis

During the design and equipment selection process, an analysis should be performed to ensure the equipment will provide the greatest reliability, will deliver optimal availability, is easy and the least expensive to maintain, and will not result in safety incidents. One may add to this the goals of maximizing energy efficiency and having the least impact on the environment.

RAMS should be performed as part of the ACQUIRE phase.


Quality Assurance and Quality Control

In a maintenance and reliability context, QA/QC ensures that checks are performed in order to achieve the highest level of quality. Those checks may be performed during the acceptance testing process, after repairs or overhauls have been performed, and when the equipment is installed. We must not assume that the equipment is in optimal condition; we must prove that it is.

QA/QC is utilized in the ACQUIRE phase, and it is an important part of the DISCIPLINE phase as we ensure that all work is performed to the highest standard.


Acceptance Testing

When new equipment is purchased, or when equipment is repaired or overhauled (by external contractors or by internal maintenance staff), tests should be performed to ensure that the equipment is in tip-top condition. When dealing with external OEMs, vendors, and contractors, agreements should be established that define parameters such as vibration limits, performance characteristics, noise emissions, and more that must be met in order for the purchase to be completed.

Acceptance testing is utilized in the ACQUIRE phase.



Potential Failure

The P–F interval is commonly used to describe the time between when a fault is detectable (i.e., the failure may potentially occur) and when the asset will functionally fail. If the P–F interval can be estimated for the major failure modes, it will be possible to establish the appropriate condition monitoring test interval.


Time to Fail

This metric can be used in two broadly different ways. First, it is synonymous with the P–F interval described above. Second, it can be used to describe the time from when the equipment became operational to when it failed, as is the case with the metric MTBF.


Mean Time Between Failures

The MTBF is the average time that an asset operates before it fails. Over a period of time, if the lifespans (the time between the equipment becoming operational and its functional or catastrophic failure) are recorded and those lifespans are averaged, then we will have the mean (average) time between failures. We usually refer to the service the asset performs (for example, EXHAUST FAN #1) rather than a specific component (e.g., electric motor) with a specific serial number.

Technically it should only be applied to “non-repairable” equipment, that is, equipment that is replaced and not restored back to “full” health.

Beware of the use of MTBF, as over a long period of time the MTBF may not adequately indicate the benefits of recent improvements to reliability. The MTBF can also hide the fact that equipment will have multiple failure modes: some that result in short lifespans, others that result in longer lifespans.
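The averaging described above, and the warning about long time periods, can be illustrated with a short Python sketch. The lifespans are hypothetical values in operating hours, listed in chronological order, and the `mtbf` function name is an assumption made for this example.

```python
from statistics import mean
from typing import Sequence


def mtbf(lifespans_hours: Sequence[float]) -> float:
    """Average the recorded lifespans (time from becoming operational to failure)."""
    if not lifespans_hours:
        raise ValueError("at least one recorded lifespan is required")
    return mean(lifespans_hours)


# Hypothetical lifespans for one service position (e.g. EXHAUST FAN #1),
# oldest first. The last two follow a reliability improvement.
lifespans = [400, 420, 380, 410, 900, 950]

print(f"MTBF over the whole history: {mtbf(lifespans):.1f} hours")
print(f"MTBF over the last two lifespans: {mtbf(lifespans[-2:]):.1f} hours")
```

In this made-up history the long-term figure is well below the recent one, which is the effect the caution above describes: the older, shorter lifespans mask the benefit of the recent improvement.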


Mean Time To Repair

After a failure, it naturally takes time to repair/restore or replace a piece of equipment and put it back into service. If we measure and average those times, we will have the mean time to repair. As long as we are performing that work with the highest level of precision, our goal will be to reduce the mean time to repair.



Key Performance Indicator

Key performance indicators are metrics that enable us to measure the improvements (hopefully) we achieve as we change our processes, provide extra training, improve reliability, and so on. KPIs can measure the efficiency of the planning process, the production output of the plant, and many other parameters. The best results are achieved when KPIs are used to tell us if the changes we are making are delivering value to the organization and to identify where there are opportunities for improvement, rather than being used to punish people if they do not achieve the desired results.

KPIs are introduced in the VALUE phase and are a key part of the ANALYTICS phase.


Overall Equipment Effectiveness

The OEE is typically used to measure the performance of a process. It is the combination of three factors: the availability of the equipment, the production rate, and the quality of the output. If each metric is measured from 0 to 1 (1 being the ideal state) and they are multiplied together and then multiplied by 100, we will have a percentage value from 0% to 100%.

An OEE of 100% indicates that the equipment is always available to be operated, it runs at the desired rate (e.g., 100 widgets per hour), and all of the product produced is of high quality and can thus be delivered to the customer. In many cases, an increase of the OEE by 1% can represent an improvement in revenue of many hundreds of thousands of dollars.
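The calculation can be sketched as follows in Python. The three input values are hypothetical, and the function name `oee_percent` is an assumption made for this example.

```python
def oee_percent(availability: float, rate: float, quality: float) -> float:
    """Multiply the three 0-to-1 factors together and express the result as a percentage."""
    for factor in (availability, rate, quality):
        if not 0.0 <= factor <= 1.0:
            raise ValueError(f"factor {factor} is outside 0..1")
    return availability * rate * quality * 100.0


# Hypothetical plant figures: available 90% of the time, running at 95% of
# the desired rate, with 98% of output being first-pass quality.
print(f"OEE = {oee_percent(0.90, 0.95, 0.98):.2f}%")
```

Note that three factors that each look healthy on their own still multiply out to an OEE well below any one of them, which is why small gains in each factor are worth pursuing.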



Computerized Maintenance Management System

This is the key software system used to, as a minimum, manage work orders. It will typically also be used to manage the entire work management workflow and even spare parts inventory. It should be possible to extract data from the CMMS in order to assess the reliability of equipment, the criticality of equipment, the effectiveness of the planning and scheduling process, and many other key pieces of information.

The CMMS is introduced in the CONTROL phase and is a key part of the DISCIPLINE phase. Data is acquired from the CMMS in the ANALYTICS phase.


Enterprise Asset Management system

An EAM system will perform the function of the CMMS and many other functions, possibly including: asset life cycle management, labor management, MRO management, contract management, analytics and reporting, and financial management. As you can see, it goes well beyond the maintenance and reliability departments.


Maintenance, Repair, and Operations

MRO represents everything associated with keeping equipment running. This naturally involves maintenance, repair, testing, spares, tools, and more. From an accounting point of view, it often represents all of the costs associated with maintaining and running the plant but not the costs associated with the products being produced or the service being provided.

(Sometimes you will see MRO represented as Maintenance, Repair, and Overhaul when it is purely focused on the maintenance department.)



Condition Monitoring

The use of measurement technologies such as vibration analysis, infrared thermography, and ultrasound analysis to assess whether a piece of equipment shows any signs of the onset of failure or if it is not being operated correctly. Tests may be performed on rotating machinery, electrical equipment, structures, and process equipment (such as steam traps, compressed air systems, etc.).

In the context of ART and ARP, we like to include visual and other inspections as part of condition monitoring because the goal is to assess the health of the equipment.

Condition monitoring is introduced in the CONTROL phase, and it lives in the ANALYTICS phase.


Condition-Based Maintenance

The strategy which results in repair or restoration maintenance tasks only being performed when the condition of the equipment indicates the need for that task. If the strategy is not followed, it is common for equipment to be replaced simply because of its age (which often results in premature failure).

The application of the CBM strategy is developed as part of the maintenance strategy. The development of the maintenance strategy begins in the STRATEGY phase.


Predictive Maintenance

While this term has been traditionally used interchangeably with condition-based maintenance (or just condition monitoring), now it is associated with the application of artificial intelligence in order to determine the nature and severity of fault conditions, and optionally to also predict when corrective maintenance is required, and even to generate the work order so the work can be performed.


Vibration Analysis

The collection of vibration readings, typically via magnetically mounted accelerometers on the bearings of rotating machinery, to detect the onset of failure and diagnose the nature and severity of the fault condition.


Ultrasound Technology

The collection of ultrasound readings (sound above our range of hearing), via contact or airborne measurements, to detect impacting, turbulence, or friction in mechanical/process systems or arcing, corona, or tracking in electrical systems.


Infrared analysis, a.k.a. thermography

The measurement of infrared (thermal) radiation, typically via thermal imaging cameras or spot radiometers, from mechanical/process and electrical systems in order to detect changes in temperature that may indicate that a fault condition exists.


Oil Analysis

The collection and analysis of oil samples in order to assess the condition of the lubricant (to ensure that it has the correct chemical and other properties such as viscosity, acidity, alkalinity, etc.) and to detect if contamination has occurred. In some cases it can indicate if the equipment has begun to fail.


Wear Particle Analysis, a.k.a. ferrography

The collection and analysis of oil samples in order to assess whether the lubricant has been contaminated (dirt, coal, fibers, etc.) or if physical damage such as mechanical wear has occurred, which will result in metallic particles appearing in the oil. The sample is treated and inspected under a microscope, as the particles are too big for traditional oil analysis laboratory instrumentation.


Motor Current Analysis (Motor Circuit Analysis)

Motor current analysis involves the analysis of the current flowing through one or three phases in order to determine if there is a mechanical or electrical fault: most commonly broken or cracked rotor bars or end rings, or rotor eccentricity.

Motor circuit analysis involves the testing of the motor when it is offline by analyzing the resistance, capacitance, and inductance of the stator as the rotor is slowly turned. Faults in the rotor, stator, insulation, and connections can be detected.


Motor Current Signature Analysis

Motor current analysis (MCA) is sometimes referred to as MCSA.


Motor Signature Analysis

Motor signature analysis typically involves the analysis of the voltage and current on all three phases of an induction motor. The current analysis is described above, and the voltage analysis provides an indication of the quality of the power supply to the motor.


Non-Destructive Testing

Tests that are performed specifically to detect cracking, corrosion, and other conditions that may indicate that a component may fail. The aim of the test is to leave the component in good working order should the tests reveal that no faults exist. NDT is often applied to pressure vessels, valves, piping, crane hooks and structures, and other assets that pose a serious risk to safety and the environment.



Continuous Improvement

The process of constantly seeking new and improved ways to perform every task within the organization.

Continuous Improvement is the focus of the OPTIMIZE phase.


Total Productive Maintenance

TPM aims to improve the maintenance process via autonomous maintenance, planned maintenance, quality improvement, new equipment management, continuous improvements, and improvements related to health, safety, and the environment. TPM is often treated as synonymous with operator driven reliability, although ODR should be considered a subset of TPM.

We would consider TPM to be a subset of the ART process as we seek to go beyond maintenance. ODR is part of the CARE phase.


Total Quality Management

Total quality management is typically an organization-wide initiative that seeks to encourage employees to find ways to improve the quality of service to customers. Typically, the primary output is the delivery of first-pass high-quality product.

Quality is critical to reliability and performance improvement. Product quality improvement is the focus of the CARE and ANALYTICS phases, and improvements are identified in the OPTIMIZE phase.


Sort, Set in order, Shine, Standardize, Sustain

5S is related to TPM, Six Sigma, and Lean. The basic process involves looking at a work area, which may include the maintenance workshop and the operating area of the plant, and ensuring that everything can be performed as efficiently as possible. Everything should be organized, clean, and standardized. Everyone should be trained and motivated to perform tasks in such a way that time is not wasted searching, moving items, and walking around the workplace accessing tools and other items. A place for everything and everything in its place. ART utilizes the 5S process.

5S is introduced in the CONTROL phase and is a key part of the CARE phase.


Six Sigma

Originally, Six Sigma was based on the goal that the nearest specification limit of a production process should be at least six standard deviations away from the process mean. To put it in English, it describes a process “where 99.99966% of all opportunities to produce some feature of a part are statistically expected to be free of defects” (Wikipedia). In more recent times, Six Sigma relates more generally to a method that provides tools to improve the capability of an organization’s business processes, whether that be in the maintenance department or anywhere else.

When following the Six Sigma approach, one will follow the methodology: Define, Measure, Analyze, Improve, and Control (DMAIC). TPM, Lean, 5S, TQM, and Six Sigma are all closely related – they all seek to eliminate waste and achieve high quality so that a business can be more profitable and customers remain satisfied.
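The 99.99966% figure quoted above can be reproduced with a short Python sketch. It relies on two conventions that are not stated in this glossary and are therefore assumptions here: that only one tail of a normal distribution is counted, and that the process mean is allowed to drift by 1.5 standard deviations over the long term. The function names are hypothetical.

```python
from statistics import NormalDist


def defect_free_fraction(sigma_level: float, long_term_shift: float = 1.5) -> float:
    """Fraction of opportunities expected to be defect free.

    Assumes a normally distributed process, a one-sided specification limit,
    and the conventional 1.5 sigma long-term shift of the process mean.
    """
    return NormalDist().cdf(sigma_level - long_term_shift)


def defects_per_million(sigma_level: float, long_term_shift: float = 1.5) -> float:
    """Expected defects per million opportunities under the same assumptions."""
    return (1.0 - defect_free_fraction(sigma_level, long_term_shift)) * 1_000_000


print(f"Defect free at six sigma: {defect_free_fraction(6) * 100:.5f}%")
print(f"Defects per million opportunities: {defects_per_million(6):.1f}")
```

Under these assumptions a six sigma process yields roughly 3.4 defects per million opportunities, which corresponds to the 99.99966% defect-free figure in the quotation.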



Artificial Intelligence

This is the general term given to a computer system that can learn and make decisions.


Machine Learning

Machine learning is a process whereby data is used for a system to learn about a process, a piece of equipment, failure mechanisms, and so on. Machine learning is a subset of artificial intelligence.

For example, if the system were used to observe the performance of a piece of equipment under changing weather conditions, machine learning would enable it to learn the relationship and then predict equipment performance from the observed weather.


Data Analytics

Data analytics is the more general term whereby data is utilized to gain insight into the process, piece of equipment, failure mechanisms, and so on. When utilizing data analytics, the human is performing calculations in order to learn.

For example, if a person analyzed data that showed how the performance of a piece of equipment changed as the weather changed and then performed statistical analysis on that data to “correlate” the changes in weather to the equipment performance, we would call that data analytics.
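The weather example above can be made concrete with a small Python sketch that calculates the Pearson correlation coefficient between two series. The temperature and efficiency values are entirely hypothetical, and the `pearson_correlation` function is written out by hand for this illustration.

```python
from math import sqrt
from statistics import mean
from typing import Sequence


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("both series need the same length and at least two points")
    mean_x, mean_y = mean(x), mean(y)
    dx = [value - mean_x for value in x]
    dy = [value - mean_y for value in y]
    covariance = sum(a * b for a, b in zip(dx, dy))
    spread = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        raise ValueError("a series with no variation has no defined correlation")
    return covariance / spread


# Hypothetical daily observations: ambient temperature (deg C) and
# equipment efficiency (%) recorded on the same days.
ambient_temp_c = [10, 15, 20, 25, 30]
efficiency_pct = [92, 91, 89, 88, 85]

r = pearson_correlation(ambient_temp_c, efficiency_pct)
print(f"Correlation between temperature and efficiency: {r:.2f}")
```

A value close to -1 in this made-up data would suggest that efficiency falls as the temperature rises. A person would still need to judge whether the relationship is causal before acting on it.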


Predictive Maintenance

Predictive maintenance utilizes condition monitoring data and machine learning technology to predict future equipment health. Additional comments are made in the condition monitoring section.


Prescriptive Maintenance

Prescriptive maintenance takes predictive maintenance one extra step. Rather than only predicting the future health of equipment based on condition monitoring data, it will determine what corrective maintenance is required and take the necessary action (i.e., generate a work order).


ISO 18436

Condition monitoring standards

A series of standards that define how practitioners must be trained (18436-3) and certified (18436-1) in vibration analysis (18436-2), ultrasound analysis (18436-8), oil and wear particle analysis (18436-4), infrared analysis (18436-7), and other less frequently used technologies.

The condition monitoring training and certification covered in the PEOPLE phase follows these ISO standards.

ISO 17024

Certification standard

A standard that defines how certification bodies must ensure that certifications related to personnel are fair and independent.



Accreditation

The process whereby a government-appointed agency assesses and periodically audits a certification body and acknowledges that the organization is acting in a fair and independent way according to the standards. ASNT performs this role in the United States. JAS-ANZ performs the role in Australia and New Zealand. UKAS performs the role in the UK.

The Mobius Institute condition monitoring and asset reliability transformation training and certification are accredited by JAS-ANZ.

ISO 55000

Asset management standards

A series of three standards that describe how an organization can be assessed in order to determine whether it is maximizing the value of its physical assets. An organization can also be certified as ISO 55000 compliant in the same way that an organization can be certified as ISO 9000 compliant.

ISO 55000 does not provide guidance on how an organization can achieve maximum value from its physical assets.

As discussed in detail below, the ART process can help an organization achieve ISO 55000.

Additional definitions for terms that do not have acronyms:


Culture change

The process of educating and motivating all people within an organization so they behave in accordance with the goals of the organization. From an ART perspective, the goal is to make people aware of how the change will benefit them and how they can contribute to the change.

While culture change begins at the top and thus begins in the VALUE phase, culture change is the focus of the PEOPLE phase.

Roles and responsibilities

It is essential that people clearly understand what they are responsible for and how they can measure the success of the work they perform. If there is any confusion, it becomes terribly difficult to understand how to prioritize. It is also important to understand who must be informed when changes are made or progress is achieved, and who is accountable for each task (and to ensure that people are held accountable when they fail to perform their task).

Roles and responsibilities live in the PEOPLE phase, as they are one of the ingredients of culture change.


Criticality analysis

The process of assessing an asset’s consequence of failure, the asset’s likelihood of developing a fault, and the likelihood of detecting the fault. See “Asset Criticality Ranking” above for more information.
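
One common way to combine these three factors into a ranking is to score each on a numeric scale and multiply the scores, in the style of a risk priority number. The sketch below assumes 1 to 10 scales, where a high detectability score means the fault is hard to detect. The class name, asset names, and scores are all invented for illustration, and ARP and ART materials may use a different scoring scheme.

```python
from dataclasses import dataclass

@dataclass
class AssetScore:
    name: str
    consequence: int     # 1 (negligible) to 10 (severe) consequence of failure
    likelihood: int      # 1 (rare) to 10 (frequent) likelihood of developing a fault
    detectability: int   # 1 (easily detected) to 10 (very hard to detect)

    def criticality(self) -> int:
        # Multiplicative score: an assumed convention, not an official ART formula.
        for value in (self.consequence, self.likelihood, self.detectability):
            if not 1 <= value <= 10:
                raise ValueError("scores must be between 1 and 10")
        return self.consequence * self.likelihood * self.detectability

def rank_assets(assets):
    """Return the assets sorted from most to least critical."""
    return sorted(assets, key=lambda a: a.criticality(), reverse=True)

# Hypothetical assets and scores, for illustration only.
assets = [
    AssetScore("Cooling water pump", consequence=8, likelihood=4, detectability=3),
    AssetScore("Office air handler", consequence=2, likelihood=5, detectability=2),
    AssetScore("Main drive gearbox", consequence=9, likelihood=3, detectability=6),
]

for asset in rank_assets(assets):
    print(f"{asset.name}: {asset.criticality()}")
```

With these invented scores the gearbox ranks first, even though it is less likely to develop a fault than the pump, because its consequence and detectability scores are higher.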

Preventive maintenance

The definition of preventive maintenance changes in different parts of the world. In some parts of the world, it encompasses interval-based maintenance and condition-based maintenance. When discussed within ARP or ART courses or books, the more common definition is used: see IBM, or Interval-Based Maintenance.

Breakdown maintenance

If the maintenance department is only engaged when equipment fails, then it is said to be engaged in “breakdown maintenance.” It is often confused with the run-to-fail strategy, but there is a significant difference. As described in run-to-fail, there are situations where risks/criticality can be assessed, and it is proven that it is not economically viable to take the steps necessary to prevent failure from occurring. That is a deliberate run-to-fail strategy.

Reactive maintenance

The phrase reactive maintenance is typically synonymous with breakdown maintenance.

Planned maintenance

When a maintenance department utilizes planners and schedulers to prioritize work, develop procedures, and generally implement a structured work management process, the work performed is described as planned maintenance.

Precision maintenance

When shafts of machines are aligned with very little offset and angularity, when rotors are balanced to achieve minimum residual unbalance, when bolts are tightened to the correct tension in the correct order, and when other tasks are performed in such a way that the equipment will achieve the longest life, it is said that that department is engaged in precision maintenance.

Defect elimination

While this term can be used quite broadly, in our context it is the process of proactively eliminating the root causes of equipment failure and poor performance. It begins during the project definition stage and involves project management, equipment selection and design, the procurement process, incoming quality control, spares management, and precision maintenance. We must not import problems into the plant, and we must not introduce problems once the equipment is under our control.

Condition monitoring is reactive – the failure has begun to occur. RCA is reactive – the failure has occurred. Defect elimination is proactive – it seeks to eliminate the onset of failure.

Defect elimination is a key part of the ART strategy and is a key focus of the ACQUIRE phase.


Lean

Lean is the basic principle of trying to do more with less. The aim is to reduce waste and make every process more efficient.

Lean is one of the underlying principles of ART.

Lean maintenance

When we apply the Lean principle to the maintenance department, we might call it Lean maintenance. We would argue that there is a great deal of overlap between Lean and the ART process as we seek to reduce waste in maintenance and operations (including the waste associated with downtime, slowdowns, poor changeovers, and poor quality).


Operating context

Often when discussing equipment failure, especially in the context of RCM and FMEA, we must be careful not to treat a component (e.g., a motor, pump, valve, or transformer) in isolation; we must consider the operating context. The context relates to its operating conditions and environment – how it was mounted, whether it is outdoors or indoors, whether it operates 24/7 or infrequently, and so on. The context also relates to the importance of the service it provides; if it fails, will anyone notice, or might it cause a safety incident?

The maintenance strategy must be sensitive to the operating context.


Function

The function of an asset is a clear description of what that asset was deployed to perform. It will have a primary function and secondary functions. The primary function may be “to transfer water from Tank A to Tank B at a flow rate of 100 L per second.” Its secondary functions will include requirements such as “it shall not leak,” “it shall not make so much noise that it wakes the neighbors,” and so on. When we consider how an asset might fail, we must consider whether it has catastrophically failed on the one hand, or “simply” failed to perform its function – in which case we say that it has “functionally failed.”

Functional failure

Every asset has one or more functions to perform (see above). When an asset has functionally failed, it is no longer able to perform that function even though it may still operate (i.e., the shaft of the pump still turns; however, it is unable to develop the necessary pressure and flow).

Catastrophic failure

A failure that results in the equipment ceasing to operate altogether. It may or may not result in severe damage to multiple components.
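
The distinction between functional and catastrophic failure can be expressed as a simple check against the primary function. The sketch below reuses the pump example with its required flow of 100 L per second. The function name, the enum, and the readings are hypothetical, and only the primary function is checked; secondary functions such as "it shall not leak" are ignored.

```python
from enum import Enum

class AssetState(Enum):
    PERFORMING = "performing its function"
    FUNCTIONAL_FAILURE = "functionally failed"
    CATASTROPHIC_FAILURE = "catastrophically failed"

def classify_pump(shaft_turning, flow_l_per_s, required_flow_l_per_s=100.0):
    """Classify a pump against its primary function (hypothetical helper)."""
    if not shaft_turning:
        # The equipment has ceased to operate at all.
        return AssetState.CATASTROPHIC_FAILURE
    if flow_l_per_s < required_flow_l_per_s:
        # Still operating, but unable to deliver the required flow.
        return AssetState.FUNCTIONAL_FAILURE
    return AssetState.PERFORMING

# Invented readings, for illustration only.
for turning, flow in [(True, 104.0), (True, 62.0), (False, 0.0)]:
    state = classify_pump(turning, flow)
    print(f"shaft turning={turning}, flow={flow} L/s -> {state.value}")
```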

Collateral damage

The damage that occurs as a result of a failure. For example, a bearing may fail and as a result the shaft is bent. The bent shaft is collateral damage.

Secondary damage

See collateral damage.

Failure mode

A failure mode is the way in which an asset (or component) may fail. It is what the operator or mechanical technician might report when they observe the state of the equipment: the bearing seized on the shaft, the valve stuck in the open position, the motor released smoke, the cable burned out, and so on.

In the maintenance and reliability world, we can consider failures at three levels: the failure mode (the way in which it fails), the failure cause/mechanism (the immediate reason why it failed: wear, impact force, excessive current, etc.), and the root cause of the failure (what did a human do that led to the failure mechanism occurring and why did the human do it).

Failure mechanism or failure cause

These terms are often used interchangeably and are associated with failure modes. Failure modes occur because of the failure mechanism. (Failure mechanisms ultimately occur due to human error or a human decision and can be determined through root cause analysis.) Failure mechanisms typically fall into two categories: overstress (i.e., mechanical, chemical, or thermodynamic forces that result in failure within moments) or wear (i.e., mechanical, chemical, or thermodynamic forces that are applied over a longer period of time and cumulatively result in failure). Examples include fatigue, corrosion, stress, fractures, and others.

Infant mortality failure

This rather morbid phrase is used in relation to failures that occur soon after equipment has been installed. Ideally, equipment that is new or recently overhauled should be at its peak of reliability. Sadly, the opposite is often true. Because of human error, mistakes are commonly made during installation or overhaul, and early failures follow.

In reliability terms, there is often a decreasing rate of failure after equipment is installed – as each hour and day of successful operation passes, the likelihood of failure decreases.

The solution is to improve design, maintenance, spares management, and operating practices to reduce the likelihood that equipment will be defective, and to use condition monitoring to detect the onset of failure.

Infant mortality failures are one reason we perform condition monitoring rather than interval-based maintenance. If the interval is not accurately known, we may replace a component/asset that is in perfect health and plunge it into the infant mortality phase – we have just increased the likelihood of failure.

Age-related failure

When certain components have been in service for a period of time, the likelihood of failure grows, typically due to wear or fatigue. For example, the impeller of a slurry pump or the belt on a fan will eventually fail simply because of the length of time they have been in operation. If the nature of the operation does not vary considerably, we may be able to estimate the life of the impeller or belt based on previous experience.

If the equipment is not run continuously, the age may be related to the actual operating hours, or the distance traveled by a vehicle, or the number of production cycles – it is all about the cumulative stress endured by the component.

In reliability terms, there is an increasing rate of failure with these components – once we get to a certain age, as each day passes the likelihood of failure increases.

For these types of components (for these types of failures), we may be able to use interval-based maintenance because the interval is accurately known. If it is not accurately known, we must go back to condition-based maintenance for fear that the equipment may fail before the prescribed interval, or the equipment may be capable of operating for a significant period after the prescribed interval.

Random failure

This potentially confusing phrase is used in relation to failures that occur after the infant mortality phase but before age-related failures may occur. They are called random failures because, statistically speaking, we do not know when they will occur – they appear to occur randomly.

In reliability terms, there is a relatively constant rate of failure – it is no more likely that such a failure will occur after one month than that it will occur after one year.

It is because of random failures that we perform condition monitoring and utilize the condition-based maintenance strategy. We do not know whether failure will occur after one month, one year, five years, or longer; therefore, we monitor the health of the equipment so that we can take the appropriate action when the onset of failure is detected.

It is important to note that while the failures may appear to be random, the rate of failure (the number of failures per year) is affected by our reliability improvement practices. If we do everything described within the ART process, we will drastically reduce the number of random failures we experience each year.
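
The three failure patterns described above (infant mortality, random, and age-related) are often modeled with the Weibull distribution, whose failure (hazard) rate is h(t) = (shape/scale) * (t/scale)^(shape - 1). A shape parameter below 1 gives a decreasing rate, a shape equal to 1 gives a constant rate, and a shape above 1 gives an increasing rate. The Weibull model is brought in here as an illustration and is not named in the definitions above; the function names and parameter values are arbitrary assumptions.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard (instantaneous failure) rate at time t > 0."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape, and scale must all be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)

def trend(shape, scale=1000.0, times=(100.0, 500.0, 2000.0)):
    """Describe how the hazard rate changes across the given operating hours."""
    rates = [weibull_hazard(t, shape, scale) for t in times]
    if all(abs(r - rates[0]) < 1e-12 for r in rates):
        return "constant"
    if all(a > b for a, b in zip(rates, rates[1:])):
        return "decreasing"
    if all(a < b for a, b in zip(rates, rates[1:])):
        return "increasing"
    return "mixed"

# Arbitrary shape values chosen to illustrate each pattern.
for label, shape in [("infant mortality", 0.5), ("random", 1.0), ("age-related", 3.0)]:
    print(f"{label}: shape={shape}, failure rate is {trend(shape)}")
```

In this model, reliability improvement practices that reduce the number of random failures per year correspond to lowering the constant rate (a larger scale value) rather than changing the shape.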

Human error

Human errors are mistakes made by people that may result in premature failure or injury. They occur because people are poorly trained, they do not have the correct tools or procedures, they are being rushed to complete the work, the environment is cramped and possibly dark and dirty, and for other reasons.

Human errors are inevitable, but they can be minimized. If we are serious about improving reliability, we must address the root causes of human error.

Human error management

When we accept that human errors are inevitable, we must then understand all the reasons why human errors occur and then make changes so they are less likely to occur. Changes may include better training, the availability of precision tools, clearly written procedures, improved lighting and work conditions, an awareness by supervisors and others that quality takes time (and thus should not be rushed), and more.

Human error management is necessary to be successful in the DISCIPLINE phase, but it is introduced in the PEOPLE phase.


ART versus Lean, TPM, 5S, 6σ, and CI

The goal of the ART process is to engage senior management and all levels of the staff/workforce so that everyone works toward delivering the greatest value to the organization.

Customer satisfaction, safety and environmental incident reduction, waste reduction, and quality improvement are all key outcomes of the ART process.

What makes ART unique, in addition to the fact that organizations are provided with a detailed step-by-step guide, is the fact that it focuses on the reliability and performance of the organization’s physical assets. By minimizing maintenance costs, eliminating defects, and reducing breakdowns, an organization can maximize capacity and availability and offer its customers a dependable, high-quality product or service.

ART versus ISO 55000

When considering the management of physical assets in relation to their reliability and performance, there is a very close relationship between ART and ISO 55000. Whereas ART provides guidance on how to achieve the goals of asset management, ISO 55000 instead provides a means to assess whether an organization has the correct elements in place. In other words, ART provides a means for an organization to achieve ISO 55000 certification. ISO 55000 is a ruler; ART teaches an organization how to grow.

ISO 55000 can be used to broaden the scope of asset management beyond reliability and performance, whereas ART purely focuses on asset reliability and performance.