
Failure Mode, Effects, and Criticality Analysis

An excerpt from The Handbook for Quality Management (2013, McGraw-Hill) by Paul Keller and Thomas Pyzdek describing the elements of FMECA

Failure Modes and Effects Analysis (FMEA), also known as Failure Modes, Effects and Criticality Analysis (FMECA), is used to identify high-risk functions or product features based on the impact of a failure and the likelihood that a failure could occur without detection.

The methodology can be applied to products (Design FMEA) or processes (Process FMEA) as follows (Keller, 2011):

Define the system to be analyzed, including a review of all functions or processes, the current performance levels for each, and a definition of failure for each process. The process and its failure modes were specified in the Define stage, and the current level of performance was documented in the Measure stage. During the Improve stage, however, the process was redefined, so it is possible the new process will have different failure modes. The performance levels will certainly be different, representing the fruits of the improvement effort.

The process map is used to define the steps and functional relationships for the new process.

A proper SIPOC analysis (as discussed in the Define stage, Chap. 13) ensures a thorough understanding of the process and sub-processes.

Step 4 is perhaps the true beginning of the FMEA process within Six Sigma DMAIC projects, since the preceding three steps have already been accomplished and serve as inputs at the Improve stage. In this step, we define the function of each process step, that is, the purpose the step serves. Each step should have one or more functions, given that the step is necessary to satisfy an internal or external requirement. To identify the functions of a process step, it might be useful to consider the ramifications of removing the step. For example, in a Sales process, the process step "Enter the Product ID number for each purchased item" provides the function "Identify the item numbers that belong to the products being purchased so that they are all included in the delivery."

For each function, identify the failure modes and their effects: What could go wrong? What could the customer dislike? For example, for the function "Identify the item numbers that belong to the products being purchased so that they are all included in the delivery," the failure modes might be "Product ID mistyped" and "Item numbers not correctly defined for product bundles." The second failure mode refers to products that are sold as sets. A single item number is used for the set so that the proper charge is applied (discounted from the per-item prices), but subsequent process steps (and subsequent processes) need the correct item numbers for each piece (such as to check inventory levels or fill the order from inventory).

Define the severity for each of the Failure Modes. Table 16.1 provides a good means of identifying the severity for a given failure effect. In the example given, the failure mode of mistyping the Product ID, with the effect of shipping the wrong product, is given a severity of 6. From Table 16.1, Severity 6 is described as "Customer will complain. Repair or return likely. Increased internal costs." Granted, defining a Severity level is subjective. A severity of 5 or 7 might seem reasonable in this example. There is no one right answer; however, consistency between analyses is important for meaningful prioritization.

Define the likelihood (or probability) of occurrence. Table 16.1 provides useful descriptions of occurrence levels from one to ten. Table 16.2 provides a somewhat better definition, as developed by the Automotive Industry Action Group (AIAG), based on process capability and defect rates. In the example, the failure mode of mistyping the Product ID, with the effect of shipping the wrong product, is given an occurrence level of 5.

Define the detection method and likelihood of detection. Table 16.1 provides useful descriptions of detection levels from one to ten. In the example, the failure mode of mistyping the Product ID, with the effect of shipping the wrong product, is given a Detectability level of 4, meaning detection is likely before the product reaches the customer. This is based on the detection method that was implemented in past process improvements: the accounting clerk compares the PO with the order form as the invoice is created for shipping.

Calculate the Risk Priority Number (RPN) by multiplying the Severity, Occurrence and Detectability levels. In the example, the Risk Priority Number is calculated by multiplying 6 (the Severity) by 5 (the Occurrence level) by 4 (the Detectability level), resulting in an RPN of 120.
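The RPN arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `risk_priority_number` and the range check are assumptions made for the example, not part of the handbook's method. The ratings are those given for the mistyped Product ID.

```python
def risk_priority_number(severity: int, occurrence: int, detection: int) -> int:
    """Multiply the three ratings (each on a 1-10 scale) to get the RPN."""
    ratings = {"severity": severity, "occurrence": occurrence, "detection": detection}
    for name, level in ratings.items():
        # The rating tables run from one to ten, so reject anything else.
        if not 1 <= level <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {level}")
    return severity * occurrence * detection


# Ratings from the mistyped Product ID example in the text.
rpn = risk_priority_number(severity=6, occurrence=5, detection=4)
print(rpn)  # 120
```

Because each rating is bounded by 1 and 10, the product is bounded by 1 and 1,000.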

Prioritize the Failure Modes based on the RPN.

The Risk Priority Number will range from 1 to 1,000, with larger numbers representing higher risks. Failure modes with higher RPNs should be given priority for the Improve stage of DMAIC.
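Prioritization amounts to sorting the failure modes by RPN, largest first. The sketch below assumes a hypothetical `FailureMode` record. The ratings for the mistyped Product ID come from the example, while the ratings for the product bundle failure mode are invented purely for illustration, since the text does not rate it.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    description: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        # RPN is the product of the three ratings.
        return self.severity * self.occurrence * self.detection


failure_modes = [
    # Ratings taken from the example in the text.
    FailureMode("Product ID mistyped", severity=6, occurrence=5, detection=4),
    # Hypothetical ratings, assumed only for this illustration.
    FailureMode(
        "Item numbers not correctly defined for product bundles",
        severity=7, occurrence=3, detection=8,
    ),
]

# Highest RPN first: these failure modes get priority in the Improve stage.
ranked = sorted(failure_modes, key=lambda fm: fm.rpn, reverse=True)
for fm in ranked:
    print(f"{fm.rpn:>4}  {fm.description}")
```

With these assumed ratings, the bundle failure mode (RPN 168) ranks ahead of the mistyped Product ID (RPN 120), even though it occurs less often, because it is harder to detect.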

Some organizations use threshold values, above which preventive action must be taken. For example, the organization may require improvement for any RPN exceeding 120. Reducing the RPN requires a reduction in the Severity, Occurrence, and/or Detectability levels associated with the Failure Mode. As a general rule:

Reducing Severity requires a change to the design of the product or process. For example, if the process involves a manufactured part, it may be possible to alter the design of the part so that the stated Failure Mode is no longer a serious problem for the customer.

Reducing the Detectability level increases cost with no improvement to quality. To reduce the Detectability level, we must improve the detection rate. We might add process steps to inspect product, approve product, or (as in the example) double-check a previous process step. None of these activities adds value for the customer; they are "hidden factory" sources of waste to the organization.

Reducing the Occurrence level is often the best approach, since reducing Severity can be costly (or impossible) and reducing Detectability is only a costly short-term solution. Reducing the Occurrence level requires a reduction in process defects, which reduces cost.

The final step in the FMEA is to re-evaluate the RPN after improvements have been implemented.
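The re-evaluation can be sketched as a before-and-after comparison against a threshold. The threshold of 120 is the example value mentioned above. The post-improvement Occurrence level of 2 is a hypothetical figure assumed for illustration, as are the names `rpn` and `needs_action`.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number: the product of the three ratings."""
    return severity * occurrence * detection


THRESHOLD = 120  # example threshold from the text; action is required above it


def needs_action(value: int, threshold: int = THRESHOLD) -> bool:
    # "Exceeding" the threshold means strictly greater than it.
    return value > threshold


# Ratings from the example before improvement.
before = rpn(severity=6, occurrence=5, detection=4)

# Hypothetical result after an improvement that lowers Occurrence from 5 to 2.
after = rpn(severity=6, occurrence=2, detection=4)

reduction = (before - after) / before
print(f"RPN before: {before}, after: {after}, reduction: {reduction:.0%}")
print("Action still required:", needs_action(after))
```

Note that an RPN of exactly 120 does not exceed a threshold of 120, so the original example would sit right at the boundary under that rule.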

Table 16.1 Severity, Occurrence and Detectability Levels for FMEA

Table 16.2 AIAG Occurrence Levels for FMEA

Learn more about the Quality Improvement principles and tools for process excellence in Six Sigma Demystified (2011, McGraw-Hill) by Paul Keller, or his online Green Belt certification course ($499).