Failure Mode and Effects (Impact) Analysis (Identify & Reduce)

FMEA is a bottom-up approach to identifying risks that have been designed into a project, product, or process. It is a systematic method for reviewing a design at the component or subsystem level and rating identified risks relative to each other. A Risk Priority Number (RPN) is calculated for each failure mode that is identified and characterized. The RPN is a composite number reflecting the severity of the effect, the frequency of occurrence, and the ability to detect and mitigate the failure before the effect occurs. It can help identify the likelihood and consequences of a single point of failure. It may be less effective than techniques such as Fault Tree Analysis (FTA) at exploring more complex scenarios in which multiple factors occur simultaneously or conditionally.

FMEA is a risk prioritization tool represented in a matrix format, with detail developed for each potential failure mode. Once RPNs are determined, a plan is developed to reduce the risk of the failure modes with high RPNs. A failure mode with a high RPN must be managed effectively, so use the FMEA as the basis for developing a monitoring and control plan. An active tracking mechanism for high RPNs is essential to ensure that critical risks and corrective actions are ultimately managed effectively by the organization.
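As a minimal sketch of the matrix and the tracking mechanism, the Python below stores each worksheet row with its three ratings and pulls out the rows at or above an RPN threshold, highest first. The field names, the 1 to 10 scales, treating the third factor as the conventional detection rating, the threshold of 90, and the example rows are all illustrative assumptions; each organization sets its own scales and cut-off.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    """One row of a hypothetical FMEA worksheet (matrix)."""
    item: str          # component, process step, or subsystem
    failure_mode: str
    severity: int      # assumed scale: 1 (negligible) .. 10 (catastrophic)
    occurrence: int    # assumed scale: 1 (rare) .. 10 (almost certain)
    detection: int     # assumed scale: 1 (almost surely detected) .. 10 (undetectable)
    action: str = ""   # planned corrective action; empty if none assigned yet

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


def tracking_list(rows: list[FmeaRow], threshold: int) -> list[FmeaRow]:
    """Return rows whose RPN is at or above the threshold, highest RPN first."""
    flagged = [row for row in rows if row.rpn >= threshold]
    return sorted(flagged, key=lambda row: row.rpn, reverse=True)


if __name__ == "__main__":
    worksheet = [
        FmeaRow("Pump", "Fails to start", 8, 3, 4, "Add start-up self-test"),
        FmeaRow("Pump", "Leaks at seal", 5, 6, 2),
        FmeaRow("Controller", "Reports wrong pressure", 9, 2, 7),
    ]
    # Prints the Controller row (RPN 126), then the Pump start-up row (RPN 96).
    for row in tracking_list(worksheet, threshold=90):
        status = row.action or "NO ACTION ASSIGNED"
        print(f"{row.rpn:4d}  {row.item}: {row.failure_mode} -> {status}")
```

Rows without an assigned corrective action are printed as such, so gaps in the control plan stay visible.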

Different roles may try to influence the FMEA process; this pressure should be expected and addressed. Create a guide for working through cases of insufficient data, and agree on a rough framework so that ratings are understood consistently. Use FMEA throughout the project lifecycle, starting early in project design.

Start at the Element Level: the component, process, or subsystem level, or the functions they will perform

  • Employ a lightweight, quick version of FMEA first; do not let the process get off track.
  • Risk management is in part about not repeating the same mistakes: review Organizational Process Assets (OPA), and make lessons learned visible.

Identify and Characterize Failure Modes:

  • Consider functional requirements and itemize negations for each function
  • One subsystem can experience multiple failure modes
  • Identifying failure modes is a team activity; concise descriptions are best.
  • Time pressure can push FMEA work off, and delaying it may leave more issues to resolve later. Getting into the details takes time, but it is necessary; otherwise risks may be missed. Discovery is always better than ignorance.
  • Consider other impact areas: staff productivity, re-work, security, installation, support, additional training, etc.
  • A team applying their skills and talents to a new environment or domain can lead to a broad set of additional risks that must be considered.
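One way to "itemize negations for each function" is to pair each function statement with a fixed set of negation patterns. The patterns below (no function, partial, intermittent, unintended, degrading over time) are an assumption borrowed from common guide-word practice rather than part of the method described here, and the function names are hypothetical; the output is a list of candidates for the team to judge, not a list of confirmed failure modes.

```python
from itertools import product

# Hypothetical negation patterns ("guide words"); {f} is a function stated
# as a verb phrase, e.g. "circulate coolant".
NEGATIONS = (
    "fails to {f}",
    "only partially manages to {f}",
    "intermittently fails to {f}",
    "acts to {f} when not commanded",
    "gradually loses the ability to {f}",
)


def candidate_failure_modes(subsystem: str, functions: list[str]) -> list[str]:
    """Pair every function with every negation pattern, function by function."""
    return [
        f"{subsystem} {pattern.format(f=function)}"
        for function, pattern in product(functions, NEGATIONS)
    ]


if __name__ == "__main__":
    for mode in candidate_failure_modes(
        "Cooling pump", ["circulate coolant", "report its flow rate"]
    ):
        print(mode)
```

Two functions times five patterns yields ten candidates, which also reflects the point above that one subsystem can experience multiple failure modes.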

Identify Negative Outcomes: imagine failures and what may occur, and characterize the effect (impact)

  • Opportunity to identify new risks.
  • Review initial risk mitigation or safety mechanisms for unforeseen risks

NASA Model: This is an effective model developed for the space program.

  • Local: immediate effects
  • System: impact on the system
  • Mission: impact on the entire scope
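To make sure every failure mode is characterized at all three levels, a worksheet can hold one field per level and report which ones are still blank. This is a sketch; the class and method names are hypothetical, and the example effects are invented for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class FailureEffects:
    """Effects of one failure mode at the local, system, and mission levels."""
    failure_mode: str
    local: str    # immediate effect on the item itself
    system: str   # impact on the system the item belongs to
    mission: str  # impact on the entire scope (mission, project, business)

    def missing_levels(self) -> list[str]:
        """Names of effect levels that are still blank."""
        return [
            f.name
            for f in fields(self)
            if f.name != "failure_mode" and not getattr(self, f.name).strip()
        ]


if __name__ == "__main__":
    effects = FailureEffects(
        failure_mode="Cooling pump fails to circulate coolant",
        local="Coolant flow stops at the pump outlet",
        system="Motor overheats until its thermal cutout trips",
        mission="",
    )
    print(effects.missing_levels())  # ['mission']
```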

Identify Possible Causes (for imagined failures)

  • Apply root cause analysis to the list of imagined failures
  • Check to be sure the correct root cause has been characterized
  • Must prioritize a list of plausible problems
  • Clear goals and quality requirements are essential to this process
  • A diverse set of SMEs will produce the best analysis

Four Frequently Used Categories to consider how the design may not work in all cases:

  • Man: inexperience, operator distraction, poor training
  • Machine: equipment breakdown, delayed maintenance
  • Method: software errors, poor documentation or procedures, inadequate testing
  • Material: substandard materials, metal fatigue, excess force
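The four categories can double as a checklist. The sketch below (hypothetical names, with the example causes from the list above reused as prompts) generates review questions for one failure mode and reports which categories have no recorded cause yet, so the team can see where it has not looked.

```python
from enum import Enum


class Category(Enum):
    MAN = "Man"
    MACHINE = "Machine"
    METHOD = "Method"
    MATERIAL = "Material"


# Prompts taken from the example causes listed for each category.
PROMPTS = {
    Category.MAN: ["inexperience", "operator distraction", "poor training"],
    Category.MACHINE: ["equipment breakdown", "delayed maintenance"],
    Category.METHOD: ["software errors", "poor documentation or procedures",
                      "inadequate testing"],
    Category.MATERIAL: ["substandard materials", "metal fatigue", "excess force"],
}


def review_questions(failure_mode: str) -> list[str]:
    """One question per prompt, grouped by category in definition order."""
    return [
        f"Could {prompt} ({category.value}) cause '{failure_mode}'?"
        for category in Category
        for prompt in PROMPTS[category]
    ]


def uncovered_categories(causes: dict[Category, list[str]]) -> list[Category]:
    """Categories for which no cause has been recorded yet."""
    return [category for category in Category if not causes.get(category)]


if __name__ == "__main__":
    causes = {
        Category.MACHINE: ["bearing wear from delayed maintenance"],
        Category.METHOD: ["start-up sequence not covered by tests"],
    }
    print([c.value for c in uncovered_categories(causes)])  # ['Man', 'Material']
```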

Quantify Risk: Severity (Impact) vs. Likelihood of Occurrence (rating system)

  • To prioritize efforts and be better able to decide how to deal with identified risks.
  • Must use a scale that is meaningful to everyone in the organization
  • RPN: Severity rating x Occurrence rating x Detection rating = Risk Priority Number (a simplified form omits the Detection rating)
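A minimal sketch of the calculation, assuming integer ratings on a 1 to 10 scale (a common convention, not a requirement; what each step of the scale means still has to be defined so that it is meaningful to everyone). Leaving the detection rating at its default of 1 gives the simplified Severity x Occurrence form. The hypothetical ratings also show why ranking by RPN and ranking by severity can disagree: a rare but severe failure can sit below a frequent, moderate one.

```python
# Assumed rating scale: integers 1 (best case) to 10 (worst case) for every factor.
SCALE = range(1, 11)


def rpn(severity: int, occurrence: int, detection: int = 1) -> int:
    """Risk Priority Number = severity x occurrence x detection.

    detection defaults to 1, which reduces the result to the simplified
    severity x occurrence form.
    """
    for name, value in (("severity", severity), ("occurrence", occurrence),
                        ("detection", detection)):
        if value not in SCALE:
            raise ValueError(f"{name} rating {value} is outside 1-10")
    return severity * occurrence * detection


if __name__ == "__main__":
    # Hypothetical ratings: (severity, occurrence, detection)
    modes = {
        "Seal leak": (5, 6, 4),               # RPN 120
        "Wrong pressure reading": (9, 2, 3),  # RPN 54
    }
    by_rpn = sorted(modes, key=lambda m: rpn(*modes[m]), reverse=True)
    by_severity = sorted(modes, key=lambda m: modes[m][0], reverse=True)
    print(by_rpn)       # ['Seal leak', 'Wrong pressure reading']
    print(by_severity)  # ['Wrong pressure reading', 'Seal leak']
```

Sorting on more than one key, for example severity first with RPN as a tie-breaker, is one way to reconcile the two orderings; which rule to use is a policy decision for the organization.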

Risk Mitigation: Generate a list of corrective actions 

Best Practice: Thinking of corrective actions is a creative process. Techniques described on this site for exploring a design space can be effective.

  • Reduce likelihood (of possible cause occurring) or severity; reduce risk for severe outcomes first
  • Think about the timing of corrective actions: during design, manufacturing, operation, repair, etc.
    • Design simulation and analysis leading to change
    • Instructions or training
    • Techniques: quality of process
    • Materials: strength, weight, etc.
  • Cost-benefit analysis: compare strategies to decide which to implement. Techniques such as Pugh matrix analysis may help determine which corrective actions to implement.
    • Start with the most severe outcome events (this may or may not match the RPN order)
    • Reduce the likelihood of occurrence
    • Reduce the severity of a negative outcome
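Pugh matrix analysis, mentioned above, can be sketched in a few lines: each candidate corrective action is scored against a baseline option as better (+1), the same (0), or worse (-1) on each criterion, and the weighted scores are summed. The criteria, weights, and options below are hypothetical; weighting is a common variant of the classic matrix, which uses equal weights. Giving severity reduction the largest weight mirrors the guidance to reduce risk for severe outcomes first.

```python
# Allowed scores relative to the baseline (datum) option.
ALLOWED = {-1, 0, 1}


def pugh_totals(weights: dict[str, int],
                ratings: dict[str, dict[str, int]]) -> dict[str, int]:
    """Weighted Pugh matrix totals.

    weights: criterion -> importance weight (use 1 everywhere for the
             classic, unweighted matrix)
    ratings: option -> {criterion: -1 worse, 0 same, +1 better than the
             baseline}; criteria left out count as 0 (same as baseline)
    """
    totals = {}
    for option, scores in ratings.items():
        bad = {c: s for c, s in scores.items() if s not in ALLOWED}
        if bad:
            raise ValueError(f"{option}: scores must be -1, 0 or +1, got {bad}")
        totals[option] = sum(w * scores.get(c, 0) for c, w in weights.items())
    return totals


if __name__ == "__main__":
    weights = {"reduces severity": 3, "reduces likelihood": 2, "lower cost": 1}
    ratings = {
        "Operator training (baseline)": {},
        "Flow sensor and alarm": {"reduces severity": 1, "lower cost": -1},
        "Stronger seal material": {"reduces likelihood": 1, "lower cost": -1},
    }
    totals = pugh_totals(weights, ratings)
    # Highest total first: sensor (+2), seal (+1), baseline (+0).
    for option, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{total:+d}  {option}")
```

A high total only means an option beats the baseline on the chosen criteria; the matrix supports the cost-benefit discussion rather than replacing it.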

Ensure Identified Risks are Managed Effectively: Make the monitor and control plan visible at all levels.

Often the failure is not one of risk identification: there are unfortunate examples of organizations that identified a risk but failed to manage it effectively. A disaster can likely be avoided by applying good risk mitigation techniques when design requirements change and when a system fails required tests.