Most organizations treat problems the same way: something breaks, someone patches it, and everyone moves on, until the same issue resurfaces a week later. That cycle of firefighting wastes time, money, and morale. A structured root cause analysis process strips away symptoms and gets to the actual origin of the problem, so you can fix it once and move forward.
At Lean Six Sigma Experts, we’ve spent over a decade helping companies replace guesswork with engineering-driven, data-backed problem solving. Root cause analysis sits at the core of what we teach in our training programs and what we implement during consulting engagements. It’s one of the most practical skills a team can develop, whether you’re running a manufacturing floor, managing a supply chain, or leading continuous improvement initiatives across multiple sites.
This guide walks you through each step of the root cause analysis process, from defining the problem to verifying your corrective actions stick. You’ll also find the most effective tools and techniques, like the 5 Whys, fishbone diagrams, and fault tree analysis, along with real examples that show how they work in practice. By the end, you’ll have a clear framework you can apply immediately, no matter your experience level.
What root cause analysis is and when to use it
Root cause analysis (RCA) is a structured problem-solving method that identifies the fundamental reason a failure or defect occurred, rather than just neutralizing its visible symptoms. The root cause analysis process works by systematically collecting evidence, mapping cause-and-effect relationships, and applying specific tools to trace a problem back to its origin. When you address the root cause instead of the symptom, the problem stops recurring and your team stops wasting time on the same issues every quarter.
Fixing a symptom without finding the root cause is like unplugging a smoke alarm instead of putting out the fire.
What counts as a root cause
Not every cause you identify during an investigation qualifies as a root cause. A root cause is the deepest, most fundamental factor in a cause chain that, if corrected, would prevent the problem from happening again. Many teams stop too early and fix a contributing cause, which reduces the problem temporarily but never eliminates it completely.

Here is a concrete example: a machine repeatedly overheats. A technician replaces the cooling fan, and the machine runs fine for two weeks before overheating again. The fan was not the root cause; its failure was a symptom. The investigation traces the overheating to a blocked air intake, clogged with dust because no preventive maintenance schedule existed. That missing schedule is the root cause: fix the maintenance gap, and the overheating stops for good. That distinction, between a contributing factor and the true source, is what separates effective RCA from guesswork.
When you should run a root cause analysis
RCA is not a tool you apply to every minor inconvenience. You use it when the cost of recurrence outweighs the time invested in a thorough investigation. The situations below are clear signals that a full root cause analysis is the right move:
| Trigger | Why it warrants RCA |
|---|---|
| Recurring failures | The same defect or breakdown keeps appearing despite repeated fixes |
| High-impact events | A safety incident, major quality escape, or significant financial loss |
| Customer complaints | A defect passed through your process and reached the end user |
| Unexplained performance drops | Output, yield, or efficiency declines without an obvious cause |
| New process failures | A recently launched process or product produces unexpected results |
You do not need an RCA for a one-time, low-stakes issue that is clearly understood and easily corrected. But if a problem has appeared more than once, costs your team significant time, or creates downstream consequences for customers or other departments, a structured analysis will pay for itself many times over.
Who should be involved
RCA is not a solo exercise. The most effective investigations bring together people who are closest to the process, including operators, engineers, supervisors, and anyone with direct knowledge of what happened. Cross-functional involvement reduces blind spots and ensures you capture critical details that only certain team members hold.
Your team also needs someone to lead the analysis with discipline, keeping everyone focused on evidence rather than assumptions or blame. In Lean Six Sigma environments, that role typically falls to a Green Belt or Black Belt trained to apply the right tools, ask the right questions, and drive the investigation to a defensible conclusion.
Step 1. Define the problem and set the scope
The entire root cause analysis process stands or falls on how well you define the problem at the start. A vague problem statement like "quality is bad" gives your team nothing actionable to work with, while a specific, well-scoped statement points everyone in the right direction from day one. Skipping this step or rushing through it is the single most common reason RCA investigations stall or produce conclusions that don’t hold up under scrutiny.
Write a problem statement that actually works
A strong problem statement describes what went wrong, where it happened, when it was first observed, and how significant the impact is. It sticks to observable facts and avoids jumping to causes or assigning blame before any evidence is collected. Think of it as a precise, factual description of the gap between what should be happening and what is actually happening.
A good problem statement describes reality without editorializing. No causes, no suspects, just documented facts.
Use this template to build yours:
| Field | What to document |
|---|---|
| What | The specific defect, failure, or undesired outcome |
| Where | The location, machine, process step, or department |
| When | When the problem was first observed and how often it recurs |
| How big | Quantified impact: units affected, cost incurred, or downtime hours |
Here is a completed example using that same template:
- What: Weld failures on Product A
- Where: Assembly Line 3, Station 7
- When: First reported March 14; occurring on approximately 12% of units since
- How big: 47 rejected units per shift, costing an estimated $2,300 per day in rework
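If your team tracks investigations in a script or shared notebook, the four-field template can be captured as a small record that refuses to pass as complete while any field is blank. The sketch below is one possible way to do that; the `ProblemStatement` class and its `missing_fields` helper are hypothetical names introduced here for illustration, not part of any standard RCA tool.

```python
from dataclasses import dataclass, fields


@dataclass
class ProblemStatement:
    """Hypothetical record mirroring the What / Where / When / How big template."""
    what: str
    where: str
    when: str
    how_big: str

    def missing_fields(self) -> list:
        """Return the names of any template fields that were left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# The completed example from this section.
statement = ProblemStatement(
    what="Weld failures on Product A",
    where="Assembly Line 3, Station 7",
    when="First reported March 14; occurring on approximately 12% of units since",
    how_big="47 rejected units per shift, costing an estimated $2,300 per day in rework",
)

# A vague statement of the kind this step warns against.
draft = ProblemStatement(what="Quality is bad", where="", when="", how_big="")

print(statement.missing_fields())  # []
print(draft.missing_fields())      # ['where', 'when', 'how_big']
```

A check like this only confirms that every field was filled in. Whether the entries stick to observable facts still takes a human review.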
Set the scope before you go further
Once you have a solid problem statement, define the boundaries of your investigation so the team knows exactly what is in scope and what is not. Without boundaries, investigations expand in every direction and lose focus quickly.
Your scope should specify the time window you will examine, the process steps included, and the specific product lines, machines, or locations under review. If weld failures appear only on Line 3 during the night shift, limit your initial investigation there. If data later points elsewhere, you can expand, but starting with a tight scope keeps the team focused and prevents the investigation from becoming unmanageable before you have even gathered your first piece of evidence.
Step 2. Gather evidence and map what happened
Before you generate theories about what caused a problem, collect the facts. Teams that skip this step and jump straight to brainstorming causes tend to anchor on assumptions early, which biases the entire investigation. Your job at this stage is to build a complete, chronological picture of what happened based on observable evidence, not opinions about what might have happened.
Collect the right types of evidence
Strong evidence collection covers four categories: physical evidence, data records, process documentation, and direct observations. Physical evidence includes failed parts, samples, or photographs taken as close to the event as possible. Data records include production logs, quality inspection reports, machine sensor readings, and maintenance histories. Process documentation shows you what the process was designed to do, so you can identify deviations. Direct observations come from operators, technicians, or supervisors who witnessed the problem firsthand.
Collect evidence before the scene changes. Once a machine is repaired or a process restarts, critical physical and contextual clues disappear permanently.
Use this checklist to make sure your team covers all four categories:
| Evidence Type | What to collect |
|---|---|
| Physical | Failed parts, scrap samples, photographs of the defect |
| Data records | Production logs, inspection reports, sensor data, maintenance history |
| Process documentation | SOPs, work instructions, control plans, design specs |
| Direct observations | Operator interviews, supervisor accounts, shift notes |
Build a timeline of events
Once you have your evidence, organize it into a [chronological timeline](https://leansixsigmaexperts.com/how-to-do-root-cause-analysis/) that shows exactly what happened and in what order. A timeline reveals gaps in your knowledge, highlights process deviations, and shows whether the problem appeared suddenly or gradually over time. Those two patterns point to very different root causes.

Your timeline does not need to be complex. A simple table with a timestamp, event description, and evidence source for each entry gives your team a shared reference point that keeps the investigation grounded in facts throughout the root cause analysis process. Here is a working template:
| Time/Date | Event | Evidence Source |
|---|---|---|
| March 13, 22:00 | Welding wire spool replaced by night shift | Maintenance record |
| March 14, 06:15 | First weld failure reported by operator | Shift log |
| March 14, 08:00 | Quality inspection confirms 12% rejection rate | QC inspection report |
Fill in each row as your team uncovers information, and flag any time gaps where the sequence of events is still unclear. Those gaps become priority investigation points in the next step.
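Evidence rarely arrives in the order the events occurred, so it can help to let a short script do the sorting and gap-flagging. The sketch below uses the three entries from the example table. The year (2024) and the four-hour gap threshold are assumptions made purely for illustration; the source gives neither, so pick values that fit your own process.

```python
from datetime import datetime, timedelta

YEAR = 2024                          # assumption: the example table gives no year
GAP_THRESHOLD = timedelta(hours=4)   # hypothetical cutoff for an "unexplained" gap

# (timestamp, event, evidence source), listed in the order they were logged.
events = [
    ("03-14 06:15", "First weld failure reported by operator", "Shift log"),
    ("03-14 08:00", "Quality inspection confirms 12% rejection rate", "QC inspection report"),
    ("03-13 22:00", "Welding wire spool replaced by night shift", "Maintenance record"),
]


def parse(stamp):
    """Turn a 'MM-DD HH:MM' string into a datetime in the assumed year."""
    return datetime.strptime(f"{YEAR}-{stamp}", "%Y-%m-%d %H:%M")


# Sort by timestamp so the sequence of events is unambiguous.
timeline = sorted(((parse(t), e, s) for t, e, s in events), key=lambda row: row[0])

# Flag every pair of consecutive entries separated by more than the threshold.
gaps = [
    (earlier[0], later[0])
    for earlier, later in zip(timeline, timeline[1:])
    if later[0] - earlier[0] > GAP_THRESHOLD
]

for when, event, source in timeline:
    print(when.strftime("%b %d %H:%M"), "|", event, "|", source)
for start, end in gaps:
    print("Gap to investigate:", start.strftime("%b %d %H:%M"), "to", end.strftime("%b %d %H:%M"))
```

With these three entries, the only flagged gap is the stretch between the spool replacement and the first reported failure, which is where you would look for additional evidence first.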
Step 3. Generate causes with the right RCA tools
With your timeline built and evidence in hand, you’re ready to generate possible causes. This is where the root cause analysis process shifts from pure observation into structured reasoning. The goal is not to brainstorm randomly but to use purpose-built tools that map cause-and-effect relationships in a disciplined way. Pick your tool based on the complexity of the problem and the type of evidence your team collected.
Use the 5 Whys for focused problems
The 5 Whys is the fastest way to trace a simple, linear cause chain. You start with your problem statement and ask "why" repeatedly until you reach a cause that, if corrected, would prevent the problem from recurring. Five iterations is a guideline, not a rule; some chains resolve in three steps, others take seven.
The 5 Whys works best when a single clear chain of causation exists. If multiple independent branches appear early on, switch to a fishbone diagram instead.
Here is a working example using the weld failure introduced in Step 1:
| Why | Answer |
|---|---|
| Why did the welds fail? | Weld penetration was too shallow |
| Why was penetration too shallow? | Wire feed rate was set below spec |
| Why was the feed rate set low? | Operator adjusted it to reduce spatter |
| Why did spatter increase? | A different wire spool batch was loaded |
| Why did an unverified batch reach the line? | No incoming inspection process existed for wire batches |
Root cause identified: No incoming inspection process for welding wire.
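One way to keep a 5 Whys session honest is to record only the answers and let each answer become the effect that the next "why" has to explain. The sketch below does that with the weld example. The `cause_chain` helper is a hypothetical name used for illustration; it simply pairs each effect with the cause given for it.

```python
problem = "Weld failures on Product A"

# Answers from the weld failure example, in the order they were given.
answers = [
    "Weld penetration was too shallow",
    "Wire feed rate was set below spec",
    "Operator adjusted it to reduce spatter",
    "A different wire spool batch was loaded",
    "No incoming inspection process existed for wire batches",
]


def cause_chain(problem, answers):
    """Pair each effect with the cause offered for it, from the problem downward."""
    effects = [problem] + answers[:-1]
    return list(zip(effects, answers))


chain = cause_chain(problem, answers)
root_cause = chain[-1][1]  # the deepest cause reached so far

for depth, (effect, cause) in enumerate(chain, start=1):
    print(f"Why {depth}: '{effect}' because '{cause}'")
print("Candidate root cause:", root_cause)
```

Reading each printed line aloud is a quick sanity check: if an "effect because cause" pair does not sound like a plausible explanation, the chain has a weak link that needs more evidence.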
Use the fishbone diagram for complex problems
When a problem has multiple potential causes across different areas of your process, a fishbone diagram gives your team a visual structure to organize them systematically. The head of the fish holds your problem statement, and the bones represent major cause categories: Man, Machine, Method, Material, Measurement, and Environment (sometimes labeled Mother Nature), known together as the 6Ms.

Build the diagram during a facilitated team session by listing each category as a branch, then populate specific potential causes under each one using the evidence you gathered in Step 2. Once complete, your team will have an organized list of cause candidates to carry forward into the validation step. Prioritize the causes backed by direct evidence; those become your highest-priority targets for testing.
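If you want to capture the output of a fishbone session in a form you can sort and share, a dictionary keyed by category is enough. The sketch below is a minimal illustration under stated assumptions: the three candidate causes are borrowed from the weld example, the "Operator interview" evidence source is hypothetical, and the `add_cause` and `shortlist` helpers are names invented here.

```python
CATEGORIES = ["Man", "Machine", "Method", "Material", "Measurement", "Environment"]

# One empty branch per 6M category.
fishbone = {category: [] for category in CATEGORIES}


def add_cause(category, description, evidence=None):
    """Attach a candidate cause to a branch, with its evidence source if any."""
    if category not in fishbone:
        raise ValueError(f"Unknown category: {category}")
    fishbone[category].append({"cause": description, "evidence": evidence})


def shortlist():
    """Flatten the diagram, listing evidence-backed causes first (stable sort)."""
    rows = [
        (category, item["cause"], item["evidence"])
        for category, causes in fishbone.items()
        for item in causes
    ]
    return sorted(rows, key=lambda row: row[2] is None)


add_cause("Material", "Different wire spool batch loaded", evidence="Maintenance record")
add_cause("Method", "Wire feed rate set below spec")  # no evidence source recorded yet
add_cause("Man", "Operator adjusted feed rate to reduce spatter",
          evidence="Operator interview")  # hypothetical evidence source

for category, cause, evidence in shortlist():
    print(f"{category:12} | {cause} | {evidence or 'needs evidence'}")
```

Causes that print as "needs evidence" are not discarded. They simply go to the back of the queue until someone finds data that supports or rules them out.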
Step 4. Validate the root cause with data
Identifying a probable root cause during a brainstorming session is not the same as proving it. Before your team commits time and budget to a corrective action, you need to verify that the cause you identified actually drives the problem you documented in Step 1. Skipping validation is one of the most expensive mistakes teams make, because implementing a fix for the wrong cause wastes resources and leaves the actual problem in place. This step in the root cause analysis process separates a defensible conclusion from an educated guess.
Validation is not about doubt; it is about making sure the data confirms what your team believes before you spend money acting on it.
Test your cause-and-effect hypothesis
Your goal in validation is to confirm that removing or changing the suspected root cause eliminates or significantly reduces the problem. The most reliable way to do this is through a controlled test. Introduce the suspected cause in a controlled setting and measure whether the problem appears, then remove it and measure again. If the defect rate or failure frequency tracks directly with the presence of your cause, your hypothesis holds.
Use this simple validation template to structure your test:
| Validation Field | Your Entry |
|---|---|
| Suspected root cause | No incoming inspection process for welding wire batches |
| Test condition A (cause present) | Run 50 units with uninspected wire from the new batch |
| Test condition B (cause removed) | Run 50 units with wire that passed incoming inspection criteria |
| Metric to measure | Weld rejection rate per 50-unit run |
| Result A | 11 rejected units (22% failure rate) |
| Result B | 1 rejected unit (2% failure rate) |
| Conclusion | Hypothesis supported: failure rate tracks with the inspection gap; confirm with repeat runs |
Review the data before you close the investigation
Once your test is complete, review the full dataset rather than relying on a single run. One favorable result can reflect normal process variation rather than a genuine cause-and-effect relationship. Run your test across multiple shifts, operators, or machines if your process allows it, and check whether the outcome stays consistent. If your data shows a statistically significant difference between the two conditions, you have confirmed your root cause and your team is ready to build a permanent corrective action in Step 5.
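To make "statistically significant" concrete, here is one possible check on the two 50-unit runs from the validation template. It is a sketch that assumes a one-sided Fisher's exact test is appropriate for your data, and it uses only the Python standard library. The function name `fisher_one_sided` is a hypothetical helper; if SciPy is available, its `scipy.stats.fisher_exact` function offers a maintained implementation.

```python
from math import comb


def fisher_one_sided(rejects_a, n_a, rejects_b, n_b):
    """One-sided Fisher's exact test for two runs.

    Returns the probability that run A would contain at least `rejects_a`
    of the pooled rejects if the suspected cause made no difference.
    """
    total_rejects = rejects_a + rejects_b
    total_units = n_a + n_b
    most_possible = min(total_rejects, n_a)
    favorable = sum(
        comb(total_rejects, k) * comb(total_units - total_rejects, n_a - k)
        for k in range(rejects_a, most_possible + 1)
    )
    return favorable / comb(total_units, n_a)


# Results from the validation template: 11 of 50 rejected with uninspected
# wire, 1 of 50 rejected with wire that passed incoming inspection.
p_value = fisher_one_sided(rejects_a=11, n_a=50, rejects_b=1, n_b=50)

ALPHA = 0.05  # conventional significance threshold, an assumption here
print(f"p-value: {p_value:.4f}")
print("Significant difference" if p_value < ALPHA else "Not enough evidence yet")
```

For these two runs the p-value lands well below 0.05, so chance alone is an unlikely explanation. That still describes a single pair of runs, which is why repeating the test across shifts, operators, or machines matters before you close the investigation.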
Step 5. Fix the system and prevent recurrence
With a confirmed root cause in hand, your team now has one job: build a corrective action that eliminates the source of the problem and prevents it from returning once you walk away from the project. Most teams stop at the fix itself, but a complete root cause analysis process requires you to also lock in controls that make it difficult for the same failure to resurface when attention moves to the next priority.
Build a corrective action that targets the root cause
Your corrective action needs to address the root cause directly, not a layer above it. If your validated root cause is a missing inspection process, the corrective action is creating and implementing that inspection process, not retraining operators on how to handle substandard wire after it reaches the floor. Actions that treat symptoms will show short-term improvement and then fade.
A corrective action that does not directly remove or control the root cause is just a better patch.
Use this template to document your corrective action before implementation:
| Field | Your Entry |
|---|---|
| Root cause | No incoming inspection process for welding wire batches |
| Corrective action | Create an incoming inspection checklist and hold point for all wire deliveries |
| Owner | Quality Engineer, Line 3 |
| Target completion date | Within 10 business days |
| Success metric | Weld rejection rate at or below 2% for 30 consecutive production days |
Assign a single owner for each action item so accountability is clear. Shared ownership without a named lead almost always results in delays or incomplete execution.
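A success metric phrased as "at or below 2% for 30 consecutive production days" is easy to misjudge by eye, because a single bad day resets the count. The sketch below shows one way to check it. The daily rates are made-up illustration data, and `metric_met` is a hypothetical helper name.

```python
def metric_met(daily_rates, threshold=0.02, required_days=30):
    """Return True once `required_days` consecutive rates are at or below threshold."""
    streak = 0
    for rate in daily_rates:
        # Any day above the threshold resets the streak to zero.
        streak = streak + 1 if rate <= threshold else 0
        if streak >= required_days:
            return True
    return False


# Hypothetical data: 29 good days, one bad day, then 30 good days.
rates_with_reset = [0.015] * 29 + [0.03] + [0.01] * 30
# Hypothetical data: the same history cut off right after the bad day.
rates_too_short = [0.015] * 29 + [0.03]

print(metric_met(rates_with_reset))  # True
print(metric_met(rates_too_short))   # False
```

The action owner can run a check like this against the real rejection log before declaring the corrective action complete.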
Lock in the fix with controls and documentation
Once your corrective action is in place and hitting your success metric, update your process documentation to reflect the new standard. This means revising your SOPs, work instructions, control plans, and any training materials that govern the affected process. If the documentation does not change, the next operator or supervisor who touches that process has no way to know the new standard exists.
Beyond documentation, add a control mechanism that detects the failure condition early if the root cause ever reappears. In the wire example, that means including the incoming inspection step as a required hold point in your production control plan, verified during internal audits. Controls turn a one-time fix into a durable systemic change that holds without constant manual attention.

Bring it home
The root cause analysis process covered in this guide gives you a repeatable system that moves from a vague problem to a confirmed, data-backed corrective action in five concrete steps. You define the problem, gather evidence, generate causes with the right tools, validate your findings with data, and lock in a fix that holds. Each step builds on the last, so skipping one puts the whole investigation at risk.
Applying this framework consistently is what separates teams that solve problems once from teams that fight the same fires every quarter. Discipline in the process is what produces durable results, not the tools alone. The 5 Whys and fishbone diagram are only as effective as the team using them.
If you want expert support applying these methods inside your organization, connect with our team at Lean Six Sigma Experts to talk through what a hands-on consulting or training engagement would look like for your operation.
