Bad Design Is Like a Virus: A Look at Latent Errors

When it comes to implementing solutions, problem solvers often prioritize the usability and performance of a product or interface, focusing on intuitive layouts and achieving specific user goals (like maximizing clicks on a website). However, an equally crucial consideration for designers is the potential for their designs to fail under non-optimal conditions, a factor often overlooked in discussions of design and user experience.

Typically, problem-solving efforts concentrate on optimizing performance within what is known as the Optimal Design Domain (ODD) — the normal circumstances of use. Yet evaluating the quality of a solution should also involve examining how well a product or system performs when faced with unexpected conditions that lie outside the ODD. This broader perspective helps in creating solutions that are more reliable and resilient.

When failures occur, problem solvers often seek causal explanations, typically attributing them to user behavior or immediate system states — known as active failures. However, there are latent failures as well, originating from earlier decisions made by managers, engineers, and designers during the design phase. These latent failures often go unnoticed until they contribute to an accident, which highlights the importance of foresight during the implementation step of problem solving.

In 2000, James Reason, a psychologist and human performance expert, introduced the Swiss Cheese Model to illustrate how system failures can be caused by the alignment of latent errors across multiple layers of a system. Each layer represents a defense against failure, but when these defenses are penetrated simultaneously, failures occur. The sinking of the Titanic serves as a poignant example; it was not merely the collision with the iceberg, but the decisions about routing, speed, and the design of the ship that collectively led to disaster.

For problem solvers, the Swiss Cheese Model underscores the critical role of design in preventing failures. A well-designed system should ideally thwart failures at every layer, ensuring robustness and reliability across diverse scenarios. However, achieving this level of resilience is challenging, particularly when considering latent errors that may manifest under unexpected conditions.
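The model's core insight can be sketched numerically. In this minimal simulation (the per-layer numbers are illustrative assumptions, not data from Reason's work), each defensive layer has some probability of a "hole" lining up with the hazard, and a failure reaches the user only when every layer is penetrated at once:

```python
import random

# Hypothetical probability that a hazard slips through each defensive
# layer (the "holes" in each slice of cheese); values are illustrative.
layer_hole_prob = [0.1, 0.05, 0.2, 0.15]

def hazard_penetrates(probs, rng):
    """A failure occurs only if the hazard passes EVERY defensive layer."""
    return all(rng.random() < p for p in probs)

def estimate_failure_rate(probs, trials=100_000, seed=42):
    """Estimate the system failure rate by simulating many hazards."""
    rng = random.Random(seed)
    hits = sum(hazard_penetrates(probs, rng) for _ in range(trials))
    return hits / trials

# With independent layers, the expected rate is the product
# 0.1 * 0.05 * 0.2 * 0.15 = 0.00015 — far lower than any single layer.
print(estimate_failure_rate(layer_hole_prob))
```

The point of the sketch is the multiplication: each added, *independent* layer drives the joint failure rate down sharply — but latent errors are dangerous precisely because they correlate the holes, so the layers no longer fail independently.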

James Reason also coined the “resident pathogen” metaphor to explain how design failures can exist and propagate within a system. Just as pathogens accumulate in an organism to cause disease, latent design defects within an organization increase the likelihood of failures. This metaphor encourages designers to minimize these defects through rigorous design practices and proactive risk management.

Addressing latent failures requires strategies beyond traditional operator training or error detection. While these measures are essential for normal operations, they often fail to prevent latent errors that manifest outside the ODD. Instead, effective approaches include situation-based training, simulations, and adaptive situational modes (ASMs) that prepare operators to handle unforeseen scenarios and maintain system integrity.

Situation-based training, exemplified by airline pilot simulations, focuses on preparing operators for rare but critical scenarios. Such training reinforces decision-making patterns that are crucial when systems deviate from normal conditions. Similarly, simulations such as Monte Carlo simulations allow designers to test systems under varied conditions, revealing potential latent flaws.

Moreover, adaptive situational modes (ASMs), seen in critical professions like air traffic control, provide predefined protocols for handling unusual situations. These modes acknowledge the inevitability of failures outside the ODD and emphasize proactive responses to maintain safety and functionality.
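The essence of an ASM is that the response to an off-nominal situation is decided in advance, not improvised under pressure. A minimal sketch, using hypothetical situation names and checklist steps invented for illustration:

```python
# Hypothetical adaptive situational modes: each off-nominal situation
# maps to a predefined protocol, so operators switch modes rather than
# improvising. Situation names and steps are invented for illustration.
ASM_PROTOCOLS = {
    "radar_outage": [
        "increase aircraft spacing",
        "switch to procedural (non-radar) control",
    ],
    "radio_failure": [
        "use light-gun signals",
        "expect aircraft to follow lost-communications procedures",
    ],
}

def protocol_for(situation):
    """Return the predefined checklist, with a generic degraded-operations
    fallback for situations no one anticipated."""
    return ASM_PROTOCOLS.get(
        situation,
        ["stabilize the situation", "escalate to supervisor"],
    )

print(protocol_for("radar_outage"))
```

The design choice worth noting is the fallback branch: an ASM scheme concedes up front that some failures outside the ODD will not match any listed situation, so even the "unknown" case gets a rehearsed response.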

Conversely, ineffective remedies like adding excessive safeguards or relying solely on user training can inadvertently complicate systems, increasing their opacity and potential for failure. The Boeing 737 MAX crashes serve as a stark reminder of how well-intentioned safeguards can backfire if they mask underlying design flaws.

Looking ahead, as technology continues to evolve with advancements like AI and autonomous systems, the complexity and opacity of systems will only increase. Designers must thus focus on reducing latent errors and fostering transparency between user actions and system responses. This requires cross-disciplinary collaboration to integrate best practices across design, engineering, and management.

While problem solvers and designers strive to optimize performance within the ODD, they must also anticipate and mitigate latent failures that threaten system reliability. By adopting proactive strategies such as situation-based training, simulations, and adaptive situational modes, designers can minimize the impact of latent errors and enhance the resilience of their designs. This approach not only improves immediate system performance but also prepares organizations for the challenges posed by future technological advancements.

Michael Parent

Michael Parent is CEO of the Problem Solving Academy and author of “The Lean Innovation Cycle,” a book that explores the intersection of problem solving, Lean, and human-centered design. Throughout his career, Michael has coached executives through strategic problem solving, strategy, and operations management and has led numerous projects in a variety of industries.
