Design for Reliability

This article describes strategies for ensuring reliability in new products, and explains how and why to consider reliability engineering early in the technology development cycle. Copyright 2022, Saegert Solutions Inc.

Reliability engineering is primarily associated with RAMS (Reliability, Availability, Maintainability and Safety). RAMS originated from the need to repair complex systems when they broke down, and evolved to include preventive maintenance planning (to prevent breakdowns), provisioning of adequate numbers of spares, and sophisticated modeling to predict lifetime costs and service intervals for fleet and utility operators, where the costs of outages escalate quickly.

Associating reliability engineering with mature technologies already in service leads many technology developers to conclude that reliability demonstration and growth should only be conducted in the later stages of development, once prototypes begin operating in relevant or operational environments (TRLs 6-7 and higher). Less consideration is given to reliability requirements while technologies are still in development (TRLs 2-5).

Often this choice is made by default: technology readiness levels 2 and 3 prioritize analysis and proof-of-concept studies; by levels 4 and 5, resources focus on system integration, and on ensuring that the components selected satisfy requirements for form, fit, and function. Prototypes emerge (TRL6) and are characterized to establish performance figures. Drawings are finalized, driving plans for manufacturing. The design may be ‘frozen’ and placed under change control at this point. Demonstration fleets are planned, and supply agreements are finalized. By TRL7, ‘production’ prototypes are operating in demonstration fleets. As soon as prototypes start failing in the field, reliability necessarily becomes a priority.

From a reliability engineer’s perspective, a large fleet of operating prototypes is a blessing. Reliability runs on data, and large numbers of prototypes operating under a variety of conditions, with low production variability (thanks, Quality!), make for ripe datasets and juicy analyses.

For everyone else, it can be absolute hell: failure reporting, investigations, and design and process changes require hundreds of person-hours of unplanned and unbudgeted work to resolve each issue. Often, the most experienced engineers have been reassigned to new projects in development, and investigations are treated as distractions. Timelines tighten, and re-designing components can require existing parts to be reworked or scrapped. It’s not unusual to see costs climb into the hundreds of thousands of dollars for each issue, with potentially dozens of issues reported.

And those are just engineering costs. If prototype failures occur in the public eye, catch the attention of regulators, or require fielded parts to be recalled, then the costs associated with warranty and poor public perception escalate even faster.

The alternative is to build reliability into the technology development process. Defining system reliability targets in the concept phase (TRL2) ensures that reliability requirements are considered when evaluating key functions during proof of concept (TRL3). Reliability block diagrams are an excellent tool for apportioning reliability requirements to subsystems, and functional analysis can be used to establish reliability requirements for components, alongside those for mass, volume, power, flow, etc. The structure and functional analysis processes described in the 2019 edition of the harmonized AIAG-VDA FMEA handbook can be used to support this approach.
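As a minimal sketch of apportionment, assume the reliability block diagram is a simple series arrangement (every subsystem must work for the system to work) and that the system target is split equally among subsystems. The subsystem names and the 0.95 mission reliability target are hypothetical, chosen only for illustration; a real allocation would usually weight subsystems by complexity, criticality or maturity.

```python
import math


def series_reliability(block_reliabilities):
    """Reliability of a series block diagram: the product of the block reliabilities."""
    return math.prod(block_reliabilities)


def equal_apportionment(system_target, n_blocks):
    """Reliability each of n series blocks needs so that their product meets the target."""
    return system_target ** (1.0 / n_blocks)


# Hypothetical system target and subsystem names (assumptions, not from any real program).
system_target = 0.95
subsystems = ["stack", "compressor", "controller", "cooling"]

allocation = {name: equal_apportionment(system_target, len(subsystems))
              for name in subsystems}

for name, r in allocation.items():
    print(f"{name}: required reliability {r:.4f}")

# Sanity check: the allocated values multiply back to the system target.
print(f"system: {series_reliability(allocation.values()):.4f}")
```

Even this crude split makes the point: four subsystems in series each need a reliability of roughly 0.987 to deliver 0.95 at the system level, a requirement worth knowing before components are selected.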

Defining and describing functions also prompts consideration of how the function and performance of components in the system can change over time, a critical concern for reliability. Components subject to wear should be tested and hardened against wear; components sensitive to extremes should be tested at those extremes; components subject to aging should be tested after being aged.

Testing components and subsystems (TRL4 and 5) in parallel with system development provides information that can prove invaluable in later stages. Components can be tested in statistically significant sample sizes under conditions exceeding those tolerated by the complete system. Units can be aged, overstressed, failed, replaced and investigated, generating data on failure modes, causes and occurrence rates that can be compared with the initial reliability allocations. Component failure data is also a crucial resource when system failures occur: baseline performance data helps define the problem and identify causes when investigating failures.
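To give a sense of what ‘statistically significant’ can mean for a component test, one commonly used relation is the zero-failure ‘success run’ formula, n = ln(1 - C) / ln(R). It is offered here as an illustration under stated assumptions, not as a prescription: results are treated as binomial pass/fail, each unit is tested to the equivalent of one mission or lifetime, and no failures are allowed. The function name and the example targets are hypothetical.

```python
import math


def success_run_sample_size(reliability, confidence):
    """Units that must all pass to demonstrate `reliability` at `confidence`.

    Zero-failure binomial (success run) relation: n = ln(1 - C) / ln(R),
    rounded up to the next whole unit.
    """
    if not (0.0 < reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("reliability and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


# Hypothetical targets: demonstrate each reliability level at 90% confidence.
for target in (0.90, 0.95, 0.99):
    n = success_run_sample_size(target, confidence=0.90)
    print(f"R = {target:.2f} at 90% confidence: {n} units, zero failures")
```

Sample sizes in the tens or hundreds are often practical for components on a bench, and rarely practical for complete prototype systems.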

With dedicated component test programs, system level testing is freed to focus on issues that can only be examined at the system level, such as those related to integration, operation or control. Failing components can be quickly swapped out of the system to preserve system uptime, or aged components swapped in to assess system performance at end-of-life. In a system-only test scenario, testing stops and starts with every component failure, as teams struggle to focus on ‘design intent’ and demonstrating ‘core functions’ while investigating multiple failure modes as they occur. Because each failure mode has a probability of occurrence below 100%, and only a handful of systems are typically tested, some failure modes that were never observed in system testing first appear in field service, leading to the situations described at the outset of this article.
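The arithmetic behind that last point can be sketched in a few lines. Assuming a failure mode occurs independently in each unit with the same probability p over the period of interest, the chance of seeing it at least once among n units is 1 - (1 - p)^n. The 2% probability, the five test systems and the 200-unit fleet below are hypothetical numbers chosen for illustration.

```python
def probability_of_observing(p_occurrence, n_units):
    """Chance that a failure mode shows up at least once among n independent units."""
    return 1.0 - (1.0 - p_occurrence) ** n_units


# Hypothetical failure mode that affects 2% of units over the period of interest.
p = 0.02

in_system_test = probability_of_observing(p, n_units=5)    # small prototype test program
in_field_fleet = probability_of_observing(p, n_units=200)  # demonstration fleet

print(f"Seen in a 5-system test program: {in_system_test:.1%}")
print(f"Seen in a 200-unit fleet:        {in_field_fleet:.1%}")
```

Under these assumptions, the failure mode would most likely go unseen in system testing, yet would be close to certain to appear somewhere in the fleet.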

In the course of my career, I’ve experienced both scenarios. What I have observed is that companies developing new technologies often operate as start-ups. Their technical leaders are experts in their core technologies but lack similar experience in product development, and they engage reliability professionals only when reliability becomes a concern.

It’s an approach that seems valid, particularly if a startup’s strained resources make a full-time reliability engineer difficult to justify, and because most RAMS-style solutions would be lost without field data. Consultants who understand how reliability is built into products can provide guidance and structure tailored to support technology development at each readiness level, and secure the knowledge required to ensure systems are capable of safe and reliable operation when they enter service. Reach out to Saegert Solutions and discover how we can help you.

About the author: Alex Saegert is founder and principal consultant of Saegert Solutions. He is an ASQ certified reliability engineer (CRE) and supplier quality professional (CSQP), with a specialized ASQ credential in risk management. He has over 20 years of experience engineering quality, reliability and safety into products from such diverse fields as medical devices, hydrogen fuel cells, alternative energy powertrains for cars, trucks and locomotives, and the nuclear power industry. He is licensed as a professional engineer in the provinces of Alberta and British Columbia, Canada.
