Designing for Safety Systems: Fail-safes & Redundancy

Some engineering work has to account for a specific working environment, such as industrial, transport, military or aerospace settings, where there may be dangerous chemicals, harsh and dirty terrain or, most importantly, human lives hanging in the balance.
The hazards of the specific environment call for different considerations when designing systems. It is very important to note that redundancy should be implemented because it serves a purpose, as opposed to just being added superfluously.
Redundancy usually means two of the same component working in parallel within the system. This protects against total system failure, or domino-effect failures, caused by a single failure mode in one of the components upstream. A server RAID system is a good example of this.
It is important to identify the failure modes within the system in order to establish the need and method for safe failure, or for decreasing the likelihood of failure through redundancy.
An excellent example is a simple potentiometer-based sensor, such as the ones used in vehicle accelerator pedals.

 
These sensors are an important part of the engine management system, so they have redundancy built into them in the form of a double resistive wiper. They are designed to take a beating from powerful human leg muscles and work for at least the lifetime of the vehicle, so they are also ruggedized with a thick plastic casing.
The main point of this post is the trade-offs that occur when implementing these systems, and when they should be used.
Despite being important to vehicle safety, the pedal sensor still uses contacts rather than solid-state components, which raises an important point about the reality of engineering: budget must still be considered.
Unfortunately there is a trade-off, so it is very important to get the mix right.
Let’s consider a normal non-redundant potentiometer circuit.




A simple power supply feeds the potentiometer, and the wiper outputs a percentage of the input voltage.
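As a rough sketch of that relationship, the potentiometer can be treated as an ideal, unloaded voltage divider. The function name, the 5 V supply and the 0.0 to 1.0 wiper position scale are my own assumptions for illustration, not values taken from the circuit.

```python
def divider_output(v_supply: float, wiper_position: float) -> float:
    """Ideal, unloaded potentiometer: the output is a fraction of the supply.

    wiper_position is assumed to run from 0.0 (ground end) to 1.0 (supply end).
    """
    if not 0.0 <= wiper_position <= 1.0:
        raise ValueError("wiper_position must be between 0.0 and 1.0")
    return v_supply * wiper_position


if __name__ == "__main__":
    # Hypothetical 5 V supply with the wiper at a few positions.
    for position in (0.0, 0.25, 0.5, 1.0):
        print(position, divider_output(5.0, position))
```

A real potentiometer is loaded by whatever it drives, so the output will deviate slightly from this ideal figure.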
See my circuit help index page for help with the concepts behind these circuits.
Now let’s consider the implementation of redundancy as in the pedal sensor mentioned earlier.

In the circuit diagram above we start to see the trade-off between complexity and system safety. This circuit buffers the inputs, then uses an adder to give the average of the two sensors, which should both give the same voltage output. If a potentiometer fails, its voltage will most likely float at some unknown level.
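The behaviour described above can be sketched in a few lines. This is only a model under my own assumptions: the floating wiper is represented as a random voltage anywhere between ground and a hypothetical 5 V supply, which is a simplification of what a real floating node does.

```python
import random

V_SUPPLY = 5.0  # hypothetical supply voltage


def averaged_output(v_a: float, v_b: float) -> float:
    """Average of the two buffered sensor voltages (the adder stage)."""
    return (v_a + v_b) / 2.0


def floating_wiper(rng: random.Random) -> float:
    """Assumed model of a failed, floating wiper: anywhere between the rails."""
    return rng.uniform(0.0, V_SUPPLY)


if __name__ == "__main__":
    rng = random.Random(0)
    healthy = 3.0  # both tracks should read the same when healthy
    print("both healthy:", averaged_output(healthy, healthy))
    for _ in range(3):
        # One wiper has failed, so the averaged output wanders unpredictably.
        print("one floating:", averaged_output(healthy, floating_wiper(rng)))
```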
What if we could control the failure mode of the wiper? It is the most likely point of failure, as it is a mechanical part and will degrade over time as it grazes against the resistive element.

I have added pull-down resistors to the wipers of the potentiometers, so now we can be sure that a wiper will fall to ground if it fails. If one wiper fails, the averaged output will always fall to half of the healthy sensor's output. So rather than giving an unpredictable output, the circuit now lets us design any subsequent circuitry with these failure modes in mind. It is all about predictability and control when designing these systems but, as you can see, it is getting progressively more complex...
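To make the "half of the sensor output" point concrete, here is a minimal sketch. It assumes an ideal pull-down (a broken wiper reads exactly 0 V) and a hypothetical 5 V supply; the function names are mine.

```python
V_SUPPLY = 5.0  # hypothetical supply voltage


def wiper_voltage(position: float, broken: bool) -> float:
    """Wiper voltage with a pull-down fitted.

    Assumption: a broken wiper is pulled to exactly 0 V instead of floating.
    """
    return 0.0 if broken else V_SUPPLY * position


def averaged_output(v_a: float, v_b: float) -> float:
    """Average of the two sensor voltages (the adder stage)."""
    return (v_a + v_b) / 2.0


if __name__ == "__main__":
    position = 0.8  # pedal pressed 80 percent of the way
    healthy = averaged_output(wiper_voltage(position, False),
                              wiper_voltage(position, False))
    one_failed = averaged_output(wiper_voltage(position, False),
                                 wiper_voltage(position, True))
    print("both healthy:", healthy)
    print("one wiper broken:", one_failed)  # half of the healthy reading
```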


So it’s all well and good if your system fails safe, but what if your design is good enough that a failure is not immediately noticeable to the user? What if you need failure mode sensing?


This circuit now senses when a wiper is shunted to zero because it has broken. I have also designed the system so that the output voltage does not go too close to ground during nominal operation, which means a reading near zero can only indicate a fault.
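The same sensing idea can be sketched in software. The 0.25 V threshold and the idea of returning a flag alongside the averaged value are my own assumptions for illustration; the circuit above does this with hardware, and the real threshold depends on how far the nominal range is kept from ground.

```python
FAULT_THRESHOLD_V = 0.25  # hypothetical: below the lowest nominal reading


def wiper_fault(v_wiper: float, threshold: float = FAULT_THRESHOLD_V) -> bool:
    """A wiper reading this close to ground is treated as a broken wiper."""
    return v_wiper < threshold


def read_sensor(v_a: float, v_b: float) -> tuple[float, bool]:
    """Return the averaged output and a flag set if either wiper looks broken."""
    average = (v_a + v_b) / 2.0
    fault = wiper_fault(v_a) or wiper_fault(v_b)
    return average, fault


if __name__ == "__main__":
    print(read_sensor(3.0, 3.0))  # both wipers healthy, no fault flagged
    print(read_sensor(3.0, 0.0))  # one wiper pulled to ground, fault flagged
```

What the downstream system does with the flag (warn the driver, limit power, ignore the failed track) is a separate design decision.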
And yet again our circuit gets more complex as we add failure sensing circuitry to the mix. With a circuit this complicated we need to start considering whether it is worth doing at all.
What if the ICs fail? What are their respective lifetimes, and are they working well within their specified limits?
Are you making a more complex system that has a higher likelihood of failing? Is it worth doing at all if the MTBF (Mean Time Between Failures) is lower now than it was before?
Is it worth implementing if it fails more often but safely, rather than less often but unpredictably? This will depend on the circumstances and consequences.
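A back-of-the-envelope way to look at the MTBF question is to treat every part as having a constant failure rate and to count any single part failing as a fault. Under that assumption the failure rates add, so the combined MTBF is the reciprocal of the sum of the reciprocals. The hour figures below are made up purely for illustration.

```python
def series_mtbf(part_mtbfs_hours: list[float]) -> float:
    """Combined MTBF when any single part failing counts as a fault.

    Assumes constant, independent failure rates: rate = 1 / MTBF, and
    the rates of all parts add together.
    """
    total_rate = sum(1.0 / mtbf for mtbf in part_mtbfs_hours)
    return 1.0 / total_rate


if __name__ == "__main__":
    # Hypothetical figures, not taken from any datasheet.
    potentiometer = 1_000_000.0
    op_amp = 2_000_000.0

    print("sensor alone:", series_mtbf([potentiometer]))
    print("sensor plus two op-amps:",
          series_mtbf([potentiometer, op_amp, op_amp]))
```

This only counts how often something in the circuit fails. It says nothing about whether that failure is safe, which is exactly the trade-off in question.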


So to conclude, the feasibility of redundancy and fail-safe implementation should be considered with respect to the requirements of the system, the environment it works in and how it could affect human life. There are trade-offs in price and system complexity, and the potential to make the system fail more often, but what you must consider is whether it is worth doing if it fails safe (hence the name).