Excellent PLC Co.,Ltd

PLC and DCS professional supplier

Redundancy That Reduces Reliability: A Hard Lesson from Honeywell 10008/2/U

By Eleanor Watkins – Control System Design Auditor


Redundancy is supposed to make systems safer.
But when it is misunderstood, it does the opposite.

This became painfully clear during a system audit involving a Honeywell 10008/2/U communication module configured in a redundant architecture. The system never fully failed — yet operators experienced repeated communication disturbances that were difficult to explain and even harder to trace.


The Architecture on Paper

  • Dual communication paths

  • Automatic switchover enabled

  • Heartbeat-based redundancy logic

  • Seamless failover expected

From a design perspective, everything looked conservative and robust.

In reality, it was fragile.


What Operators Experienced

  • Communication latency spikes

  • Short data freezes lasting 1–2 seconds

  • Spontaneous recovery without alarms

  • Events more frequent during peak traffic

No module fault indicators.
No explicit redundancy alarms.

Just instability.


Why This Was Not a Hardware Failure

The 10008/2/U module never lost power.
It never dropped its link.

What it did was switch between active and standby roles far too often.


The Real Issue: Aggressive Redundancy Thresholds

During the audit, we reviewed redundancy parameters:

  • Heartbeat timeout set extremely low

  • Failover triggered by minor latency variations

  • No hysteresis or minimum hold time

  • Both channels frequently judged “marginal”

In effect, the system interpreted normal load-related delays as failures.

The result was redundancy oscillation.
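To make the mechanism concrete, here is a minimal Python sketch of failover logic with the properties listed above: a single late heartbeat counts as a failure, and there is no hysteresis or hold time. It is not the 10008/2/U's actual redundancy algorithm. The function name, the two synthetic latency traces and the 50 ms timeout are hypothetical values chosen only to show the effect when both paths are "marginal" during a traffic peak.

```python
from typing import List, Sequence

def naive_failover(path_a: Sequence[float], path_b: Sequence[float],
                   timeout_ms: float) -> List[int]:
    """Return the ticks at which the active path switched.

    Hypothetical model: each tick carries one heartbeat latency per path.
    A single late heartbeat on the active path triggers an immediate switch,
    with no hysteresis and no minimum hold time.
    """
    paths = (path_a, path_b)
    active = 0  # start on path A
    switch_ticks: List[int] = []
    for tick in range(len(path_a)):
        if paths[active][tick] > timeout_ms:
            active = 1 - active  # fail over to the other path at once
            switch_ticks.append(tick)
    return switch_ticks

# Synthetic traces (ms): quiet, then a 20-tick peak in which both paths are
# marginal and cross the timeout in alternation, then quiet again.
path_a = [20] * 10 + [40, 70] * 10 + [20] * 10
path_b = [20] * 10 + [70, 40] * 10 + [20] * 10

switches = naive_failover(path_a, path_b, timeout_ms=50)
print(len(switches))  # 19: a switchover on almost every tick of the peak
```

Neither path ever goes down in this trace; the logic simply keeps abandoning whichever path was late at that instant.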


What Redundancy Oscillation Looks Like

  • Active path switches before traffic stabilizes

  • Buffers flushed mid-transaction

  • Communication sessions reset repeatedly

  • Data remains “valid” but arrives late or out of order

From the outside, it looks like noise.
From the inside, it’s controlled chaos.


Why Diagnostics Missed It

Most diagnostics answer binary questions:

  • Is the link up?

  • Is the module healthy?

They do not ask:

  • Is the system switching too often?

  • Is failover being triggered by conditions that are not real failures?

The 10008/2/U did exactly what it was told to do — too well.
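The missing question, "is the system switching too often?", can be asked directly. The sketch below is a hypothetical diagnostic, not a Honeywell feature: it keeps switchover timestamps in a sliding time window and flags the condition once the count exceeds a limit. The class name, the 60 s window and the limit of two switches are assumptions for illustration.

```python
from collections import deque
from typing import Deque

class SwitchRateMonitor:
    """Hypothetical diagnostic: flag too many switchovers in a sliding window."""

    def __init__(self, window_s: float, max_switches: int) -> None:
        self.window_s = window_s
        self.max_switches = max_switches
        self._events: Deque[float] = deque()  # switchover timestamps, oldest first

    def record(self, t_s: float) -> bool:
        """Log a switchover at time t_s (seconds, non-decreasing).

        Returns True when more than max_switches switchovers fall
        inside the last window_s seconds.
        """
        self._events.append(t_s)
        # Drop events that have aged out of the window; t_s itself always stays.
        while t_s - self._events[0] > self.window_s:
            self._events.popleft()
        return len(self._events) > self.max_switches

monitor = SwitchRateMonitor(window_s=60.0, max_switches=2)
print([monitor.record(t) for t in (0.0, 10.0, 20.0, 200.0)])
# [False, False, True, False]
```

A link-up or module-healthy check would report nothing unusual for the same sequence of events; only a rate-based check sees the third switchover in 20 seconds as a problem.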


How We Stabilized the System

Redundancy tuning

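Heartbeat_Timeout   := Increased               (* tolerate normal peak-load delays *)
Minimum_Active_Time := Enforced                (* hold time after every switchover *)
Failover_Condition  := Sustained_Failure_Only  (* transient delays no longer trigger failover *)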
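The three tuning rules can be expressed as a small decision function. This is a hypothetical Python model, not the module's firmware logic or configuration syntax: the parameter names, the 100 ms timeout, the 20-tick hold time and the three-heartbeat persistence requirement are illustrative assumptions.

```python
from typing import List, Sequence

def tuned_failover(path_a: Sequence[float], path_b: Sequence[float],
                   timeout_ms: float, min_active_ticks: int,
                   sustained_ticks: int) -> List[int]:
    """Return the ticks at which the active path switched.

    Hypothetical model of the tuned policy:
      - timeout_ms: raised above normal peak-load latency
      - min_active_ticks: a path stays active at least this long after a switch
      - sustained_ticks: consecutive late heartbeats required before failover
    """
    paths = (path_a, path_b)
    active = 0
    last_switch = -min_active_ticks  # permit a switch from tick 0 onward
    late_streak = 0
    switch_ticks: List[int] = []
    for tick in range(len(path_a)):
        # Count consecutive late heartbeats on the active path only.
        late_streak = late_streak + 1 if paths[active][tick] > timeout_ms else 0
        held_long_enough = tick - last_switch >= min_active_ticks
        if late_streak >= sustained_ticks and held_long_enough:
            active = 1 - active
            last_switch = tick
            late_streak = 0  # restart the count on the new path
            switch_ticks.append(tick)
    return switch_ticks

# Peak-load jitter: both paths marginal but never down.
peak_a = [20] * 10 + [40, 70] * 10 + [20] * 10
peak_b = [20] * 10 + [70, 40] * 10 + [20] * 10
print(tuned_failover(peak_a, peak_b, 100, 20, 3))  # []

# Genuine sustained failure of path A from tick 10 onward.
dead_a = [20] * 10 + [1000] * 10
healthy_b = [20] * 20
print(tuned_failover(dead_a, healthy_b, 100, 20, 3))  # [12]
```

The trade-off is deliberate: a real failure is acted on at tick 12 instead of tick 10, which is the price of not reacting to transient load spikes.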

Operational policy

  • Redundancy parameters reviewed after any traffic change

  • Failover events logged and trended, not ignored

  • Redundancy treated as a control function, not a checkbox
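Reviewing redundancy parameters after a traffic change can be made concrete by checking the heartbeat timeout against tail latency measured under the new load. The sketch assumes heartbeat latencies can be sampled; the nearest-rank 99.9th percentile and the 2x margin are assumptions for illustration, not vendor guidance.

```python
import math
from typing import Sequence

def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

def timeout_has_margin(timeout_ms: float, latencies_ms: Sequence[float],
                       pct: float = 99.9, margin: float = 2.0) -> bool:
    """True if the heartbeat timeout exceeds tail latency by the given factor.

    pct and margin are illustrative assumptions, not vendor recommendations.
    """
    return timeout_ms >= margin * percentile_nearest_rank(latencies_ms, pct)

# Hypothetical heartbeat latencies (ms) sampled after a traffic change:
# mostly 20 ms, with a 1% tail of 60 ms peaks.
latencies = [20] * 990 + [60] * 10
print(percentile_nearest_rank(latencies, 99))    # 20: the p99 misses the tail
print(percentile_nearest_rank(latencies, 99.9))  # 60
print(timeout_has_margin(50, latencies))         # False: 50 < 2 * 60
print(timeout_has_margin(150, latencies))        # True
```

A tail percentile is used rather than an average because the disturbances in this case came from load-related spikes, which an average would hide.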

After tuning, communication stabilized immediately.

No hardware changes were required.


What This Case Teaches

  1. Redundancy is a control strategy, not a safety net

  2. Fast failover without hysteresis creates instability

  3. Communication load must be considered in redundancy logic

  4. Over-sensitive systems fail more often than tolerant ones


Final Thought

The Honeywell 10008/2/U communication module did not introduce instability.

It revealed it.

Redundancy does not guarantee reliability.
It amplifies the quality of your assumptions.

Eleanor Watkins
