
By Kevin Moore – Shift Control Engineer
Night shifts teach you to trust patterns.
That’s why this one bothered me.
02:43 – The System Was “Running”
No alarms.
No watchdog trips.
No communication loss.
But outputs didn’t behave like they usually did.
A valve closed slower than expected.
A calculated value drifted — just slightly.
Enough to notice. Not enough to panic.
03:05 – Checking the Usual Things
I checked:
- Field wiring
- I/O status
- Network latency
- Controller load
Everything looked normal.
The Honeywell 10012/1/2 CPU module was in RUN, healthy by every visible metric.
And yet the logic didn’t feel right.
03:27 – Reboot Changed Everything
We scheduled a controlled restart.
After reboot:
- The drift disappeared
- Timing returned to normal
- Outputs behaved as expected
Same hardware.
Same program.
Different behavior.
That’s when I suspected memory.
What We Later Learned About This CPU
The controller sat in a bad electrical environment:
- Shared power bus
- Frequent short-duration outages
- No full power loss, just dips
Enough to stress flash.
Not enough to trigger obvious failures.
Flash Bit Flips Don’t Announce Themselves
In the 10012/1/2, flash memory holds:
- Application code
- Constants
- Configuration parameters
A single flipped bit doesn’t crash the CPU.
It changes behavior.
Quietly.
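To show why one bit can shift behavior instead of stopping it, here is a minimal Python sketch. It assumes a constant is stored as a 32-bit IEEE-754 float. That is an assumption for illustration only, because the 10012/1/2's internal encoding of constants isn't described here. The helper flip_bit_float32 and the value setpoint_gain are hypothetical.

```python
import struct


def flip_bit_float32(value: float, bit: int) -> float:
    """Return `value` as it would read back after bit `bit` (0 = LSB) of its
    IEEE-754 single-precision encoding has been inverted."""
    # Reinterpret the float's 4 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    flipped = raw ^ (1 << bit)  # invert exactly one bit
    # Reinterpret the modified bits as a float again.
    (result,) = struct.unpack("<f", struct.pack("<I", flipped))
    return result


# Hypothetical constant, e.g. a gain or ramp rate used by the logic.
setpoint_gain = 1.5

for bit in (0, 16, 22):
    corrupted = flip_bit_float32(setpoint_gain, bit)
    print(f"bit {bit:2d}: {setpoint_gain} -> {corrupted!r}")
```

Flipping the lowest mantissa bit changes 1.5 by about 0.0000001. Bit 16 moves it to 1.5078125, and bit 22 turns it into 1.0. Every result is still a valid number, so nothing downstream has a reason to reject it.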
Why Diagnostics Didn’t Catch It
- No checksum validation during runtime
- No memory scrubbing mechanism
- No alarm threshold for “almost wrong”
From the system’s perspective, the logic was valid.
From reality’s perspective, it wasn’t.
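A few lines of code are enough to sketch the first missing check, runtime checksum validation. The Python below is a hedged illustration, not a 10012/1/2 feature. It assumes the stored program and constants can be read back as a byte image and that a CRC-32 was recorded while that image was known good. The function names and sample data are hypothetical. CRC-32 is guaranteed to detect any single-bit error, which is exactly the failure in this story.

```python
import zlib


def image_crc32(image: bytes) -> int:
    """CRC-32 of a memory image, masked to an unsigned 32-bit value."""
    return zlib.crc32(image) & 0xFFFFFFFF


def verify_image(image: bytes, reference_crc: int) -> bool:
    """True if the image still matches the CRC recorded when it was known good."""
    return image_crc32(image) == reference_crc


# Hypothetical "known good" image captured at download/commissioning time.
good_image = bytes(range(256)) * 4
reference_crc = image_crc32(good_image)

# Simulate a single flipped bit somewhere in the stored constants.
corrupted = bytearray(good_image)
corrupted[300] ^= 0x04

print(verify_image(good_image, reference_crc))        # True
print(verify_image(bytes(corrupted), reference_crc))  # False
```

On a real controller, a check like this would need to run cyclically in the background. It could then raise a diagnostic on mismatch instead of leaving the fault to someone who notices drifting outputs.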
How We Confirmed It Later
In daylight, engineering compared the module’s flash contents against a known-good copy.
A single constant value differed.
One bit.
That was enough.
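The tools engineering used for the comparison aren't described. Here is a hedged sketch of one way to do it. Assume a reference image and a read-back dump of equal length are both available as bytes. XOR-ing them byte by byte then isolates every differing bit. The function bit_differences and the sample bytes are hypothetical.

```python
def bit_differences(reference: bytes, live: bytes) -> list[tuple[int, int]]:
    """Return (byte_offset, bit_index) for every bit that differs between two
    equal-length images. bit_index 0 is the least significant bit."""
    if len(reference) != len(live):
        raise ValueError("images must be the same length to compare bit-by-bit")
    diffs = []
    for offset, (a, b) in enumerate(zip(reference, live)):
        x = a ^ b  # set bits mark the positions that differ
        bit = 0
        while x:
            if x & 1:
                diffs.append((offset, bit))
            x >>= 1
            bit += 1
    return diffs


# Hypothetical images: a reference export and a dump read back from the module.
reference = bytes([0x00, 0x3F, 0xC0, 0x00, 0x00])
live = bytes([0x00, 0x3F, 0xC0, 0x01, 0x00])
print(bit_differences(reference, live))  # [(3, 0)]
```

A result with a single entry, as here, points to a one-bit flip rather than a changed program. A deliberate program change would normally alter many bytes.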
Corrective Actions
- CPU module replaced
- Power supply isolated and stabilized
- UPS added specifically for the controller rack
- Post-restart verification added to the night-shift checklist
What I Took Away From That Night
- Flash errors don’t always break systems
- Subtle behavior changes matter
- Reboots can mask deeper problems
- Power quality affects memory integrity
End of Shift
By morning, everything looked fine again.
But I logged it anyway.
Because in control systems,
the most dangerous failures are the ones that almost don’t happen.
— Kevin Moore
Excellent PLC
