EVC:
Case Studies
Troubleshoot a bench-top laboratory instrument that stopped working at random times.
A New Hampshire company, which shall remain nameless, had recently designed the next generation of a particular piece of pharmaceutical laboratory equipment for a worldwide market. This is a 45-pound, microprocessor-controlled piece of bench-top equipment with a user-friendly touch-screen on the front.
Limited production was begun and about fifty or so units were delivered to North America and overseas. All these units had successfully passed all in-house production tests before shipping. Shortly after the first several units were shipped, reports started coming in that some units were refusing to respond to touch-screen commands. They appeared to be completely frozen. Sometimes they would come back to life after anywhere from a few minutes to a few hours. Some would not come alive at all unless they were unplugged and restarted, but even then they would soon hang up again.
Production as well as engineering was initially unable to duplicate this behavior. The reference unit in engineering operated flawlessly. Eventually two units were shipped back from users for diagnosis and rework, one from the US and one from overseas. The "US" unit arrived first and was tested. It did indeed hang at random times and at equally random times it would begin functioning again. Engineering proceeded to investigate:
There did not seem to be any rhyme or reason to when the unit would fail. Perhaps there was electrical noise somewhere. After some extensive measurements it was determined that the AC-to-DC power-supply output was a bit noisy. Disconnect the power-supply and power the unit from a lab power supply: It now worked flawlessly, never hung up, and ran for days. OK, replace the power supply with a new one of the same kind.
The unit ran flawlessly for a week straight; perhaps the problem had been found. In the meantime a few other units had been returned from the field. These all had their power-supplies replaced and were re-tested. Most passed; a couple failed.
Replace the power-supply again in these units. Now all pass production test. Hmm, must be a problem with noisy power-supplies then.
Soon some, but not all, of the repaired units were failing in the field.
Next attempt at a fix was to install a shielding cage around the power-supply. The "US" unit still worked flawlessly after installing the shield.
Now one of the engineers got clever and put the original noisy power-supply back in the "US" unit. The unit ought to fail now, right? It didn't! It ran perfectly for at least a week or so.
Now what? Can't duplicate the problem anymore. What's up with that?
Well, maybe it's a firmware code problem or maybe it's a cable noise problem. Many theories were put forth. Lots of cables and sub-assemblies were swapped between units, but nothing conclusive emerged. The code was analyzed and many tests were run, none of them conclusive.
The units that were failing overseas were said to hang before they even finished booting up. Maybe the line frequency had something to do with it? A 50 Hertz 240VAC power source wasn't readily available, so hold that thought for a while.
An "overseas" unit eventually appeared and was tested on US standard 60 Hertz power. It almost always failed before it even finished booting up.
By now engineering was tearing its collective hair out, management was in a tizzy and the distributors were disgusted.
To recap: The "US" unit now works flawlessly and the "overseas" unit is useless.
In desperation the engineering manager calls EVC:
I bring the units to the EVC lab for tests. First I confirm that the "overseas" unit hangs immediately or within minutes. Second, it appears I can’t get the "US" unit to fail at all.
Hmm, the "overseas" unit hangs at random times in the operating cycle, often so early that the touch-screen is only partially painted. Unlikely to be a firmware problem then; hardware is a more likely cause. Lots of oscilloscope work follows.
The "US" unit signals look good all-around, but the "overseas" unit has a lot of cyclic noise on the power and digital data lines. Odd...
At some point I notice that when I push on the oscilloscope probe in a certain spot on one of the circuit boards the "overseas" unit gets a nice and clean signal and it doesn't hang until the noise comes back when I remove the probe. It turns out that the signal cleans itself up when I push in any of several places on the circuit board. Perhaps there is a crack in one of the copper traces? Well, it turns out there actually is a crack, but it’s in a trace that only goes to a test-point, so not relevant here.
Next I discover that the unit also works when I push in some places on the chassis, and when I push on the power-supply or on one of the relays. What's going on here?
The commercial power-supply in the unit is built on a circuit board and the four mounting holes are through-plated holes in the corners of the board. Eventually I realize that the lower left power-supply mounting screw is missing and that when I press on that corner the signal is clean and the unit works!
A close look at the power-supply reveals that the four mounting holes are also chassis grounding points, but strangely they are not connected to each other! Some quick out-of-the-unit tests reveal that the power supply is quite noisy if any one of these mounting holes is not grounded. A look at the power-supply manufacturer’s app-notes reveals that item #14 states in part: "To meet emissions specifications, all four mounting hole pads must be electrically connected to a common metal chassis."
The power-supply manufacturer is relying on properly grounded chassis mounting screws in the application for the unit to meet spec!
The "US" unit that always works has all four screws installed. No wonder it works.
So to prove the point I go to remove the screw on the working unit and install it on the failing unit. Hmm, how do I do that? There's not very good access... The only way to remove that screw involves removing a whole lot of hardware and then using an extra long T-handle ball-end Allen wrench. The access angle is so awkward that to re-install the screw one has to glue or tape it to the end of the Allen key. If you can get the screw started on the first try all is well, otherwise you have to go through the tape or glue exercise all over again.
I eventually drilled a hole in the chassis to get straight access to the screw.
I went and spoke with the client’s production personnel and with the engineer who had run the power-supply tests.
See below for conclusions.
Conclusion:
Here is what had happened:
Production was well aware of how difficult it was to install that one screw, so most of the time it didn't get installed at all.
The chassis of the fairly heavy instrument flexed ever so little when shipping or moving the unit, so the pad on the circuit board mounting hole sometimes touched the mounting standoff (good) and sometimes it didn't (bad).
When the engineer was repeatedly installing multiple power-supplies in the "US" unit he dutifully put a screw in each of the four mounting holes, each time, which is why he couldn't duplicate the problem.
Cause of field failures:
A screw was missing.
Final recommendation for corrective action:
1. Make an access hole in the rear chassis.
2. Verify that all four screws are installed, and tight, in production.
3. Plug the access hole with a screw that matches the other screws on the back panel.
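Step 2 is easier to enforce if it is a measurement rather than a visual check: the app-note wants all four mounting hole pads electrically connected to the chassis, so a production tester could record the resistance from each pad to chassis ground and fail the unit if any pad is open. The following is only a minimal sketch of that idea, not the client's actual test. The pad names, the 0.5 ohm limit, and the function name `ungrounded_pads` are assumptions made up for illustration; a real limit would have to come from the power-supply manufacturer or from bench measurements.

```python
from typing import Dict, List

# Hypothetical pass limit in ohms. This number is NOT from the case study
# or from the power-supply app-notes; it is a placeholder for illustration.
MAX_PAD_TO_CHASSIS_OHMS = 0.5

# Hypothetical names for the four power-supply mounting hole pads.
EXPECTED_PADS = ("upper_left", "upper_right", "lower_left", "lower_right")


def ungrounded_pads(
    readings_ohms: Dict[str, float],
    limit_ohms: float = MAX_PAD_TO_CHASSIS_OHMS,
) -> List[str]:
    """Return the pads that are not verified as grounded to the chassis.

    A pad fails if its reading is missing (nobody measured it) or if the
    measured pad-to-chassis resistance is above the limit (for example,
    an open circuit because the screw was never installed).
    """
    failed = []
    for pad in EXPECTED_PADS:
        reading = readings_ohms.get(pad)
        if reading is None or reading > limit_ohms:
            failed.append(pad)
    return failed


if __name__ == "__main__":
    # Example: a unit like the "overseas" one, with the lower left screw missing.
    readings = {
        "upper_left": 0.1,
        "upper_right": 0.1,
        "lower_right": 0.2,
        "lower_left": float("inf"),  # open circuit, no screw
    }
    bad = ungrounded_pads(readings)
    if bad:
        print("FAIL: pads not grounded to chassis:", ", ".join(bad))
    else:
        print("PASS: all four mounting pads grounded")
```

Treating a missing reading as a failure is deliberate: the original problem was a step that got skipped, so a skipped measurement should not be able to pass.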