Troubleshooting infrequent, intermittent problems can be one of the most frustrating areas of engineering. It is also a great opportunity to use your system knowledge, all of your senses, and your sleuthing abilities. What you learn can also help you design future systems in which faults are easier to isolate.
One example was a large blood chemistry analyzer. The system had intermittent noise on one of the analytical channels. Field service had literally replaced every printed circuit board in the system, including the backplane, to no avail. The unit was replaced with a new unit and returned to the service group in the factory for a root cause analysis. Every board (around 30 to 40 boards) was swapped with a working system in field service, but the problem stayed put: no change in either system. Next, every cable was swapped between systems, as well as all of the sensors. Still the analog glitch remained firmly in the failed system.
The system was finally brought up to engineering after about 3 months of frustrating the field service group. I was pulled in from a different project to try to put this issue to rest. We connected a good digital oscilloscope to the analog channel with the issue and made multiple runs on the instrument, occasionally observing a glitch on the oscilloscope. I noticed that I heard the click of a solenoid coincident with the glitch (there were many solenoids on this instrument to control fluid flow). I felt through the banks of solenoids until I found the one whose click coincided with the oscilloscope glitch. A close look revealed that the catch diode on that solenoid, which was physically located at the solenoid, had been broken. The solenoid belonged to an unrelated chemistry that was showing no issues, and so it was never inspected or replaced. All in all, it took about 15 to 20 minutes to locate the issue and cure it.
The underlying cause was that a +12-volt power supply had been shared between operating the solenoids and powering the op-amps of the various analog channels. In a properly operating system, the catch diodes limited the interaction, and the system worked well. In the failed system, the broken diode opened up this unintended back-door interface, so that switching one solenoid disturbed an analog channel that had no apparent connection to it. That is why months of swapping boards, cables, and sensors never found the fault.
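To put rough numbers on why a missing catch diode matters, here is a minimal Python sketch based on the standard inductor relation V = L·(di/dt). The coil inductance, coil current, and turn-off time are illustrative assumptions, not values from the analyzer, and the function names are hypothetical. The unclamped figure is an idealized estimate; in a real circuit the spike is limited by stray capacitance and by whatever breaks down first.

```python
def unclamped_spike_volts(inductance_h: float, current_a: float, turn_off_s: float) -> float:
    """Idealized inductive kick, V = L * (di/dt), when the coil current is
    interrupted with no catch diode to give it a path."""
    if turn_off_s <= 0:
        raise ValueError("turn-off time must be positive")
    return inductance_h * current_a / turn_off_s


def clamped_peak_volts(supply_v: float, diode_drop_v: float = 0.7) -> float:
    """Approximate peak at the switched end of the coil when a working catch
    diode clamps it to one diode drop above the supply rail."""
    return supply_v + diode_drop_v


if __name__ == "__main__":
    # Assumed example values: 100 mH coil, 100 mA coil current, 100 us turn-off.
    spike = unclamped_spike_volts(0.1, 0.1, 100e-6)
    clamp = clamped_peak_volts(12.0)
    print(f"Unclamped estimate: {spike:.0f} V")
    print(f"With catch diode:   {clamp:.1f} V")
```

With these assumed values the unclamped estimate is on the order of 100 V, against roughly 12.7 V with a working diode. Some fraction of a transient that size reaching a shared +12-volt rail is more than enough to show up on a sensitive analog channel.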
An aerospace course on system engineering quantified the issue nicely: the number of possible interactions grows as the factorial (n!) of the number of interfaces. This includes both the intentional and the unintentional interfaces, such as shared power supplies. Take care in system design to isolate sections as much as is practical and to make the remaining interfaces robust and testable.
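To see how quickly the factorial rule of thumb grows, here is a small Python sketch. It simply takes the n! figure quoted from the course at face value; the function names are hypothetical and the interface counts are arbitrary examples.

```python
import math


def interaction_count(n_interfaces: int) -> int:
    """Possible interactions under the n! rule of thumb quoted above."""
    if n_interfaces < 0:
        raise ValueError("number of interfaces cannot be negative")
    return math.factorial(n_interfaces)


def growth_from_one_more(n_interfaces: int) -> int:
    """Factor by which the count grows when one more interface is added:
    (n + 1)! / n! = n + 1."""
    return interaction_count(n_interfaces + 1) // interaction_count(n_interfaces)


if __name__ == "__main__":
    for n in (3, 5, 8, 10):
        print(
            f"{n:2d} interfaces -> {interaction_count(n):,} possible interactions; "
            f"one more interface multiplies that by {growth_from_one_more(n)}"
        )
```

Under this rule, a single unintended interface on a system that already has ten interfaces multiplies the number of possible interactions elevenfold, which is one way to see why a shared supply can be so expensive to debug.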
A good nose can also help you quickly pinpoint many issues. An overheating through-hole resistor causes the phenolic case to release phenol, which has its own unique, somewhat sweet smell (some sore throat medicines are based on phenol). The epoxy plastic used to encapsulate ICs has a harsher, acidic smell when an IC overheats. Inductor and transformer varnishes from different sources each have their own signature odor when overheating, as do various motors. Varnish smells vary from almost a cherry pipe tobacco odor for some parts from the Far East to a musty odor for some of the domestic sources. Remember these as you would spices, and you can quickly locate the culprit.
Recently, small FLIR cameras that plug into cellphones have become relatively inexpensive. These allow you to visualize heating. Applying a controlled current to a PCB from a voltage- and current-limited power supply can quickly reveal a shorted capacitor across a power plane: with only a few tenths of a volt across the supply, the short will be the only component that heats up by a few degrees. These cameras are also very useful when validating designs and during early testing. We make a habit of checking all new designs for hot components. This testing allows designs that would have failed in the field because of a stressed component to be corrected at the prototype stage, for example by changing a diode in a switching supply to a faster or softer recovery variety. The problem is easy to fix once it is revealed.
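As a back-of-the-envelope check on the short-hunting technique, the following Python sketch applies Ohm's law to estimate how much power the short dissipates. The voltage and current values are assumptions chosen for illustration, and the function name is hypothetical. It treats the whole measured voltage as appearing across the short; in practice some of it drops across the planes and leads.

```python
def short_dissipation(v_measured: float, i_limit: float) -> tuple[float, float]:
    """Return (resistance_ohms, power_watts) for a short carrying the full
    limited current, assuming the entire measured voltage is across it."""
    if i_limit <= 0:
        raise ValueError("current limit must be positive")
    resistance = v_measured / i_limit  # Ohm's law: R = V / I
    power = v_measured * i_limit       # P = V * I
    return resistance, power


if __name__ == "__main__":
    # Assumed example: supply limited to 2 A settles at 0.3 V across the board.
    r, p = short_dissipation(0.3, 2.0)
    print(f"Apparent short resistance: {r * 1000:.0f} milliohms")
    print(f"Power dissipated in the short: {p:.2f} W")
```

A fraction of a watt concentrated in one small part is typically enough to stand out on a thermal image, while a few tenths of a volt is too low to power up the rest of the board and create competing hot spots.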
When you design, try to minimize interactions at the system architecture level wherever possible. When that fails, use all of your senses!