HIL Automation & Manufacturing Test
From a Bench of One to a Line of Thousands
Back in Module 1 you brought up one finger-driver board the careful way: inspect, short-check, current-limited power, rails, brain, buses, functional test. It took an afternoon, a multimeter, and your full attention. Now the robot-hand product has shipped, the assembly house is building a thousand finger-driver boards a week, and nobody is going to hand-probe a thousand boards. Worse, the failure that bit you on board number seven (a hairline solder open under a connector) will bite again on a handful of those thousand, and you will never see it on the bench because you will never touch most of them.
The bench taught you how to trust one board. Production is the art of trusting every board without a human in the loop.
The whole module has been about proving a single assembly innocent. This lesson scales that proof. The throughline is one ordering idea: a board earns trust by passing a fixed sequence of automated tests, each one cheaper and earlier than the failure it catches. You build that sequence backward from the field, you run it on a machine instead of a person, and you keep the numbers so drift shows up long before a customer does.
By the end, you can
- Explain hardware-in-the-loop automation and how a plant simulation closes the loop around a board under test
- Order a manufacturing test flow (ICT, boundary scan, functional test, end-of-line) by what each stage catches and how early
- Choose between ICT, flying-probe, and boundary scan for a given board and volume
- Calculate fault coverage and explain why trending measurements against their pass/fail limits across the fleet catches drift early
Intuition first
Picture the difference between a chef tasting one dish and a factory canning soup. The chef tastes, adjusts, tastes again, all judgment and attention. The cannery cannot taste every can, so it builds the judgment into the line: a fixed checkpoint for fill weight, one for seal integrity, one for temperature, each placed where catching a bad can is cheapest. A can that fails the seal check is pulled before it ever gets a label, because a labeled bad can costs far more than an unlabeled one, and a shipped bad can costs more than either.
Manufacturing test is that cannery for circuit boards. Every test you ran by hand on the bench becomes a fixed station the board moves through, and the order is chosen so the cheapest, earliest test catches the most defects. A solder bridge should die at the first electrical probe, not three stations later when you are running the finger motor at full torque. The art is not any single clever test. It is the sequence, and a machine that runs it the same way on board one and board one thousand.
The other half is the loop. On the bench, you were the feedback path: you watched the current, judged the rail, decided to proceed. Hardware-in-the-loop (HIL) automation replaces you with a model. You wrap the board in a simulated world (the plant), feed it the sensor signals it expects, watch the actuator commands it sends back, and let software decide pass or fail. The board thinks it is driving a real robot finger. It is really driving a math model, on a bench, at three in the morning, ten thousand times in a row.
Hardware-in-the-loop: closing the loop with a model
A finger-driver board does not do anything interesting in isolation. It reads an encoder, runs a control loop, and commands a motor. To test that it controls well, you need something to control. The expensive option is a real robot finger on every test station. The HIL option is a plant simulation: a mathematical model of the finger (its inertia, its joint friction, the encoder it carries) running in real time on a dedicated processor.
The board under test never knows the difference. The HIL rig electrically emulates the sensors the board expects, so the encoder signal the board reads is generated by the model, not a real encoder. The board runs its control algorithm and outputs motor commands. Those commands feed back into the model, which updates the simulated finger’s position and velocity, which changes the next encoder value the board reads. The loop is closed: sensor emulation in, actuator command out, model update, repeat. That is the entire idea of hardware-in-the-loop. The real hardware (your board and its firmware) sits inside a loop whose other half is simulated.
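The loop just described can be sketched in a few lines of Python. This is a toy under stated assumptions, not a real HIL rig: `FingerPlant`, `BoardStandIn`, and every constant (inertia, friction, encoder resolution, gains, tolerance) are hypothetical, and the "board" here is a software PD loop standing in for real hardware and firmware that a rig would talk to electrically.

```python
from dataclasses import dataclass


@dataclass
class FingerPlant:
    """Toy plant: one joint with inertia and viscous friction (illustrative values)."""
    inertia: float = 2e-4           # kg*m^2, assumed
    friction: float = 2e-3          # N*m*s/rad, assumed
    counts_per_rad: float = 1000.0  # emulated encoder resolution, assumed
    angle: float = 0.0              # rad
    velocity: float = 0.0           # rad/s

    def step(self, torque: float, dt: float) -> None:
        """Model update: integrate the commanded torque into velocity and angle."""
        accel = (torque - self.friction * self.velocity) / self.inertia
        self.velocity += accel * dt
        self.angle += self.velocity * dt

    def encoder_counts(self) -> int:
        """Sensor emulation: the encoder value the board would read."""
        return round(self.angle * self.counts_per_rad)


class BoardStandIn:
    """Stands in for the board under test: a PD position loop on encoder counts."""

    def __init__(self, kp: float = 1.8e-4, kd: float = 1e-5, max_torque: float = 0.2):
        self.kp, self.kd, self.max_torque = kp, kd, max_torque
        self.prev_counts = 0

    def update(self, target_counts: int, counts: int, dt: float) -> float:
        """Actuator command out: torque computed from the emulated encoder reading."""
        error = target_counts - counts
        rate = (counts - self.prev_counts) / dt
        self.prev_counts = counts
        torque = self.kp * error - self.kd * rate
        return max(-self.max_torque, min(self.max_torque, torque))


def run_tracking_test(target_counts=1000, tolerance_counts=20, dt=1e-3, steps=1000):
    """Close the loop for `steps` ticks, then judge the last 100 ticks against a band."""
    plant, board = FingerPlant(), BoardStandIn()
    errors = []
    for _ in range(steps):
        counts = plant.encoder_counts()                   # sensor emulation in
        torque = board.update(target_counts, counts, dt)  # actuator command out
        plant.step(torque, dt)                            # model update, repeat
        errors.append(target_counts - plant.encoder_counts())
    worst = max(abs(e) for e in errors[-100:])
    return {"passed": worst <= tolerance_counts,
            "worst_settled_error": worst,
            "final_counts": plant.encoder_counts()}


result = run_tracking_test()
print("PASS" if result["passed"] else "FAIL", result)
```

The three commented lines inside the `for` loop are the whole idea: encoder value in, motor command out, model update. On a real rig the first two cross a cable instead of a function call.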
Why bother instead of using a real finger? Three reasons that come straight from the factory math.
- Safety and reach. You can command the model into a stall, a runaway, a snapped tendon, or an over-temperature event, conditions you would never deliberately inflict on real hardware, and watch how the firmware responds. Fault injection is free in simulation.
- Repeatability. The model does the exact same thing every run. A real finger has wear, slop, and temperature drift, so a real-finger test is never quite reproducible. A HIL test is, which is what lets you compare board to board and run to run.
- Speed and cost. A real plant is often slower and more expensive than the simulator that emulates it. You can run more tests, earlier, in parallel, without building a finger for every station.
Once the loop is automated, the rest is bench plumbing you already half-built in earlier modules: a motorized rig that cycles the finger through its travel, signal injection that drives the board’s inputs to known values, and a Python harness that logs every reading, plots it, and flags anything outside a pass/fail band. The harness is the same idea as your Module 1 UART banner, just grown up: instead of you reading a log line, software parses the stream, applies limits, and writes a verdict and a record.
Design for test: making a board testable on purpose
Here is the trap. A board that is a joy to bring up by hand can be nearly impossible to test on a line, because the things that matter to a human prober (room to land a clip, a visible LED, a UART banner) are not the things a machine needs. Design for test (DFT) is the discipline of building testability into the board before it is laid out, so the production line can actually reach every net.
The classic DFT ask is a test point on every net you want to probe: a small exposed pad the line’s fixture can land a spring pin on. Miss them at layout time and no amount of clever test software recovers the access. As the in-circuit-test world puts it bluntly: a test is only as good as the design of the PCB. If the designer left no access, the test cannot be performed. DFT is why test engineers want a seat at the schematic review, not a phone call after the boards are built.
The payoff of DFT is coverage, the fraction of possible defects your test flow can actually detect. A board with test points on every net and a JTAG chain through every digital device can be tested to very high coverage. A densely packed board with no test access and parts under a shield can leave whole regions invisible, and an invisible defect is one that ships.
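Coverage as a fraction is easy to compute once you list the fault sites and how (or whether) the flow can reach each one. The sketch below assumes a simple one-site-one-vote model; the net names and the `sites` table are made up for illustration, and a real coverage report weighs defect types, not just access.

```python
def fault_coverage(access_by_site: dict) -> float:
    """Fraction of fault sites that some test stage can reach.

    `access_by_site` maps a site name to the method that reaches it,
    or to None when nothing in the flow can see it.
    """
    if not access_by_site:
        raise ValueError("no fault sites listed")
    detectable = sum(1 for method in access_by_site.values() if method is not None)
    return detectable / len(access_by_site)


# Hypothetical finger-driver fault sites (names are illustrative only).
sites = {
    "3V3_RAIL": "ict",
    "ENC_A": "ict",
    "MOTOR_PWM": "ict",
    "MCU_BGA_D0": "boundary_scan",
    "MCU_BGA_D1": "boundary_scan",
    "SHIELD_NET_1": None,   # under a shield, no test point
    "SHIELD_NET_2": None,
    "BURIED_NET_7": None,   # no access at all
}

coverage = fault_coverage(sites)
blind_spots = sorted(name for name, method in sites.items() if method is None)
print(f"coverage = {coverage:.1%}, blind spots: {blind_spots}")
```

The list of blind spots is the useful output at schematic review: each entry is a net that needs a test point or a place on a scan chain before layout is frozen.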
The manufacturing test flow
Stack the tests in the order the cannery taught you: cheapest and earliest first, each station catching what the ones before it could not. A typical flow for the finger-driver board:
- AOI (automated optical inspection). Cameras compare every joint and part against a known-good reference: present, right part, right way round, soldered well. Catches the population defects (missing, rotated, tombstoned, bridged) before any power. You met AOI in Module 1; on the line it is a machine, not a loupe.
- ICT (in-circuit test). A bed-of-nails fixture lands hundreds of spring pins on the board’s test points at once and probes every net it can reach: shorts, opens, resistance, capacitance, part placement, polarity, then powers up and checks regulators. This is the workhorse that catches the manufacturing defects (bad solder joints, wrong values, flipped parts) fast and at the component level.
- Boundary scan (JTAG, IEEE 1149.1). For the dense digital regions a physical pin cannot reach (fine-pitch BGAs, traces buried under parts), the chips test themselves. The line shifts test patterns through scan cells built into each pin and reads them back, verifying solder joints and net connections with no probe touching them.
- FCT (functional test). Power the board and exercise the real function: run the firmware, close the HIL loop, drive the simulated finger, confirm the control loop tracks. ICT proves the board was built right; FCT proves it works right.
- End-of-line test. The final gate before the board (or the finished hand) ships: the full as-shipped behavior, calibration, serial-number programming, and the record that says this exact unit passed.
The ordering rule is the same one from the bench: never run the expensive late test on a board the cheap early test would have failed. Running FCT (motor spinning, loop closed) on a board with a solder short that ICT would have caught in a second wastes a station and risks the test rig. Short-check before functional, then as now.
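The ordering rule amounts to a sequencer that stops at the first failure. A minimal sketch, assuming each stage can be reduced to a function that returns pass or fail; the board here is a plain dict of defect flags, which is a hypothetical stand-in for real station measurements.

```python
from typing import Callable

# Each stage is (name, check); a check returns True when the board passes.
# Stage order is cheapest and earliest first.
STAGES: list[tuple[str, Callable[[dict], bool]]] = [
    ("AOI", lambda b: not b.get("missing_part", False)),
    ("ICT", lambda b: not b.get("solder_short", False)),
    ("boundary_scan", lambda b: not b.get("bga_open", False)),
    ("FCT", lambda b: b.get("loop_tracks", True)),
    ("end_of_line", lambda b: b.get("serial_programmed", True)),
]


def run_flow(board: dict, stages=STAGES) -> dict:
    """Run stages in order and stop at the first failure."""
    stages_run = []
    for name, check in stages:
        stages_run.append(name)
        if not check(board):
            # Later, more expensive stages never see this board.
            return {"passed": False, "failed_at": name, "stages_run": stages_run}
    return {"passed": True, "failed_at": None, "stages_run": stages_run}


shorted = run_flow({"solder_short": True})
good = run_flow({})
print(shorted)
print(good)
```

The board with the solder short is pulled at ICT, so the motor never spins against it and the FCT station stays free for boards that have earned it.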
ICT, flying-probe, and boundary scan: which probe, when
Three ways to reach a net, three different trade-offs.
ICT with a bed of nails presses the board onto a fixture of spring-loaded pogo pins so many tests run at once. It is fast (the whole board in seconds) and diagnoses faults right down to the component, which is why it dominates medium-to-high volume. The costs are a custom fixture per board design (expensive, slow to build) and real mechanical strain on the board from being pressed onto a bed of pins.
Flying probe needs no fixture at all. A few probes on motorized arms move from net to net, touching each in turn. That makes it ideal for prototypes, low volume, and boards that change often, because there is nothing to build and the same machine handles any design. The price is speed: moving probes test one or a few nets at a time, so a flying-probe pass takes far longer than a bed-of-nails pass. The rule of thumb: flying probe for the first hundred boards, ICT once you are building thousands.
Boundary scan reaches where neither physical probe can. Modern boards are too dense to land a pin on every net, and BGAs hide their joints under the package entirely. Boundary scan sidesteps the physical-access problem: the chips carry built-in scan cells on their digital signal pins, the test shifts patterns through them, and a broken solder joint shows up as a pattern that does not arrive. No probe on the net, no test point, no access problem beyond the JTAG port itself. The cost is that the chips must support it (they need the JTAG logic designed in) and you only see nets that sit on a scan chain, so analog nets, power rails, and non-JTAG parts stay out of reach and boundary scan complements ICT rather than replacing it. In practice a real line combines them: ICT for the analog and the accessible nets, boundary scan for the dense digital interconnect, FCT to prove the whole thing actually runs.
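The fixture-versus-flying-probe choice is at heart a break-even calculation: a one-time fixture cost against a per-board time saving. The sketch below shows the arithmetic only. Every number in it (fixture cost, seconds per board, station rate) is a placeholder chosen for illustration, not data from any real line, and it ignores programming time, throughput ceilings, and coverage differences.

```python
import math


def total_cost(boards, fixture_cost, seconds_per_board, station_rate_per_hour):
    """One-time fixture cost plus station time for a run of `boards`."""
    return fixture_cost + boards * seconds_per_board / 3600.0 * station_rate_per_hour


def break_even_boards(fixture_cost, fixture_seconds, probe_seconds, station_rate_per_hour):
    """Smallest run size at which the fixture pays for itself, or None if it never does."""
    saving_per_board = (probe_seconds - fixture_seconds) / 3600.0 * station_rate_per_hour
    if saving_per_board <= 0:
        return None
    return math.ceil(fixture_cost / saving_per_board)


# Placeholder inputs, in arbitrary cost units (assumptions, not measurements).
RATE = 60.0            # station cost per hour
ICT_FIXTURE = 10_000   # bed-of-nails fixture, one-time
ICT_SECONDS = 20       # per board on the fixture
PROBE_SECONDS = 600    # per board on a flying prober, no fixture

n_star = break_even_boards(ICT_FIXTURE, ICT_SECONDS, PROBE_SECONDS, RATE)
for n in (100, 5000):
    ict = total_cost(n, ICT_FIXTURE, ICT_SECONDS, RATE)
    probe = total_cost(n, 0, PROBE_SECONDS, RATE)
    print(f"{n} boards: ICT {ict:.0f}, flying probe {probe:.0f}")
print("break-even run size:", n_star)
```

With these placeholder inputs the crossover lands near a thousand boards, the same shape as the rule of thumb above. Plug in your own quotes before trusting the number.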
The flow at a glance, from cheapest and earliest to most expensive and latest:
- AOI: optical defects
- ICT (bed-of-nails): probe every net for shorts, opens, and values (flying probe substitutes here for low volume and prototypes)
- Boundary scan: JTAG, no probe, dense digital
- FCT (HIL loop): exercise the real function
- End-of-line: ship gate plus traceability
A finger-driver board has a fine-pitch BGA microcontroller whose solder balls sit entirely under the package, plus a dozen exposed analog test points. The line builds thousands per week. What test approach reaches the most defects?
- ICT lands physical pins on test points, but a BGA's solder balls are under the package with no pad to touch. ICT cannot reach those joints, so it leaves the densest, most failure-prone region uncovered.
- Flying probe also needs a physical landing point and would be far too slow at thousands of boards per week; it suits prototypes and low volume, not this line.
- Correct. ICT handles the accessible analog nets fast and at the component level, and boundary scan shifts patterns through the BGA's built-in scan cells to verify the joints a probe can never touch. The two are complementary, which is exactly how dense boards are tested.
- Functional test proves the board runs but is a poor fault locator: a board can fail FCT for many reasons and FCT will not tell you which joint is open. It is the last gate, not the coverage workhorse, and it cannot localize like ICT plus boundary scan.
Why does a production line trend each board's pass/fail measurements across the whole build and fleet, instead of just recording pass or fail?
- Logging and trending add work, not speed; the test takes the same time whether you keep one bit or the full reading.
- Correct. A bare pass/fail throws away the margin. Trending the actual numbers shows a population creeping toward a limit (a supplier change, a drifting fixture, a process shift) while every board still passes, so you act before the first real failure instead of after a field return.
- Traceability requirements vary and are not the engineering reason; you would trend the data even if no regulation asked you to, because the early-warning signal is worth it.
- Trending does not replace the golden unit; the golden unit checks the tester, while trending watches the boards. They answer different questions and you want both.
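The early-warning idea can be sketched with nothing more than a straight-line fit over the readings in build order. This is a simplified stand-in (production lines often use control charts instead), and the rail readings and limits below are invented for illustration.

```python
def trend_report(readings, low, high):
    """Fit a least-squares line through readings (in build order) and project it to a limit."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings to see a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(readings) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(readings))
    slope = sxy / sxx
    fitted_last = y_mean + slope * (n - 1 - x_mean)
    if slope > 0:
        boards_until_limit = (high - fitted_last) / slope
    elif slope < 0:
        boards_until_limit = (fitted_last - low) / -slope
    else:
        boards_until_limit = None  # flat: no projected crossing
    return {
        "all_pass": all(low <= r <= high for r in readings),
        "min_margin": min(min(r - low, high - r) for r in readings),
        "slope_per_board": slope,
        "boards_until_limit": boards_until_limit,
    }


# Invented 3.3 V rail readings: every board passes, but the population is creeping up.
rail = [3.300, 3.310, 3.320, 3.330, 3.340, 3.350]
report = trend_report(rail, low=3.135, high=3.465)
print(report)
```

A bare pass/fail log of these six boards would read PASS six times. The fitted slope and the projected number of boards until the limit are the part that tells you to go look at the regulator supplier or the fixture now.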
Lab: turn your bench bring-up into a one-button test
Take the manual finger-driver bring-up you ran in Module 1 and script it. Write a small Python harness that talks to the board over UART, issues each step (read rails, start the oscillator check, close a HIL-style loop against a simple finger model, command a known move, read back the encoder), and applies a numeric pass/fail limit to each reading. Print one verdict line per board and append every measured number to a CSV. Run it on five boards. Then plot the CSV: even with five points you will see the spread. The day a sixth board’s rail voltage sits at the edge of the band while still passing, your trend caught a drift your eyes never would. You have just rebuilt, in miniature, the loop that closes from the single-board bench of Module 1 to a production line of thousands.
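One possible skeleton for the lab harness is sketched below. To stay runnable without hardware it replaces the UART exchange with `read_board`, a hypothetical function that returns canned readings; on the bench you would swap that one function for real serial I/O. The measurement names, limits, and serial numbers are all assumptions for illustration.

```python
import csv
import os
import tempfile

# Pass/fail band per measurement: (low, high). Illustrative values only.
LIMITS = {
    "rail_3v3_V": (3.135, 3.465),
    "rail_5v0_V": (4.75, 5.25),
    "encoder_after_move_counts": (980, 1020),
}

# Canned readings standing in for five real boards (invented data).
_CANNED = {
    "FD-0001": {"rail_3v3_V": 3.31, "rail_5v0_V": 5.02, "encoder_after_move_counts": 1001},
    "FD-0002": {"rail_3v3_V": 3.29, "rail_5v0_V": 4.98, "encoder_after_move_counts": 998},
    "FD-0003": {"rail_3v3_V": 3.33, "rail_5v0_V": 5.05, "encoder_after_move_counts": 1003},
    "FD-0004": {"rail_3v3_V": 3.30, "rail_5v0_V": 4.61, "encoder_after_move_counts": 1000},
    "FD-0005": {"rail_3v3_V": 3.45, "rail_5v0_V": 5.01, "encoder_after_move_counts": 999},
}


def read_board(serial_no: str) -> dict:
    """Hypothetical stand-in for the UART conversation with one board."""
    return dict(_CANNED[serial_no])


def evaluate(readings: dict, limits=LIMITS):
    """Apply each numeric limit; return the verdict and the names that failed."""
    failures = [name for name, (low, high) in limits.items()
                if not low <= readings[name] <= high]
    return ("PASS" if not failures else "FAIL"), failures


def append_csv(path: str, serial_no: str, readings: dict, verdict: str) -> None:
    """Append every measured number, writing the header only for a new file."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["serial", *LIMITS, "verdict"])
        if new_file:
            writer.writeheader()
        writer.writerow({"serial": serial_no, **readings, "verdict": verdict})


def run_batch(serials, csv_path: str) -> dict:
    """One verdict line per board, one CSV row per board."""
    verdicts = {}
    for sn in serials:
        readings = read_board(sn)
        verdict, failures = evaluate(readings)
        append_csv(csv_path, sn, readings, verdict)
        print(f"{sn}: {verdict}" + (f" ({', '.join(failures)})" if failures else ""))
        verdicts[sn] = verdict
    return verdicts


csv_path = os.path.join(tempfile.mkdtemp(), "finger_driver_results.csv")
verdicts = run_batch(sorted(_CANNED), csv_path)
```

In the canned data, FD-0005 passes with its 3.3 V rail close to the top of the band. The verdict line says PASS; the CSV keeps the number, which is what the plot needs.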
What boundary scan actually shifts through a chip, and why it can test a joint no probe can reach
Boundary scan, standardized as IEEE 1149.1 by the Joint Test Action Group (so “JTAG” and “boundary scan” are now nearly synonymous), works by adding a small piece of logic to every signal pin of a compliant chip: a scan cell that can either pass the pin’s normal signal through or override it. All those cells are chained into one long shift register that loops around the device’s boundary, hence the name, and is driven by a four-pin Test Access Port (TAP): TDI (data in), TDO (data out), TCK (clock), TMS (mode select), with an optional TRST (reset).
The test is a clocked serial shift, much like an SPI transfer through a daisy chain. To verify a trace between two chips, the line loads a scan cell on the source chip to drive a known value onto its pin and across the board trace, then captures the cell on the destination chip to see whether that value arrived. If the trace is open, the driven value does not arrive; if it is shorted to a neighbor, the wrong value shows up. That is a solder-joint test performed entirely by shifting bits through registers, with no physical probe and no test point on the net. For a 361-ball BGA whose joints are sealed under the package, this is the only way to check the interconnect electrically, joint by joint.
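The drive, capture, and compare cycle can be modeled in a few lines. This is a toy, not a TAP implementation: there is no state machine or instruction register, the chain is just a Python list, and two modeling assumptions are baked in (an open input reads as 1, and two shorted nets behave as a wired-AND). The walking-one pattern set is one common choice, used here only for illustration.

```python
def shift_chain(chain, bits_in):
    """Clock bits into the TDI end of a scan chain; return the bits that fall out of TDO."""
    bits_out = []
    for bit in bits_in:
        bits_out.append(chain.pop())  # last cell drives TDO
        chain.insert(0, bit)          # TDI feeds the first cell
    return bits_out


def load(chain, values):
    """Shift in so that chain[i] ends up holding values[i]."""
    shift_chain(chain, list(reversed(values)))


def unload(chain):
    """Shift the whole chain out and return the values in cell order."""
    return list(reversed(shift_chain(chain, [0] * len(chain))))


def board_nets(driven, opens, shorts):
    """What the destination cells capture, given faults on the board traces."""
    received = list(driven)
    for a, b in shorts:
        received[a] = received[b] = driven[a] & driven[b]  # assumed wired-AND
    for net in opens:
        received[net] = 1                                  # assumed float-high
    return received


def interconnect_test(n_nets, opens=(), shorts=()):
    """Walk a single 1 across the nets; return the nets whose captured value was wrong."""
    failing = set()
    for k in range(n_nets):
        pattern = [1 if i == k else 0 for i in range(n_nets)]
        source = [0] * n_nets
        load(source, pattern)                        # shift pattern into source cells
        dest = board_nets(source, opens, shorts)     # destination cells capture the pins
        captured = unload(dest)                      # shift the result back out
        failing.update(i for i in range(n_nets) if captured[i] != pattern[i])
    return sorted(failing)


print("good board:   ", interconnect_test(4))
print("open on net 2:", interconnect_test(4, opens=[2]))
print("short 0 to 3: ", interconnect_test(4, shorts=[(0, 3)]))
```

Several patterns are needed because any single pattern can mask a fault: an open that floats high looks fine whenever the driven bit happens to be 1.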
The cost of admission is that the chip must carry the boundary-scan logic, described in a manufacturer-supplied BSDL (Boundary Scan Description Language) file that lists every pin’s scan cell. The standard defines mandatory instructions the TAP must support, including BYPASS (skip this chip), SAMPLE/PRELOAD (capture or load pin values), and EXTEST (drive the pins to test the board between chips). Commercial board testers import the design netlist plus the BSDL files and generate the test vectors automatically, then deliver them in interchange formats like SVF. The same chain that tests joints in the factory doubles as a debug port and an in-system programmer: the line often uses JTAG to load firmware into flash right after it has used JTAG to prove the flash chip is soldered correctly.
The history is worth a beat. In the 1980s, multi-layer boards and BGAs were burying connections where no probe could reach, and most field faults were bad solder joints exactly in those hidden spots. JTAG formed in 1985 specifically to give a “pins-out view from one IC pad to another” so those faults could be found. Intel’s 486 shipped with JTAG in 1990, the year it became IEEE 1149.1, and adoption followed fast. Earlier serial-test ideas (James B. Angell’s serial testing at Stanford, IBM’s level-sensitive scan design) pointed the way; JTAG made it an industry standard. Today essentially every embedded platform above the smallest microcontrollers carries a JTAG port, which is why your ESP32 debugger, your boundary-scan test, and your factory flash programmer all speak the same four-wire protocol.
Grounded in Wikipedia: “Hardware-in-the-loop simulation”, “In-circuit testing”, “Boundary scan”, “JTAG” (CC BY-SA).
Key takeaways
- Manufacturing test scales bench bring-up into a fixed automated sequence: cheapest, earliest test first, each station catching what the ones before could not.
- Hardware-in-the-loop wraps the board in a real-time plant simulation: emulated sensors in, actuator commands out, so the board is tested in a closed loop without the real (dangerous, slow, costly) plant.
- The flow is AOI → ICT → boundary scan → FCT → end-of-line: optical, then probe every net, then JTAG the dense digital, then exercise the real function, then ship-gate with traceability.
- ICT (bed-of-nails) is fast and component-level for volume; flying probe needs no fixture but is slow, for prototypes; boundary scan reaches joints no probe can.
- Design for test earns coverage: no test access in layout means no test on the line. A golden unit tests the tester; measurements trended against their limits across the fleet catch drift before failures.
You inherit a finger-driver test flow that runs, in this order: functional test (motor spinning, HIL loop closed), then in-circuit test, then AOI. Reorder it correctly and say why the new first and last stages belong where they do.
Worked solution
Correct order: AOI → ICT → functional test. AOI is first because it is unpowered, fast, and catches population defects (missing, rotated, bridged parts) before any electrical test, the same “inspect first” rule as the bench. Functional test is last because it is the most expensive and the worst fault locator: it spins the motor and closes the HIL loop, so running it on a board that ICT would have failed for a solder short wastes the station and can stress the rig. Each stage should catch what the cheaper, earlier stage could not, so you never pay for a late test that an early test made unnecessary.
A test flow checks 240 distinct potential fault sites on the board. ICT plus boundary scan together detect 222 of them; the rest are nets with no test access and parts hidden under a shield. What is the test coverage as a percentage, and what is the practical risk of the gap?
Worked solution
Coverage is the fraction of fault sites the flow can detect:
$$\text{coverage} = \frac{222}{240} = 0.925 = 92.5\%$$
So 7.5% of the possible defects (18 of the 240 sites) are invisible to the flow. The practical risk: any defect at one of those 18 sites passes the line undetected and ships. If a hidden net under the shield has a marginal solder joint, the board leaves the factory “passing” and becomes a field failure. The fix is a design-for-test change (add test access to those nets, or put the hidden parts on a boundary-scan chain) so coverage climbs toward 100% before, not after, the boards are built. An invisible defect is one that ships.
You want to test the finger-driver firmware’s response to an encoder that suddenly reports the finger jammed at full torque, a condition you must never inflict on a real finger. Sketch how a hardware-in-the-loop rig lets you test this safely, naming the signal that flows each way across the loop.
Worked solution
Put the board in a HIL loop instead of on a real finger. The rig runs a real-time plant simulation of the finger (inertia, joint friction, the encoder). The flow around the loop:
- The model generates the encoder signal and the HIL rig electrically emulates the sensor, so the board reads a simulated encoder value (sensor signal into the board).
- The board runs its control loop and outputs a motor-drive command (actuator signal out of the board) back into the model.
- The model integrates that command to update the simulated finger’s position and velocity, producing the next encoder value, and the loop repeats.
To test the jam, you simply tell the model to inject the fault: hold the simulated position fixed while the board keeps commanding torque, exactly as a real jam would look to the encoder. Now you watch the firmware’s reaction (does it current-limit, fault out, back off?) with zero risk to real hardware, perfectly repeatably, as many times as you like. The fault injection is free because the dangerous part of the world is math, not metal. That is the whole reason HIL exists.
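A jam injection along these lines might look like the sketch below. Everything in it is an assumption made for illustration: the plant constants, the proportional controller, and especially the stall detector (torque above half the limit while the encoder has not moved for 20 ms), which stands in for whatever protection the real firmware implements.

```python
def simulate_jam(jam_at_s=0.05, stall_limit_s=0.02, dt=1e-3, duration_s=0.3):
    """Run a toy closed loop, optionally jam the joint, and report when the firmware faults."""
    # Plant (toy model, illustrative constants)
    inertia, friction, counts_per_rad = 2e-4, 2e-3, 1000.0
    angle = velocity = 0.0
    # Firmware stand-in (hypothetical gains and stall rule)
    kp, max_torque, target_counts = 2e-4, 0.05, 2000
    stall_limit_steps = round(stall_limit_s / dt)
    jam_step = None if jam_at_s is None else round(jam_at_s / dt)
    prev_counts, stalled_steps, faulted_at = 0, 0, None
    log = []

    for k in range(round(duration_s / dt)):
        counts = round(angle * counts_per_rad)       # sensor signal into the board

        if faulted_at is None:
            torque = max(-max_torque, min(max_torque, kp * (target_counts - counts)))
            pushing_hard = abs(torque) >= 0.5 * max_torque
            stalled_steps = stalled_steps + 1 if pushing_hard and counts == prev_counts else 0
            if stalled_steps >= stall_limit_steps:
                faulted_at, torque = k * dt, 0.0     # firmware faults out and backs off
        else:
            torque = 0.0
        prev_counts = counts

        if jam_step is not None and k >= jam_step:
            velocity = 0.0                           # injected fault: position held fixed
        else:
            velocity += (torque - friction * velocity) / inertia * dt
            angle += velocity * dt                   # actuator command back into the model
        log.append((k * dt, counts, torque))

    return faulted_at, log


faulted_at, log = simulate_jam()
print("fault raised at t =", faulted_at)
```

The jam is one `if` in the model. Changing `jam_at_s` or `stall_limit_s` and rerunning is the cheap, repeatable experiment the real finger could never give you.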
The bench taught you patience with one board: look before you leap, leash what you cannot see, let the assembly prove its innocence one verified layer at a time. The line is that same patience, cast in steel and software, repeated a thousand times a week without you. You do not lose the craft when you scale. You encode it: every careful step you once took by hand becomes a station the board must pass, every judgment a limit the data must clear, every intuition a trend that warns you before a customer ever could. One board, proven by a person. Every board, proven by the process that person designed.