Differential Buses · CAN & Ethernet · #32 of 48

Ethernet Debug & Signal Integrity

TDR, Eye Diagrams and the PHY Registers

The robot hand’s wrist controller links up, the PHY registers say 100/full on both ends, and for an hour the telemetry is perfect. Then the arm starts moving for real. As the elbow motor spins up, packets begin to vanish, the error counters tick, and once every few minutes the link drops and re-negotiates with a little stutter you can feel in the control loop. Nothing in your firmware changed. The cable did not move. The only new variable is a cable run that now flexes past a screaming inductor twelve times a minute, and somewhere along that run a connector has worked a hair loose.

The bits are not wrong because the logic is wrong. The bits are wrong because the analog signal carrying them no longer survives the trip.

The last lesson taught you to stop trusting the link light and read the PHY for the negotiated truth. This one goes one floor deeper, into the analog world where the actual waveform lives or dies. When Ethernet misbehaves you do not guess at random. You debug in a fixed order, from the cheapest read to the most expensive: indicators, then registers, then the cable, then the waveform itself. At each step the question is the same: “is the signal still good enough to recover?”

By the end, you can

  1. Order an Ethernet debug pass from LEDs → PHY registers → cable → waveform and say what each step rules in or out
  2. Explain how a time-domain reflectometer locates a fault and its distance from the reflected pulse, and read open vs short vs discontinuity from the sign of the reflection
  3. Calculate the distance to a cable fault from the round-trip time and the cable's velocity of propagation
  4. Read an eye diagram: explain how overlaying many bit periods reveals jitter, ISI and attenuation, and what a closing eye means for your margin
  5. Choose the right instrument for a given Ethernet symptom and justify the choice

Intuition first

You already know how to find a leak in a garden hose two ways. If the hose is short and on the lawn, you walk it and look. But if it runs underground for fifty meters, walking it is hopeless, so you do something cleverer: you give the water a sharp thump at your end and feel how long it takes the pressure wave to bounce back off the blockage. A blockage near you bounces back fast; one far away takes longer. Time the echo, know how fast pressure travels in the hose, and you have the distance to the fault without digging up the whole yard.

That thump-and-time trick is a time-domain reflectometer, or TDR. You send a fast electrical pulse down the cable and listen for its reflection. A perfectly smooth, properly terminated cable sends nothing back, the pulse just gets absorbed at the far end. But any discontinuity (an open where a wire broke, a short where two pins touched, a crushed spot where the impedance jumped) reflects part of the pulse straight back to you. The size of the echo tells you how bad the fault is, and the delay tells you how far away it is. A TDR is radar for a wire.

The second instrument answers a different question. Suppose the cable is intact but the signal arriving at the far end is just barely readable, smeared and shrunk by the trip. How do you measure “barely”? You take thousands of received bits, slice the waveform into one-bit-wide slivers, and stack every sliver on top of every other one in the same window. The pile-up draws a shape that, for a healthy signal, looks like an open eye: a clear, bright opening in the middle where the receiver can confidently tell a 1 from a 0. As the signal degrades, the eye closes. That stacked picture is an eye diagram, and the size of the opening is your margin made visible.

Debug in order: cheapest read first

The failure in the Hook is the kind that drives people to randomly swap parts for an afternoon. The cure is discipline: a fixed ladder where each rung is more expensive than the last, and you only climb when the cheaper rung has told you all it can.

1. Check the LEDs

The indicators are free and instant, so they go first. From the last lesson you know their honest meaning: the link LED says only that the PHY sees a carrier, and an activity LED says frames are moving. A dark link LED with a known-good cable points at the physical layer (cable, magnetics, connector). A green link LED that flickers off exactly when the motor spins up, as in the Hook, is already a strong clue: something analog is marginal and the motor’s noise pushes it over the edge.

2. Read the PHY registers over MDIO

Next, ask the PHY for its account, over the MDIO management bus you met last lesson. Three things matter here. First, the negotiated speed and duplex, to rule out the mismatch from lesson 31. Second, the link status bit, including the latched “link went down since I last looked” bit that catches the intermittent drops a steady poll would miss. Third, and most useful for signal integrity, the PHY’s error counters: running totals of frames that arrived with a bad checksum, symbols the decoder could not make sense of, and other recovery events. A counter that climbs only while the motor runs converts a vague “it feels flaky” into a measured, repeatable fault tied to a physical cause.
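The latched link bit is easiest to appreciate in code. The sketch below assumes the standard Clause 22 Basic Mode Status Register (register 1), where the link-status bit latches low on a drop and stays low until the register is read. The `mdio_read` callable is hypothetical: it stands in for whatever your platform’s MDIO driver provides, and it is not a real library call.

```python
# Clause 22 Basic Mode Status Register and its link-status bit.
BMSR = 0x01
BMSR_LINK_STATUS = 1 << 2  # latches low on link loss until the register is read


def check_link(mdio_read, phy_addr):
    """Return (dropped_since_last_read, link_up_now) for one PHY.

    mdio_read(phy_addr, reg) -> int is a hypothetical callable supplied by
    your platform's MDIO driver; it is an assumption of this sketch.
    """
    first = mdio_read(phy_addr, BMSR)   # may still hold a latched drop
    second = mdio_read(phy_addr, BMSR)  # latch cleared: shows the current state
    dropped = not (first & BMSR_LINK_STATUS)
    link_up_now = bool(second & BMSR_LINK_STATUS)
    return dropped, link_up_now


if __name__ == "__main__":
    # Fake reader standing in for hardware: the first read reports a latched drop.
    canned = iter([0x7809, 0x782D])
    print(check_link(lambda addr, reg: next(canned), phy_addr=1))  # (True, True)
```

A result of `(True, True)` is the signature of the intermittent fault in the Hook: the link is up now, but it fell over at some point since the last read. Error counters are vendor-specific, so their register addresses have to come from the PHY datasheet.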

3. Test the cable with a TDR

If the registers say the negotiation is fine but errors climb under stress, suspect the physical channel: the cable, the connectors, the terminations. This is where the TDR earns its keep, because it does not just say “the cable is bad,” it says “the cable is bad 2.7 meters from this end, and it reads like a near-open.” We will work the distance math in the next section.

4. Look at the waveform with an eye diagram

The cable can be electrically continuous and still deliver a signal too degraded to decode reliably. The TDR finds breaks and bumps; the eye diagram measures quality. When the cable checks out but the bit-error rate is still high, you put a scope of sufficient bandwidth on the link, build the eye, and read how much margin is left. Also check the two passive parts that quietly wreck signal integrity when wrong: the magnetics (open or saturated by a DC offset) and the terminations (a missing or wrong-value end resistor reflects the signal back on itself, exactly the discontinuity the TDR is built to catch).

On a robot link, the PHY registers report a clean, stable 100/full negotiation, but the CRC-error counter climbs steadily whenever the arm moves. What is the most sensible NEXT step in the debug ladder?

The TDR: distance from a reflected pulse

The reason a TDR can give you a distance and not just a yes/no is pure physics, and it is worth doing by hand once so the number stops feeling like magic.

A signal travels down a cable at a fixed fraction of the speed of light. That fraction is the cable’s velocity of propagation, often written as a velocity factor $\text{VF}$ between 0 and 1; for common twisted pair and coax it is roughly 0.6 to 0.7. So the signal speed is

$$v = \text{VF} \cdot c$$

where $c$ is the speed of light in vacuum, about $3.0 \times 10^{8}\ \text{m/s}$. A pulse you launch has to travel down to the fault and the reflection has to travel back, so the time you measure on the screen, $t$, covers twice the distance to the fault. The distance is therefore

$$d = \frac{v \cdot t}{2}$$

Work a real one. Your TDR shows a reflection at $t = 27\ \text{ns}$ on a cable with $\text{VF} = 0.66$. Then

$$v = 0.66 \cdot 3.0 \times 10^{8} = 1.98 \times 10^{8}\ \text{m/s}$$

$$d = \frac{1.98 \times 10^{8} \cdot 27 \times 10^{-9}}{2} \approx 2.67\ \text{m}$$

So the fault is a touch under 2.7 meters from where you connected. On a robot, that is often enough to point straight at a specific connector or a known flex point in the harness.
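The same arithmetic fits in a few lines of Python, which is handy when you are reading several reflections off one trace. This is a minimal sketch: the function name `tdr_distance` is made up for illustration, and it uses the lesson’s rounded value for the speed of light.

```python
C_VACUUM = 3.0e8  # m/s, the rounded value used in the lesson


def tdr_distance(round_trip_s, velocity_factor):
    """Distance to a fault from the measured round-trip time, d = v * t / 2."""
    if not 0.0 < velocity_factor <= 1.0:
        raise ValueError("velocity factor must be in (0, 1]")
    v = velocity_factor * C_VACUUM   # signal speed in the cable
    return v * round_trip_s / 2.0    # halve it: the pulse goes out and back


print(round(tdr_distance(27e-9, 0.66), 2))  # 2.67
```

The velocity factor is the number to double-check against the cable datasheet, since an error in it scales the distance by the same proportion.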

The sign and size of the reflection then tell you what kind of fault it is. The reflection coefficient $\rho$ compares the impedance the pulse runs into, $Z_t$, with the cable’s own characteristic impedance, $Z_0$:

$$\rho = \frac{Z_t - Z_0}{Z_t + Z_0}$$

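Read off the three cases you care about:

  • Open ($Z_t \to \infty$): $\rho \to +1$. The reflection returns with the same polarity as the launched pulse, so the trace steps up.
  • Short ($Z_t = 0$): $\rho = -1$. The reflection returns inverted, so the trace dips down toward zero.
  • Impedance discontinuity ($Z_t$ somewhat above or below $Z_0$): $\rho$ is a fraction between $-1$ and $+1$. The trace shows a partial bump (up if the impedance rose, down if it fell), and a perfect match, $Z_t = Z_0$, gives $\rho = 0$ and no reflection at all.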

So one TDR trace gives you three facts at once: that there is a fault, how far it is, and what kind it is. That is why it is the instrument you reach for the moment the registers are clean but the channel is suspect.
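To get a feel for how the sign and size of $\rho$ map onto fault types, here is a small sketch. The 100 Ω default is the nominal differential impedance of Ethernet twisted pair; the 5% tolerance band and the label strings are arbitrary choices made for illustration, not thresholds from any standard.

```python
import math


def reflection_coefficient(z_t, z_0=100.0):
    """rho = (Z_t - Z_0) / (Z_t + Z_0); z_t may be math.inf for an open."""
    if z_0 <= 0 or z_t < 0:
        raise ValueError("impedances must be non-negative, z_0 positive")
    if math.isinf(z_t):
        return 1.0  # limit of the formula as Z_t grows without bound
    return (z_t - z_0) / (z_t + z_0)


def classify(rho, tol=0.05):
    """Name the fault from rho. The tolerance is an illustrative assumption."""
    if rho >= 1.0 - tol:
        return "open"
    if rho <= -1.0 + tol:
        return "short"
    if abs(rho) <= tol:
        return "matched"
    return "discontinuity (impedance rises)" if rho > 0 else "discontinuity (impedance drops)"


for z in (math.inf, 0.0, 100.0, 150.0):
    print(z, classify(reflection_coefficient(z)))
```

A 150 Ω spot on a 100 Ω line gives $\rho = 0.2$, a small positive bump: the kind of reading a slightly loose connector might produce.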

Oliver Heaviside · 1850-1925 Recast Maxwell's equations into the form engineers use and worked out the telegrapher's equations that govern how a pulse travels and reflects on a transmission line, the physics every TDR quietly runs on.

The eye diagram: margin you can see

A TDR finds breaks. The eye diagram answers a softer, scarier question: the cable is intact, so why are bits still arriving wrong? The answer is that staying continuous is not the same as staying clean. A long or lossy run degrades the waveform until the receiver can no longer tell a 1 from a 0 with confidence, and the eye diagram is how you measure exactly how close to that edge you are.

You build it by slicing the received waveform into pieces one unit interval (one bit period) wide and overlaying thousands, even millions, of them in the same window. Think of it as a long-exposure photograph of every bit the link has sent. Where the signal reliably sits high, the photo is bright; where it reliably sits low, bright again; and in between, where transitions happen, the traces sweep through and leave the two bright rails with a clear opening between them. That opening is the eye.
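The slicing-and-stacking step can be sketched in plain Python. This is a simplified illustration under stated assumptions: the samples are already aligned to unit-interval boundaries (a real scope recovers the clock first), the signal has two levels around a threshold of zero, and the function names are invented for this example. 100BASE-TX itself uses three-level MLT-3 signalling, so its real eye has more than one opening.

```python
def fold_into_eye(samples, samples_per_ui):
    """Cut a waveform into one-UI-wide traces that can be overlaid."""
    n_ui = len(samples) // samples_per_ui  # drop any partial UI at the end
    return [samples[i * samples_per_ui:(i + 1) * samples_per_ui]
            for i in range(n_ui)]


def eye_height(traces, threshold=0.0):
    """Vertical opening at the centre of the UI: lowest 'high' minus highest 'low'."""
    mid = len(traces[0]) // 2
    centre = [trace[mid] for trace in traces]
    highs = [v for v in centre if v > threshold]
    lows = [v for v in centre if v <= threshold]
    if not highs or not lows:
        raise ValueError("need both high and low bits to measure an opening")
    return min(highs) - max(lows)


# Three bits (1, 0, 1), four samples per unit interval.
wave = [0.2, 0.9, 1.0, 0.8, -0.2, -0.9, -0.7, -0.8, 0.1, 0.8, 0.9, 0.7]
traces = fold_into_eye(wave, samples_per_ui=4)
print(len(traces), round(eye_height(traces), 2))  # 3 1.6
```

The opening is set by the worst bits, not the average ones: one weak high and one weak low are enough to shrink it, which is why the number of overlaid intervals matters so much.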

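A wide-open eye means the receiver has lots of room: at the moment it samples each bit (the middle of the eye), the voltage is unambiguously high or low, and the timing of the edges is crisp. Three independent enemies eat into that opening, and the eye shows you which one:

  • Jitter: edges arrive a little early or a little late, so the crossings smear sideways and the eye gets narrower (less timing margin).
  • Inter-symbol interference (ISI): leftovers from earlier bits, such as reflections and slow edges, bleed into the current bit, so the rails thicken and the eye gets shorter.
  • Attenuation: cable loss shrinks the signal’s amplitude, pulling the high and low rails toward each other and shrinking the eye vertically.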

Push any of these far enough and the eye closes: the opening shrinks to nothing, and there is no instant and no voltage at which the receiver can reliably decide the bit. That is precisely the moment errors explode. “The eye is closing” is engineer shorthand for “jitter, ISI, and attenuation are eating my margin, and I am running out.”

To turn the picture into a number, you do two things. You overlay a mask, a forbidden keep-out zone in the center of the eye that the standard says no trace may enter; if any trace touches the mask, the link has failed its margin spec. And you measure the bit-error rate (BER), the fraction of bits that come out wrong over a long run (a healthy link sits at something like one error in $10^{12}$ bits or better). The mask test plus the BER together quantify how much margin you actually have, where the eye opening was only the qualitative picture.

   wide-open eye (healthy)              closing eye (marginal)
   _______        _______               _____       _____
          \      /                           \  ___  /
           \    /        <- crossings          \/   \/   <- crossings smeared
            \  /            sharp               /\   /\      (jitter)
   . . . . . \/ . . . . . sample here          /  \ /  \
            /  \         <- big opening        /    X    \  <- tiny opening
           /    \           (margin)          /    / \    \    (no margin)
   _______/      \_______               _____/   _/   \_   \____
                                                  eye nearly shut

A receiver's eye diagram shows the edge crossings smeared into a wide band sideways, while the vertical opening stays roughly full height. Which channel impairment is this pointing at, and what is the consequence?

Recreate the Hook on the bench in order, and never skip a rung. Start with the LEDs: confirm the link is green and watch whether it flickers when you run the nearby motor. Next, read the PHY over MDIO (or ethtool on a Linux host, or the managed switch’s port page): note the negotiated speed and duplex, then read the CRC and symbol-error counters, run the motor for thirty seconds, and read them again; the delta is your fault made measurable. If the counters move under motor load with clean negotiation, bring out the TDR: launch into the suspect run, find the reflection, and compute the distance with $d = vt/2$ using the cable’s velocity factor, then read the reflection’s sign to call it open, short, or discontinuity. Finally, if the cable is electrically clean but errors persist, build an eye diagram on a fast scope: overlay many unit intervals, drop in the standard’s mask, and capture a BER. Check the magnetics and terminations while you are there, since a wrong end resistor shows up as both a TDR discontinuity and a closed eye. Write down, at each rung, exactly what it ruled in or out, so the next person does not start from zero.
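The before-and-after counter read is worth scripting, because hardware counters are narrow and can wrap between reads. The sketch below assumes a free-running 16-bit counter that wraps around; that is an assumption, since some PHYs clear their counters on read or saturate at the maximum instead, so check the datasheet for the part on your board. The function name and the example numbers are illustrative.

```python
def counter_delta(before, after, width_bits=16):
    """Errors accumulated between two reads of a free-running, wrapping counter."""
    modulus = 1 << width_bits
    return (after - before) % modulus  # modular subtraction absorbs one wrap


# Illustrative readings (made-up numbers): 30 s idle, then 30 s with the motor on.
idle_errors = counter_delta(before=120, after=121)
motor_errors = counter_delta(before=65530, after=4)  # counter wrapped past 65535
print(idle_errors, motor_errors)  # 1 10
```

Comparing an idle window against a motor-on window of the same length is what ties the errors to the physical cause instead of to background noise.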

The half-amplitude convention, why a stub closes the eye, and what 'enough bits' means

A TDR step does not just show open or short, it shows the journey. If the far end is a clean open, the launch point steps to the input voltage, and after the round-trip delay it jumps to twice the input as the same-polarity reflection adds in; a clean short steps up at launch and then collapses back toward zero when the inverted reflection returns. A discontinuity partway down the line is the interesting case: a loose connector that raises the local impedance shows a small positive bump partway along an otherwise flat trace, and the distance to that bump, by $d = vt/2$, is the distance to the connector. Engineers often quote the fault location at the point where the reflected step reaches half its final amplitude, because the finite rise time of a real pulse smears the edge and the half-amplitude point is the most repeatable place to read the time off the screen. The whole method’s resolution is set by that rise time: a faster edge resolves two close faults; a slow edge blurs them into one, which is why TDRs aimed at short PCB traces use steps with rise times of tens of picoseconds.

The eye diagram and the TDR are two views of the same reflection physics. A stub or mismatch that a TDR draws as a bump partway down the trace is the very same defect that an eye diagram draws as a step on the rising edge or, if the round-trip delay of the stub exceeds one unit interval, as the eye slamming shut. When a reflection arrives later than one bit period, it lands inside the next bit’s window, which is ISI in its purest form: the past literally interfering with the present. That is why a stub a little too long is so much worse than one a little too short: the eye stays readable until the stub’s round trip crosses one UI, then it collapses.
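To put a rough number on “a little too long”, you can turn the one-UI threshold into a length with the same $d = vt/2$ relation used for the TDR. The sketch assumes 100BASE-TX’s 125 MBd symbol rate (an 8 ns unit interval) and a velocity factor of 0.66; the function name is invented for this example. Treat the result as a feel for the scale, not a design limit, since shorter stubs still erode margin before this point.

```python
C_VACUUM = 3.0e8  # m/s, the rounded value used in the lesson


def stub_length_for_one_ui(symbol_rate_baud, velocity_factor):
    """Stub length whose round-trip delay equals one unit interval."""
    ui_s = 1.0 / symbol_rate_baud      # one unit interval, in seconds
    v = velocity_factor * C_VACUUM     # signal speed in the stub
    return v * ui_s / 2.0              # round trip, so halve it


print(round(stub_length_for_one_ui(125e6, 0.66), 3))  # 0.792
```

Under these assumptions the threshold sits a little under 0.8 m, and it shrinks in proportion as the symbol rate goes up.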

Two honest caveats. First, an eye is only as trustworthy as the number of bits behind it. A few thousand unit intervals sketch the rough shape, but rare, worst-case combinations of bits, the ones that actually cause errors, only show up after tens or hundreds of millions of intervals, which is why serious signal-integrity work accumulates enormous captures before trusting a margin number. Second, this lesson’s debug order (LEDs, then registers, then cable, then waveform) is a field heuristic for going cheapest-first, not a standards mandate; a hard, repeatable failure sometimes justifies jumping straight to the scope. The ordering earns its place by how often it saves you from buying answers you could have read for free.

Grounded in Wikipedia: “Time-domain reflectometer”, “Eye pattern”, “Ethernet physical layer” (CC BY-SA).

Key takeaways

  • Debug Ethernet cheapest read first: link LEDs → PHY registers over MDIO (speed/duplex, link status, error counters) → cable with a TDR → waveform with an eye diagram.
  • A TDR is radar for a wire: it sends a pulse and times the reflection to give the fault's distance as well as its existence, via $d = vt/2$.
  • The sign of the reflection names the fault: positive (up) is an open, negative (down) is a short, partial is an impedance discontinuity.
  • An eye diagram overlays many bit periods so margin becomes visible; jitter narrows it sideways, ISI and attenuation shrink it vertically.
  • A closing eye means jitter, ISI and attenuation are eating your margin; a mask test plus a bit-error rate turn that picture into a pass/fail number.
  • Always check the magnetics and terminations: a wrong end resistor shows up as both a TDR discontinuity and a closed eye.
Practice 1 warm-up

A TDR shows a reflection at $t = 40\ \text{ns}$ on a cable whose velocity factor is $\text{VF} = 0.66$. How far away is the fault? Use $c = 3.0 \times 10^{8}\ \text{m/s}$.

Worked solution

First the signal speed: $v = 0.66 \cdot 3.0 \times 10^{8} = 1.98 \times 10^{8}\ \text{m/s}$. The measured time is the round trip, so the distance is

$$d = \frac{v \cdot t}{2} = \frac{1.98 \times 10^{8} \cdot 40 \times 10^{-9}}{2} \approx 3.96\ \text{m}$$

The fault is about 4 meters from where you connected the TDR. (Forgetting the factor of 2 would double the answer to about 8 meters and send you looking in the wrong place, so the round trip is the part not to skip.)

Practice 2 core

On the robot harness, one TDR trace steps up to nearly twice the launched amplitude at a delay matching a known connector, and a different cable’s trace dips down at its delay. Name the fault type in each case, and explain the physics from the reflection coefficient $\rho = (Z_t - Z_0)/(Z_t + Z_0)$.

Worked solution

The trace that steps up toward double amplitude is an open circuit (a broken conductor or an unmated connector). There $Z_t \to \infty$, so $\rho = (Z_t - Z_0)/(Z_t + Z_0) \to +1$: the reflection returns with the same polarity as the launched pulse and adds to it, which is why the screen voltage climbs toward twice the input.

The trace that dips down is a short circuit (two conductors touching). There $Z_t = 0$, so $\rho = (0 - Z_0)/(0 + Z_0) = -1$: the reflection returns inverted and subtracts, pulling the trace down toward zero. So the sign of the step reads straight off as open (up) versus short (down), and the delay of each, via $d = vt/2$, tells you which connector to go fix.

Practice 3 stretch

Your link’s negotiation is clean and the TDR finds no break, yet the BER is poor. On the eye diagram you see two things at once: the edge crossings are smeared into a wide horizontal band, and the top and bottom rails have rounded over so the eye is also short. Diagnose both impairments, say which physical causes you would chase, and explain what “the eye is closing” means for the receiver here.

Worked solution

Two separate impairments are stacking. The horizontal smear of the crossings is jitter: the transition edges are not arriving at a single instant, so the timing margin (the eye’s width) is shrinking. The rounded, shortened rails are attenuation (and likely some ISI): the cable is acting as a low-pass filter, so fast transitions never fully reach their high and low levels within one bit period, collapsing the eye’s height.

Causes to chase: for the attenuation, suspect a run that is too long or too lossy, or degraded cable; for the jitter, suspect noise coupling (the nearby motor from the Hook), a marginal clock-recovery loop, or reflections from a slightly mismatched termination feeding ISI back into the edges. Check the magnetics and the end resistors first, since a wrong termination produces both a small TDR discontinuity and exactly this kind of eye degradation.

“The eye is closing” means the clear opening in the middle, the region where the receiver samples and decides 1 versus 0, is shrinking from both sides. As it shrinks there is less and less voltage and less and less time at which the decision is unambiguous, so any added noise or jitter tips a bit the wrong way. When the opening reaches zero the receiver can no longer reliably recover the data, and the BER, which a mask test would already be flagging, climbs toward catastrophic.

The loose connector twelve meters down the harness never announces itself. It waits for the motor to spin, leans on the analog margin you could not see, and lets a handful of bits die quietly in the noise. The instruments in this lesson are how you make the invisible loud: time an echo and the fault gives up its distance; stack a million bits and your margin draws its own portrait. Stop guessing at the wire. Send a pulse and listen, open the eye and look.
