Mars Climate Orbiter: $327 million lost over pounds vs newtons
September 23, 1999: NASA lost a spacecraft on approach to Mars. The probe was supposed to enter orbit 226 km above the surface; it actually passed at 57 km and burned up in the atmosphere. No hardware failure, no communication issue. It was a benchmark bug about integration testing.
What happened
Lockheed Martin (the spacecraft manufacturer) delivered software computing engine impulse in pound-seconds (lbf·s) — imperial units. JPL (mission control) expected data in newton-seconds (N·s) — metric. The difference: 1 lbf·s = 4.45 N·s.
Over 9 months of flight, every correction maneuver was computed with a ~4.5× error factor. The accumulated trajectory deviation was about 170 km — enough to end the mission in disaster.
What QA missed
⚠️ No contract test between the two teams. Each had its own unit tests, both systems passed. But nobody verified that the data at the seam used the same units.
⚠️ The spec existed. The ICD (Interface Control Document) explicitly required N·s. Lockheed ignored it — internally they used imperial out of habit. Documentation without automated verification is just text.
⚠️ End-to-end checks didn’t find the bug. Ground tests showed a small trajectory deviation, but it was interpreted as “normal model error”. Nobody asked “why is the deviation exactly this size?”. If someone had computed the 4.45 factor — they’d have recognized a familiar conversion multiplier.
⚠️ No fake “canary” runs — where data is explicitly checked for range sanity. If an engine fires for 5 seconds and outputs “X”, X should be in a known window. Any value 4× higher — flag.
What to take back to your project
✅ Units in types, not in comments. Not float impulse, but NewtonSeconds impulse (or at least impulse_ns in the name). The compiler and the reviewer will see mismatches immediately.
✅ Contract tests at every cross-system interface. Pact, OpenAPI validation, schema tests in CI — anything that prevents one service from changing format without breaking the other.
✅ Sanity checks on numeric values in production. If a value “should be in range X..Y” — assert in code or alarm in monitoring. When something runs at 4× normal — you’ll find out immediately, not 9 months later.
✅ Regular “sanity day”: the team walks through interfaces and asks “what if this value comes in unexpected units? what if null? what if negative?”. An hour every sprint.
More: NASA Lessons Learned — MCO Mishap Investigation Board Report.