The Data-Integrity Contract: No Synthetic Data in Anything That Decides
Most AI systems do not fail because the model was wrong. They fail because the data behind the model was wrong in a way nobody noticed — a placeholder that survived into production, a cache that quietly went cold, a feed that stopped updating while the dashboard kept rendering yesterday's number as though it were live. The model did exactly what it was asked. It was asked over bad inputs. This piece argues for a specific, enforceable discipline — a data-integrity contract — as the foundation of production AI reliability, and walks through how stale-as-live failures actually happen and how to engineer them out. The discipline here was forged operating a production, AI-orchestrated system at the hard end of reliability, where a confidently wrong number is more expensive than no number at all, and then generalized for client delivery.
Why data integrity is the real reliability problem
When teams talk about AI reliability they usually mean the model: hallucination, accuracy, drift, evaluation. That is a real surface, but it is the second-order one. The first-order surface is the data the model and the surrounding system consume. A correct model reasoning over a stale, mocked, or synthetic input produces a wrong answer with full confidence and no error flag — which is the worst possible failure, because it is indistinguishable from a right one until someone acts on it.
This is harder than ordinary data quality because of where AI sits in the stack. A traditional report that draws on stale data looks stale — the timestamp is on the page, a human reads it, judgement intervenes. An AI system collapses that distance. It ingests the input, reasons over it, and emits a decision or a number in a single motion, often with no human between the bad input and the consequence. The model also launders the staleness: it wraps a six-hour-old value in fluent, present-tense, authoritative prose. The very competence that makes the system useful is what makes a bad input dangerous.
So the contract is not a data-hygiene nicety bolted on at the end. It is a property the system must hold before any decision logic is allowed to run. Stated plainly: no synthetic, mocked, or silently-stale data in anything that informs a decision or reaches a user. Every data source carries a freshness guarantee, and when a source cannot meet it, the system says so explicitly instead of serving a confident wrong answer.
How stale-as-live failures actually happen
The failure mode is rarely a dramatic outage. It is a value that is structurally present but semantically dead. Consider a generic, non-proprietary example: a service writes the current value of some external signal to a cache every few seconds, and a downstream component reads that cache to make a call. One day the writer dies — a network blip, an expired credential, an upstream rate-limit, a deploy that didn't restart the right process. The cache still holds the last value it ever received. The reader is checking only whether a value exists and is greater than zero. It is. So the reader proceeds, treating a value frozen at the moment the writer died as the live state of the world. Nothing alarms, because every component is technically "up." The dashboard is green. The number is wrong, and it will stay wrong, confidently, until someone independently notices the world has moved and the value hasn't.
There are three recurring shapes here, and they compound:
- The mock that outlived the demo. A placeholder or fixture wired in during development to get the pipeline flowing, never removed, now silently feeding a production decision. It survives because it works — it returns plausible data on every call, so nothing ever fails loudly enough to get it noticed.
- The cache with no expiry semantics. A value cached for performance, read with an existence check (
value != null,value > 0) rather than a freshness check. Existence is not liveness. A value can exist and be hours stale; the read path cannot tell the difference unless freshness is part of the contract. - The liveness/outcome confusion. "The service is up" answers a different question from "fresh, correct data was just produced." A health check that pings a process tells you the process is running, not that it is doing its job. You can have a perfectly live service emitting perfectly dead data.
In each case the root cause is the same: the system was built to check presence, when the property that matters is recency and provenance. Presence is cheap to verify and reassuring; recency is the thing that was actually load-bearing.
The principle: freshness guarantees and fail-closed by default
The fix has two parts, and both are engineering commitments rather than aspirations.
First, every market-facing or decision-facing value carries explicit provenance and a freshness budget. A read is not "is there a value?" but "is there a value, where did it come from, and was it produced within the window this decision tolerates?" That window is a deliberate, documented choice per data source — sub-second for a fast-moving signal, minutes or hours for something that genuinely changes slowly. The point is that the tolerance is stated and checked, not assumed. A value that exists but falls outside its freshness budget is treated as no value at all.
Second, fail closed. When a source cannot meet its guarantee, the system returns an honest, explicit "data unavailable" — and the decision logic declines to act on absent data — rather than substituting a stale value, a default, or a synthetic stand-in. This inverts the usual instinct. Most systems are built to keep producing output under degradation, because an empty screen feels like a failure. But in anything that decides, a confident wrong answer is strictly worse than a visible gap. "I don't currently have a reliable value for this" is a safe state. A six-hour-old number presented as live is not.
This is also where the model-orchestration framing matters. We are AI-led and model-flexible — the conductor, not the soloist — and the contract is what lets an orchestrated, multi-component AI system be trusted with a number at all. The model is only as honest as the data plane beneath it. (Where we go deepest is Claude, while staying fluent across the stack; the contract is model-neutral by design, because the failure mode lives in the data path, not the model.)
The trade-offs, stated honestly
This discipline has real costs, and pretending otherwise would be the kind of overclaim this register avoids.
Fail-closed means the system will sometimes refuse to answer when a looser design would have guessed — and a guess is right more often than not, which makes the discipline feel pessimistic. The defence is that the cases where the guess is wrong are exactly the cases that matter, and you cannot tell them apart at the moment of the guess. You are trading a small amount of availability for a large amount of trustworthiness, and in any decision system that is the correct trade.
Freshness budgets also add engineering surface: every source needs a defined tolerance, every write needs to stamp provenance and time, every read needs to enforce the check. This is more code and more discipline than an existence check. It is not free. But it is the cheapest place to pay, because the alternative — discovering a stale-as-live failure after a decision has been made on it — is paid in incidents and in eroded confidence, which is far more expensive and arrives without warning.
How to apply it
You do not need to rebuild a system to adopt the contract. Start where decisions are made and work outward.
- Inventory the decision-facing data paths. For each value that informs a decision or reaches a user, ask one question: if its upstream source died right now, how would this code know? If the answer is "it wouldn't," you have found a stale-as-live risk.
- Replace existence checks with freshness checks. Anywhere a read tests only for presence, add provenance and a timestamp, and define the tolerance window explicitly. Make the window a reviewed decision, not an accident of caching.
- Audit for surviving mocks and fixtures. Search the production path for placeholders, sample data, and development stand-ins. Anything that returns plausible data without touching a real source is a candidate to fail loudly or be removed.
- Instrument the outcome, not the ping. Alarm on "fresh, correct data was produced," not on "the process is running." A green health check that coexists with dead output is the signal you most need to catch.
- Make "data unavailable" a first-class state. Build the explicit, honest unavailable path deliberately, and verify it triggers under real source failure — independently, against the running system, because self-grading tends to confirm the story you already told yourself.
The contract is, in the end, a single commitment: the system never lies about what it knows. It would rather tell you it cannot answer than answer wrongly with confidence. That is what makes AI in production something you can put a decision behind.
If you are putting AI behind real decisions and want a clear-eyed read on where stale-as-live and synthetic-data risks hide in your own stack, we can work through it directly in an AI working session, or as part of longer-term AI consulting. We don't just say it, we show it.