You do not need to rebuild a system to adopt the contract. Start where decisions are made and work outward. - Inventory the decision-facing data paths. For each value that informs a decision or reaches a user, ask one question: if its upstream source died right now, how would this code know? If the answer is "it wouldn't," you have found a stale-as-live risk. - Replace existence checks with freshness checks. Anywhere a read tests only for presence, add provenance and a timestamp, and define the tolerance window explicitly. Make the window a reviewed decision, not an accident of caching. - Audit for surviving mocks and fixtures. Search the production path for placeholders, sample data, and development stand-ins. Anything that returns plausible data without touching a real source is a candidate to fail loudly or be removed. - Instrument the outcome, not the ping. Alarm on "fresh, correct data was produced," not on "the process is running." A green health check that coexists with dead output is the signal you most need to catch. - Make "data unavailable" a first-class state, and verify it under real failure. Build the explicit, honest unavailable path deliberately — then prove it triggers under a real, induced source failure, against the running system, checked by someone other than the author. The fail-closed branch is the one path that only fires under failure, so it is the least exercised in normal operation and the most likely to be quietly broken. A fail-closed path you have not watched fail closed is an assumption, not a guarantee. This is what production AI, proven actually means in practice: not a claim that the system is safe, but independent evidence that the safe path fires when it has to. Every failure in the table above fails the same way: it verifies presence — a value exists, a process responds, a job finished — when the property that is load-bearing for a decision is recency-and-provenance: this specific value, fresh enough for this decision, from the source it claims. A system that checks whether a value exists, but not whether it is still true, is not "mostly reliable" — it is confidently wrong on exactly the inputs that matter, and it cannot tell you which ones those are. That is why the data-integrity contract is the gate to production: it is the discipline that turns "the number is there" into "the number is true, and the system would have said so if it weren't." --- If you are putting AI behind real decisions and want a clear-eyed read on where stale-as-live and synthetic-data risks hide in your own stack, we can work through it directly in an AI working session, or as part of longer-term AI consulting. The proof is the fail-closed path firing on demand, watched by someone other than the author — evidence, not assurance. NewGenApps — production AI, proven. Stay a step ahead, always.

The Data-Integrity Contract: No Synthetic Data in Anything That Decides

Q: What is a data-integrity contract?

In short: A data-integrity contract is an enforced engineering guarantee that no synthetic, mocked, or silently-stale data is allowed into anything that informs a decision or reaches a user. Every decision-facing value carries explicit provenance (where it came from) and a freshness budget (the maximum age this decision tolerates); a value that cannot meet its budget is treated as no value at all, and the system fails closed — returning an honest "data unavailable" rather than a confident wrong answer. It is worth being precise about two words, because they are what make the definition correct rather than generic. The first is "contract," not "check." A check is a line of code someone might add. A contract is a property the system must hold before any decision logic is allowed to run — stated, enforced, and verified, the same way a type signature or a service-level objective is enforced. It is a precondition, not a courtesy. The second is "freshness budget," not "fresh." Freshness is not a binary, and not a global setting. It is a per-source, documented tolerance window — sub-second for a fast-moving signal, hours for something that genuinely changes slowly. The discipline is that the tolerance is stated and enforced per source, not assumed. The useful mental model is that a data-integrity contract is a service-level objective for data, enforced at read time. Site reliability engineering already has the vocabulary: Google's Site Reliability Workbook defines a data-freshness SLO — "the oldest data is no older than Y minutes" — as a measurable, alertable target distinct from a data-correctness SLO (Google, Site Reliability Workbook: Data Processing Pipelines). The contract takes that pipeline-level discipline and moves it to the decision boundary: not "did the pipeline meet its freshness SLO somewhere upstream," but "is this specific value, at the moment I am about to decide on it, within its budget and provenanced — and if not, do I refuse to act?"

Most AI systems do not fail because the model was wrong. They fail because the data behind the model was wrong in a way nobody noticed — a placeholder that survived into production, a cache that quietly went cold, a feed that stopped updating while the dashboard kept rendering yesterday's number as though it were live. The model did exactly what it was asked. It was asked over bad inputs. This piece argues for a specific, enforceable discipline — a data-integrity contract — as the foundation of production AI reliability, and walks through how stale-as-live failures actually happen and how to engineer them out.

Last reviewed: June 2026.

What is a data-integrity contract?

In short: A data-integrity contract is an enforced engineering guarantee that no synthetic, mocked, or silently-stale data is allowed into anything that informs a decision or reaches a user. Every decision-facing value carries explicit provenance (where it came from) and a freshness budget (the maximum age this decision tolerates); a value that cannot meet its budget is treated as no value at all, and the system fails closed — returning an honest "data unavailable" rather than a confident wrong answer.

It is worth being precise about two words, because they are what make the definition correct rather than generic.

The first is "contract," not "check." A check is a line of code someone might add. A contract is a property the system must hold before any decision logic is allowed to run — stated, enforced, and verified, the same way a type signature or a service-level objective is enforced. It is a precondition, not a courtesy.

The second is "freshness budget," not "fresh." Freshness is not a binary, and not a global setting. It is a per-source, documented tolerance window — sub-second for a fast-moving signal, hours for something that genuinely changes slowly. The discipline is that the tolerance is stated and enforced per source, not assumed.

The useful mental model is that a data-integrity contract is a service-level objective for data, enforced at read time. Site reliability engineering already has the vocabulary: Google's Site Reliability Workbook defines a data-freshness SLO — "the oldest data is no older than Y minutes" — as a measurable, alertable target distinct from a data-correctness SLO (Google, Site Reliability Workbook: Data Processing Pipelines). The contract takes that pipeline-level discipline and moves it to the decision boundary: not "did the pipeline meet its freshness SLO somewhere upstream," but "is this specific value, at the moment I am about to decide on it, within its budget and provenanced — and if not, do I refuse to act?"

Why is data — not the model — the leading cause of AI pilot failure?

Because a correct model reasoning over a stale, mocked, or synthetic input produces a wrong answer with full confidence and no error flag. It is the worst possible failure, because it is indistinguishable from a right one until someone acts on it.

The survey evidence makes the scale concrete. In Informatica's 2025 survey of 600 chief data officers and chief data and analytics officers across the U.S., Europe, and Asia, data quality, completeness, and readiness was the most-cited obstacle to moving generative AI from pilot to production — named by 43% of data leaders, tied with technology adoption challenges — while two-thirds (67%) said they had not been able to move even half of their generative AI pilots into production (Informatica, CDO Insights 2025, January 2025). The number worth holding is that 67% gap: most pilots are not reaching production, and the most-cited reason is the state of the data, not the sophistication of the model.

That is not a one-survey artifact. The most rigorous academic backing is peer-reviewed: in a study of AI practitioners building high-stakes systems, 92% reported experiencing one or more "data cascades" — compounding, downstream failures that originate in data — and 45.3% reported two or more (Sambasivan et al., "Everyone wants to do the model work, not the data work": Data Cascades in High-Stakes AI, CHI 2021). The failure surface that practitioners actually hit, repeatedly, is the data work that the field systematically under-invests in.

To be accurate about scope: the 43% figure is tied with technology in the Informatica data, not the sole named cause, and not every AI deployment depends on the kind of freshness-and-provenance contract argued for here — a low-risk, static-analytics use case can succeed on imperfect data. The contract earns its keep most strongly in real-time, decision-facing systems, where recency is load-bearing and a wrong answer is acted on before anyone notices it was wrong. That is the scope this page is about.

This is also the natural place to state a pairing that is easy to confuse. The evaluation harness is the model-error gate: it runs offline, before release, on a frozen held-out set, and answers "did this version of the model get worse?" The data-integrity contract is the data-error gate: it runs online, at decision time, on live inputs, and answers "is the data this decision is about to consume real, provenanced, and fresh enough?" The two gates fail independently, and neither covers the other — a flawless eval cannot save you from a six-hour-old input, and a perfect data contract cannot save you from a model that silently regressed. A production AI system that decides needs both, because a confident wrong answer can originate on either side of the model boundary. See the evaluation harness for the model-error side; this page is the data-error side. Both are part of why AI pilots fail.

Why data integrity is the real reliability problem

When teams talk about AI reliability they usually mean the model: hallucination, accuracy, drift, evaluation. That is a real surface, but it is the second-order one. The first-order surface is the data the model and the surrounding system consume. A correct model reasoning over a stale, mocked, or synthetic input produces a wrong answer with full confidence and no error flag — which is the worst possible failure, because it is indistinguishable from a right one until someone acts on it.

This is harder than ordinary data quality because of where AI sits in the stack. A traditional report that draws on stale data looks stale — the timestamp is on the page, a human reads it, judgement intervenes. An AI system collapses that distance. It ingests the input, reasons over it, and emits a decision or a number in a single motion, often with no human between the bad input and the consequence. The model also launders the staleness: it wraps a six-hour-old value in fluent, present-tense, authoritative prose. The very competence that makes the system useful is what makes a bad input dangerous, because it removes the human cue — visible staleness, hedged language — that would otherwise have caught it.

So the contract is not a data-hygiene nicety bolted on at the end. It is a property the system must hold before any decision logic is allowed to run. Stated plainly: no synthetic, mocked, or silently-stale data in anything that informs a decision or reaches a user. Every data source carries a freshness guarantee, and when a source cannot meet it, the system says so explicitly instead of serving a confident wrong answer.

There is a deeper reason existence checks fail, and it is worth naming because it is the technical spine of this whole argument. Data quality is not one axis. The canonical framework — Wang and Strong's Beyond Accuracy: What Data Quality Means to Data Consumers (Journal of Management Information Systems, 1996) — established that data quality decomposes into multiple dimensions, and that practitioners systematically over-index on accuracy while data consumers care just as much about timeliness: quality relative to the task at hand. Timeliness is a named, distinct dimension. A value can be perfectly accurate and worthless because it is stale. An existence check measures completeness, and nothing about timeliness. It is testing a different quality dimension from the one that is load-bearing for the decision.

How stale-as-live failures actually happen

The failure mode is rarely a dramatic outage. It is a value that is structurally present but semantically dead. Consider a generic, non-proprietary example: a service writes the current value of some external signal to a cache every few seconds, and a downstream component reads that cache to make a call. One day the writer dies — a network blip, an expired credential, an upstream rate-limit, a deploy that didn't restart the right process. The cache still holds the last value it ever received. The reader is checking only whether a value exists and is greater than zero. It is. So the reader proceeds, treating a value frozen at the moment the writer died as the live state of the world. Nothing alarms, because every component is technically "up." The dashboard is green. The number is wrong, and it will stay wrong, confidently, until someone independently notices the world has moved and the value hasn't.

Stale data is specifically the dangerous failure mode because it is the one that looks healthy. Unlike a missing or corrupted record, a stale value renders normally — dashboards draw, jobs complete, no errors fire — which is why it is documented as a silent failure that standard error monitoring structurally cannot see (IBM, What Is Stale Data?). Process monitoring answers "did the job run?"; freshness monitoring answers "did fresh data actually arrive?" The gap between those two questions is where stale-as-live incidents live.

There are three recurring shapes here, and they compound:

The mock that outlived the demo. A placeholder or fixture wired in during development to get the pipeline flowing, never removed, now silently feeding a production decision. It survives because it works — it returns plausible data on every call, so nothing ever fails loudly enough to get it noticed.
The cache with no expiry semantics. A value cached for performance, read with an existence check (value != null, value > 0) rather than a freshness check. Existence is not liveness. A value can exist and be hours stale; the read path cannot tell the difference unless freshness is part of the contract.
The liveness/outcome confusion. "The service is up" answers a different question from "fresh, correct data was just produced." A health check that pings a process tells you the process is running, not that it is doing its job. You can have a perfectly live service emitting perfectly dead data. This is the liveness vs. outcome distinction, treated in full on its own page; the data-integrity contract is one reason a liveness probe is not an outcome signal.

In each case the root cause is the same: the system was built to check presence, when the property that matters is recency and provenance. Presence is cheap to verify and reassuring; recency is the thing that was actually load-bearing.

Existence check vs. freshness check: what each one tests and what it misses

This is the distinction at the centre of every stale-as-live failure. The methods below are not wrong — they are testing the wrong dimension. The column that matters most is the failure each method silently allows.

Read pattern	What it actually verifies	The failure it silently allows	The honest alternative
`value != null` (existence)	A value is present in the read path	A mock or fixture that returns plausible data on every call; a cache frozen the instant the writer died	Require provenance: which source produced this, and is it the real one?
`value > 0` (sanity)	A value is present and in a plausible range	A stale-but-plausible value — last good price, yesterday's count — that is in-range and completely wrong now	Attach a timestamp and check it against a per-source freshness budget
"The service is up" (liveness ping)	A process is running	A live service emitting dead data: green health check, stale output	Alarm on "fresh, correct data was produced," not on "the process responded"
"The job ran" (orchestration success)	A pipeline step completed	A job that finished and wrote a stale or empty result; an upstream that silently stopped feeding it	A data-freshness SLO — oldest data no older than Y — checked at read time, not just at job exit
Freshness + provenance check (the contract)	Value, plus origin, plus age within the decision's tolerance window	This is the point: it is built to catch exactly the stale-as-live and surviving-mock failures the rows above pass	When the budget cannot be met, fail closed — return an explicit "data unavailable" and decline to decide

Two of those rows are catching genuinely different problems, and conflating them is a common mistake. Freshness answers "how old?"; provenance answers "from where?" A value can be perfectly fresh — written one second ago — and still be wrong because it came from a mock that is updating happily on a timer. Freshness checks catch the dead-writer and cold-cache cases; provenance checks catch the surviving-mock and wrong-source cases. The contract needs both stamps, age and origin, because they fail independently.

The principle: freshness guarantees and fail-closed by default

The fix has two parts, and both are engineering commitments rather than aspirations.

First, every market-facing or decision-facing value carries explicit provenance and a freshness budget. A read is not "is there a value?" but "is there a value, where did it come from, and was it produced within the window this decision tolerates?" That window is a deliberate, documented choice per data source — sub-second for a fast-moving signal, minutes or hours for something that genuinely changes slowly. The point is that the tolerance is stated and checked, not assumed. A value that exists but falls outside its freshness budget is treated as no value at all.

Second, fail closed. When a source cannot meet its guarantee, the system returns an honest, explicit "data unavailable" — and the decision logic declines to act on absent data — rather than substituting a stale value, a default, or a synthetic stand-in. This is not a contrarian instinct. It is the data-plane instance of a fifty-year-old secure-design principle: fail-safe defaults, from Saltzer and Schroeder's The Protection of Information in Computer Systems (1975), which holds that when a mechanism fails or cannot reach a decision, it should fail to the safe, deny state rather than the permissive one. Applied to data: when a source cannot prove freshness and provenance, the safe default is to deny the value, not to substitute a stand-in.

Most systems are built the other way — to keep producing output under degradation, because an empty screen feels like a failure. But in anything that decides, a confident wrong answer is strictly worse than a visible gap. "I don't currently have a reliable value for this" is a safe state. A six-hour-old number presented as live is not.

This is also where independent verification matters. A multi-component AI system can only be trusted with a number once the data plane beneath it is provable — the model is only ever as honest as its inputs. We are AI-led and model-flexible; orchestrating the components is the mechanism, but the contract is what makes the result defensible. It is deliberately model-neutral, because the failure mode lives in the data path rather than in any one model. (For how that verification fits a delivery model, see how we work.)

The trade-offs, stated honestly

This discipline has real costs, and a contract that hid them would not be worth signing.

Fail-closed means the system will sometimes refuse to answer when a looser design would have guessed — and a guess is right more often than not, which makes the discipline feel pessimistic. That is exactly the trap: a design calibrated by the common case is blind to the expensive tail. The cases where the guess is wrong are exactly the cases that matter, and you cannot tell them apart at the moment of the guess. You are trading a small amount of availability for a large amount of trustworthiness, and in any decision system that is the correct trade.

Fail-closed also produces visible gaps, and a user can read a visible gap as the system being broken — which can raise perceived unreliability even when the system is behaving exactly as designed. That is a real cost, and the answer is not to hide the gap but to make the "data unavailable" state legible: say what is unavailable and why, so the honest state reads as honesty rather than breakage.

Freshness budgets add engineering surface: every source needs a defined tolerance, every write needs to stamp provenance and time, every read needs to enforce the check. This is more code and more discipline than an existence check. It is not free. But it is the cheapest place to pay, because the alternative — discovering a stale-as-live failure after a decision has been made on it — is paid in incidents and in eroded confidence, which is far more expensive and arrives without warning.

How to apply it

You do not need to rebuild a system to adopt the contract. Start where decisions are made and work outward.

Inventory the decision-facing data paths. For each value that informs a decision or reaches a user, ask one question: if its upstream source died right now, how would this code know? If the answer is "it wouldn't," you have found a stale-as-live risk.
Replace existence checks with freshness checks. Anywhere a read tests only for presence, add provenance and a timestamp, and define the tolerance window explicitly. Make the window a reviewed decision, not an accident of caching.
Audit for surviving mocks and fixtures. Search the production path for placeholders, sample data, and development stand-ins. Anything that returns plausible data without touching a real source is a candidate to fail loudly or be removed.
Instrument the outcome, not the ping. Alarm on "fresh, correct data was produced," not on "the process is running." A green health check that coexists with dead output is the signal you most need to catch.
Make "data unavailable" a first-class state, and verify it under real failure. Build the explicit, honest unavailable path deliberately — then prove it triggers under a real, induced source failure, against the running system, checked by someone other than the author. The fail-closed branch is the one path that only fires under failure, so it is the least exercised in normal operation and the most likely to be quietly broken. A fail-closed path you have not watched fail closed is an assumption, not a guarantee. This is what production AI, proven actually means in practice: not a claim that the system is safe, but independent evidence that the safe path fires when it has to.

Every failure in the table above fails the same way: it verifies presence — a value exists, a process responds, a job finished — when the property that is load-bearing for a decision is recency-and-provenance: this specific value, fresh enough for this decision, from the source it claims. A system that checks whether a value exists, but not whether it is still true, is not "mostly reliable" — it is confidently wrong on exactly the inputs that matter, and it cannot tell you which ones those are. That is why the data-integrity contract is the gate to production: it is the discipline that turns "the number is there" into "the number is true, and the system would have said so if it weren't."

If you are putting AI behind real decisions and want a clear-eyed read on where stale-as-live and synthetic-data risks hide in your own stack, we can work through it directly in an AI working session, or as part of longer-term AI consulting. The proof is the fail-closed path firing on demand, watched by someone other than the author — evidence, not assurance.

NewGenApps — production AI, proven. Stay a step ahead, always.

Book an AI working session