AI has a way of revealing the condition of the data organizations already have in place. When results surprise leadership teams, the instinct is often to question the model. In reality, AI is doing exactly what it was trained to do: learn from the data it was given and apply it with precision. When that data is incomplete or inconsistent, the outcomes reflect it. This dynamic surfaces everywhere.
Last fall, Anthropic showed that even a small number of corrupted training documents can distort a model’s behavior at scale. We see the same pattern in spend and supplier data every day. Picture a multinational company discovering that a single misclassified supplier has driven weeks of inaccurate risk scoring. The AI model didn’t hallucinate; it faithfully learned an error embedded in the organization’s own records.
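To make that concrete, here is a minimal sketch in Python. The supplier names, spend figures and risk weights are all hypothetical; the point is that the scoring logic is correct while one input record is not.

```python
# A single misclassified supplier record distorts portfolio-level risk
# reporting. Names, spend figures and risk weights are hypothetical.
RISK_BY_CATEGORY = {"office_supplies": 0.1, "rare_earth_metals": 0.9}

suppliers = [
    {"name": "Acme Staplers", "category": "office_supplies",   "spend": 50_000},
    {"name": "Paper Co",      "category": "office_supplies",   "spend": 40_000},
    # Filed under the wrong category during a systems migration:
    {"name": "Desk Depot",    "category": "rare_earth_metals", "spend": 60_000},
]

def high_risk_exposure(records, threshold=0.5):
    """Total spend sitting in categories whose risk weight exceeds threshold."""
    return sum(r["spend"] for r in records
               if RISK_BY_CATEGORY[r["category"]] > threshold)

# Reports $60,000 of high-risk exposure that is actually office supplies.
# The scoring logic is sound; the input record is not.
print(high_risk_exposure(suppliers))  # 60000
```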
Many enterprises assume that more data will smooth out imperfections. In practice, volume can hide problems rather than fix them. AI does not average its way to accuracy; it reinforces whatever patterns it sees. The impact shows up where accuracy matters most: forecasts built on stale inputs misrepresent future exposure, risk signals disappear, and automated recommendations drift because years of classification changes and workarounds are baked into the data.
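A quick statistical sketch, with illustrative numbers, shows the difference: zero-mean noise does average out as rows accumulate, but a systematic error converges to the wrong answer with ever more confidence.

```python
# Volume cancels random noise but not systematic error.
import random

random.seed(0)
TRUE_VALUE = 100.0

def mean_with_noise(n):
    """n observations with zero-mean noise: the error shrinks as n grows."""
    return sum(TRUE_VALUE + random.gauss(0, 10) for _ in range(n)) / n

def mean_with_bias(n):
    """n observations that all inherit the same +5 systematic error."""
    return sum(TRUE_VALUE + 5 + random.gauss(0, 10) for _ in range(n)) / n

for n in (100, 10_000, 100_000):
    print(n, round(mean_with_noise(n), 2), round(mean_with_bias(n), 2))
# The noisy stream converges toward 100; the biased stream converges
# toward 105. More volume makes the wrong answer more precise, not correct.
```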
The cost of building AI on fragile foundations
A common misconception is that AI projects fail because people resist new technology. What actually fails is trust. A user sees an output that contradicts a basic expectation, and confidence in the system drops immediately. Once that trust erodes, adoption falls off long before the technology has the opportunity to demonstrate real value.
There is another side to trust that often gets overlooked. Sometimes the output is correct, but it doesn’t match the user’s expectations. In those cases, trust breaks not because the system is wrong, but because it is revealing something unfamiliar.
Trust is rebuilt through transparency. Users need to understand how a result was produced, what data contributed to it, and why it looks the way it does. Without that visibility, even accurate insights are dismissed. With it, AI becomes a system people can learn from and improve over time.
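What that visibility might look like in code, as a sketch with illustrative field names: every result carries a lineage trail naming the source system, the contributing records, the refresh date and the transformation applied.

```python
# Result-level transparency: each output keeps a record of what produced it.
# The structure and field names here are illustrative, not a standard.
from dataclasses import dataclass, field

@dataclass
class Provenance:
    source_system: str    # e.g. the ERP that holds the record of origin
    record_ids: list      # which rows contributed to the result
    last_refreshed: str   # when those rows were last updated
    transform: str        # the rule or model step that was applied

@dataclass
class ScoredResult:
    value: float
    lineage: list = field(default_factory=list)  # one Provenance per step

score = ScoredResult(value=0.72)
score.lineage.append(
    Provenance("erp_emea", ["SUP-1042"], "2025-01-03", "category risk weighting")
)

# A user questioning the 0.72 can trace exactly what produced it,
# instead of dismissing it as a black-box number.
for step in score.lineage:
    print(step)
```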
This erosion often starts with the sequence of AI investment. Many organizations begin with the model’s capabilities, interface and potential use cases, and assume the data can be addressed later. But the order is wrong. By the time leaders turn their attention to data structure, lineage, refresh cycles or classification consistency, the model has already internalized whatever information was available. When the results fall short, attention turns to the model rather than the inputs that shaped it.
A structural reason reinforces this pattern: Data ownership is spread across procurement, finance, IT and other business units, each with its own taxonomies, systems and definitions. Silent divergence accumulates over time. The model becomes the first entity in the organization to interact with all of that variation at once, and it exposes every inconsistency instantly.
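A toy example, with hypothetical category labels, shows how the divergence stays silent: each function’s mapping is internally consistent, so no single team ever sees a conflict.

```python
# The same invoice, filed under each function's own taxonomy.
views = {
    "procurement": {"INV-881": "IT Hardware > Laptops"},
    "finance":     {"INV-881": "Capex > Office Equipment"},
    "it":          {"INV-881": "End User Computing"},
}

labels = {owner: mapping["INV-881"] for owner, mapping in views.items()}

# A model trained across all three systems sees one purchase as three
# unrelated patterns. A cross-system check makes the divergence visible:
if len(set(labels.values())) > 1:
    print("Divergent classification for INV-881:", labels)
```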
A more reliable approach to data and AI
There is a more dependable way to approach AI, and it starts with basic operational discipline. In supply chains, organizations validate inputs, check for contamination and trace origins before anything downstream depends on them. The logic is that if the upstream data is wrong, every decision that follows inherits the error.
Data pipelines deserve the same level of attention. If AI systems are going to inform or automate decisions, the information feeding them must be accurate, consistent and refreshed at a cadence that matches the decisions they support.
The practices are not complicated. Organizations need clarity on data provenance: how information is created, how it changes over time, and which systems influence it. They need consistent classification and structure so similar items are treated the same way across teams and regions. They need refresh cycles that match operational reality; quarterly updates cannot support weekly planning. And they need periodic checks to confirm that outputs still align with verified facts, because even well-designed pipelines drift.
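As a sketch of what those checks might look like (the thresholds, field names and verified facts are all illustrative), a simple audit can flag records that are too stale for the decision cadence or that contradict independently verified facts:

```python
# Periodic data-quality audit: freshness against the decision cadence,
# plus spot-verification against known-good facts. All values illustrative.
from datetime import date, timedelta

MAX_STALENESS = timedelta(days=7)  # weekly planning needs weekly data

records = [
    {"supplier": "Acme Staplers", "category": "office_supplies",
     "updated": date(2025, 1, 2)},
    {"supplier": "Desk Depot", "category": "rare_earth_metals",
     "updated": date(2024, 9, 30)},
]

# A small set of independently verified facts to spot-check against.
verified = {"Desk Depot": "office_supplies"}

def audit(records, today):
    issues = []
    for r in records:
        if today - r["updated"] > MAX_STALENESS:
            issues.append(f"stale: {r['supplier']} last updated {r['updated']}")
        expected = verified.get(r["supplier"])
        if expected and expected != r["category"]:
            issues.append(f"misclassified: {r['supplier']} tagged "
                          f"{r['category']}, verified as {expected}")
    return issues

for issue in audit(records, today=date(2025, 1, 6)):
    print(issue)
```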
A common procurement example illustrates how easily issues scale: a supplier name that appears in three systems, spelled slightly differently in each one. A human understands they refer to the same entity. A model learning patterns across millions of rows does not. Downstream systems treat them as unrelated suppliers, and a small inconsistency becomes a structural blind spot.
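A sketch of how such duplicates can be surfaced, using only a normalization pass and fuzzy string matching; the names are hypothetical, and production entity resolution would lean on richer signals such as tax IDs and addresses:

```python
# Detect near-duplicate supplier names across systems.
from difflib import SequenceMatcher
from itertools import combinations

names = ["Globex Corporation", "GLOBEX CORP.", "Globex  Corp"]

def normalize(name):
    """Lowercase, strip punctuation, collapse whitespace, drop legal suffixes."""
    cleaned = "".join(c for c in name.lower() if c.isalnum() or c.isspace())
    tokens = [t for t in cleaned.split()
              if t not in {"corp", "corporation", "inc", "ltd", "llc"}]
    return " ".join(tokens)

def likely_same(a, b, threshold=0.85):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

for a, b in combinations(names, 2):
    if likely_same(a, b):
        print(f"probable duplicate: {a!r} ~ {b!r}")
```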
As AI moves deeper into planning, budgeting, supplier evaluation, and other sensitive areas, the margin for error shrinks. The systems being built today will make decisions at speeds that outpace manual review.
A clearer standard for AI readiness
This is why improving data integrity is becoming the defining challenge of enterprise AI — and the biggest competitive divide of the next five years.
As organizations explore agent-based and automated workflows, models will rely on each other’s outputs. A small inaccuracy at the beginning of the chain can influence dozens of downstream decisions. Leaders who treat data governance as a compliance exercise will struggle. The ones who succeed will recognize it as an operational requirement.
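A back-of-the-envelope sketch makes the stakes plain, assuming independent failures and an illustrative 99% per-step accuracy:

```python
# If each step consumes the previous step's output, per-step accuracy
# multiplies down the chain. The 99% figure is illustrative, not measured,
# and the steps are assumed to fail independently.
per_step_accuracy = 0.99

for steps in (1, 5, 10, 20):
    print(f"{steps:>2} chained steps: "
          f"{per_step_accuracy ** steps:.1%} of runs untouched by error")
# At 20 steps, roughly 82% of runs survive; nearly one in five inherits
# an upstream mistake before any human has a chance to review it.
```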
AI is powerful, but it cannot rise above the information beneath it. Every organization now faces a choice: Treat data integrity as infrastructure or treat AI as a gamble. The companies that take the first path will build systems they can trust and scale. The companies that take the second will continue to blame the model for problems rooted in their own pipelines.