How AI will collide with data preparation

In recent years, the hype around generative AI tools and agentic AI has convinced many executives to invest big and jump headfirst into the latest technological advancements without necessarily considering the bigger picture.

Now that projects are moving from pilot to full production, I expect many of these companies will start to realize that their data is far from AI-ready.

EMEA field CDO at Confluent.

In many cases, the limitations have little to do with the AI itself. Instead, they come from fragmented data, disconnected systems, and foundations that were never designed to support automated decision-making or to let data be shared and acted upon in real time.

As AI becomes more deeply integrated into daily operations, these weaknesses are no longer easy to work around, and they have a direct impact on whether AI delivers value or simply adds cost and complexity on top of existing systems.

When AI capabilities exceed data infrastructure

This can be seen in the way AI is deployed in many organizations, particularly with conversational front ends. They are introduced quickly, often with the aim of reducing friction or improving efficiency.

However, behind the interface, the captured data does not always flow correctly into the systems that run the business. In some cases the data is duplicated and in others it is incomplete or out of sync with existing records.

This leads to AI introducing additional work rather than removing it, with employees spending time checking results or correcting errors from elsewhere in the system.

While this might have been manageable in a pilot project, as AI becomes more entrenched in daily operations, these problems become harder to contain – and much more expensive.

A clear example of this has been seen in recent AI-based GP appointment systems. These tools appear effective on the surface, helping patients navigate booking processes more easily, but behind the scenes, context and up-to-date patient information are not always correctly transmitted to the GP back-end systems that clinicians rely on.

Not only does this cause all sorts of problems with data duplication and repeated workloads for GPs, but it also creates frustrations for the very people the systems were designed to support.

This is a classic case of organizations adopting intelligent AI front-ends without effectively integrating them with existing back-end data and systems, or without adopting the operational processes necessary to fully realize the value.

Instead of chasing AI features, businesses should start with the outcomes they actually want and work backward from there. This means focusing on clean, reliable data, with full visibility into its lifecycle and traceability, and ensuring it can be acted upon in real time.

From big data to fit-for-purpose data

For a long time, data strategy has focused on scale. The priority was to collect as much information as possible and store it inexpensively, with the assumption that value could be extracted later.

This approach starts to break down once AI is involved, because AI depends on data that is current and consistent at the moment it is used, not hours or days out of date. Old, unvalidated records (such as outdated contact details or incomplete customer histories) undermine both the accuracy of AI results and confidence in them.

To achieve meaningful results, businesses must prioritize data tracking, governance, and context, as well as the speed with which that data can be accessed and used.
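
To make that concrete, here is a minimal sketch of the kind of freshness and completeness check that might sit in front of an AI pipeline. The field names (customer_id, email, last_updated) and the 30-day freshness window are illustrative assumptions only; in practice they would come from an organization's own data contracts and governance rules.

```python
from datetime import datetime, timedelta, timezone

# Illustrative assumptions: the required fields and the 30-day freshness
# window are placeholders, not values from any real data contract.
MAX_AGE = timedelta(days=30)
REQUIRED_FIELDS = ("customer_id", "email", "last_updated")

def is_ai_ready(record: dict) -> bool:
    """Return True only if a record is complete and recent enough to feed an AI system."""
    # Reject records missing any of the agreed mandatory fields.
    if any(not record.get(field) for field in REQUIRED_FIELDS):
        return False
    # Reject records whose last update falls outside the freshness window.
    last_updated = datetime.fromisoformat(record["last_updated"])
    return datetime.now(timezone.utc) - last_updated <= MAX_AGE

now = datetime.now(timezone.utc)
records = [
    {"customer_id": "c-001", "email": "a@example.com",
     "last_updated": (now - timedelta(days=2)).isoformat()},
    {"customer_id": "c-002", "email": "",  # incomplete contact details
     "last_updated": (now - timedelta(days=400)).isoformat()},  # stale record
]

ai_ready = [r for r in records if is_ai_ready(r)]
print(f"{len(ai_ready)} of {len(records)} records are fit to feed the AI pipeline")
```

Checks like this are cheap to run continuously as data moves between systems, which is one reason quality and freshness are better treated as ongoing operational concerns than as a one-off cleanup exercise.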

Improving data quality and integration is often seen as a difficult and expensive task, especially when legacy systems are involved. As a result, many organizations postpone it in favor of more visible AI initiatives.

In practice, however, that delay tends to drive costs higher over time. Teams end up spending more and more effort reconciling data, correcting errors, and explaining inconsistencies in AI-generated results.

Opportunity cost is harder to measure but just as important. When AI cannot be trusted to perform reliably, it remains limited to narrow use cases – and without high-quality data foundations, even the most advanced AI initiatives will fail.

What will change in 2026

By 2026, many organizations will reach a point where improving data quality and integration will no longer be optional if AI is to deliver meaningful results.

For organizations that want AI to add real value, the focus needs to move away from flashy features and focus on the fundamentals. This starts with being clear about the outcomes that AI is intended to support and working backwards through the data needed to achieve them, including how that data is captured, processed and shared in real time.

Data quality, integration and visibility across systems should be treated as core operational concerns rather than a technical cleanup job. Equally important, ownership of AI initiatives must be clear.

When responsibility is shared or vague, data and process issues are easier to ignore: aligning management, IT teams and front-line staff is essential.

As AI becomes more common in the business world over the next year, those who fail to strengthen their data practices risk ending up with AI that looks impressive on the surface, but offers little value.

This article was produced as part of TechRadar Pro’s Expert Insights channel, where we feature the best and brightest minds in today’s technology industry. The views expressed here are those of the author and are not necessarily those of TechRadar Pro or Future plc. If you would like to contribute, find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro
