Is your data good enough for your machine learning/AI plans?

AI development is a priority for businesses and governments around the world. Yet one fundamental problem remains overlooked: poor data quality.

AI algorithms rely on reliable data to generate good results. If the data is biased, incomplete, insufficient, or inaccurate, the consequences can be devastating.

AI systems that identify patient illnesses are a clear example of how poor data quality leads to undesirable outcomes. When trained on insufficient data, these systems produce inaccurate predictions, resulting in misdiagnoses and delayed treatments. For example, a University of Cambridge study of over 400 tools used to diagnose Covid-19 found that the AI-generated reports were unusable because of faulty datasets.

In other words, your AI initiatives will have devastating real-world consequences if your data isn't good enough.

What does "good enough" data mean?

There's quite a debate about what "good enough" data means. Some say there is no such thing as good enough data. Others say the pursuit of good data leads to analysis paralysis, while HBR is adamant that your machine learning tools are useless if your information is bad.

At WinPure, we define good enough data as complete, accurate, and valid data that can be used with confidence for business processes at an acceptable level of risk, where that level depends on the individual goals and circumstances of the business.

Most organizations struggle with data quality and governance more than they admit. Adding to the tension, they are overwhelmed and under immense pressure to deploy AI initiatives to stay competitive. Unfortunately, this means that issues such as dirty data aren't even part of board discussions until they cause a project to fail.

How does poor data quality affect AI systems?

Data quality issues arise early in the process, when the algorithm learns patterns from its training data. For example, if an AI algorithm is trained on unfiltered social media data, it picks up abuse, racist comments, and misogynistic remarks, as seen with Microsoft's AI bot. More recently, the inability of AI to detect dark-skinned people has also been attributed to incomplete or skewed data.
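One simple way to catch skewed training data before it reaches a model is to measure how each group is represented in the dataset. The sketch below is only an illustration: the function names, the group labels, and the 20% threshold are assumptions made for the example, not a recommended standard.

```python
from collections import Counter


def group_shares(labels):
    """Return each group's share of the dataset as a fraction of all rows."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def underrepresented(labels, min_share=0.2):
    """List groups whose share falls below min_share (an assumed threshold)."""
    shares = group_shares(labels)
    return sorted(group for group, share in shares.items() if share < min_share)


# Hypothetical training labels: nine rows from one group, one from another.
training_groups = ["group_a"] * 9 + ["group_b"]
flagged = underrepresented(training_groups)
print(flagged)
```

A check like this does not prove a dataset is fair, but it gives teams a concrete number to discuss before training begins.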

How does this relate to data quality?

A lack of data governance, a lack of awareness of data quality, and siloed views of data (where such disparities might otherwise have been noticed) lead to poor results.

What to do?

When companies realize they have a data quality problem, they rush to hire. Consultants, engineers, and analysts are brought in hastily to diagnose problems, clean data, and resolve issues as soon as possible. Unfortunately, months pass before any progress is made, and despite the millions spent on staffing, the problems don't seem to go away. A knee-jerk approach to a data quality problem is of little use.

Real change starts at the grassroots level.

Here are three crucial steps to take if you want your AI/ML project to move in the right direction.

Raise awareness and recognize data quality issues

To get started, build a culture of data literacy so that your teams can assess the quality of your data. Bill Schmarzo, an influential voice in the industry, recommends using design thinking to create a culture where everyone understands and can contribute to an organization's data goals and challenges.

In today's business landscape, data and data quality are no longer the sole responsibility of IT or data teams. Business users should also be aware of dirty data issues such as inconsistent and duplicate records.

So the first essential thing to do: make data quality training an organizational effort and empower teams to recognize poor data attributes.
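To make "poor data attributes" concrete for non-technical teams, it can help to show a small profile of a real table. The following is a minimal sketch, assuming records are plain dictionaries; the function name `profile_records`, the sample customers, and the field names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def profile_records(records, required_fields):
    """Count missing required values and duplicate records."""
    missing = Counter()
    seen = Counter()
    for record in records:
        for field in required_fields:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                missing[field] += 1
        # Normalize case and whitespace so trivially different copies match.
        key = tuple(
            str(record.get(field) or "").strip().lower()
            for field in required_fields
        )
        seen[key] += 1
    # Every copy beyond the first counts as one duplicate.
    duplicates = sum(count - 1 for count in seen.values() if count > 1)
    return {
        "rows": len(records),
        "missing": dict(missing),
        "duplicates": duplicates,
    }


# Hypothetical sample data with one duplicate and one missing email.
customers = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "ada lovelace ", "email": "ADA@example.com"},
    {"name": "Grace Hopper", "email": ""},
]
report = profile_records(customers, ["name", "email"])
print(report)
```

Matching on lowercased, trimmed values is a deliberately simple choice here; real deduplication usually needs fuzzier matching than this.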

Here is a checklist you can use to start a conversation about your data quality.
