Did data drift in AI models cause Equifax's credit score problem?


Earlier this year, from March 17 to April 6, 2022, credit reporting agency Equifax encountered an issue with its systems that resulted in incorrect credit reports for consumers.

The issue has been described by Equifax as a "coding issue" and has led to legal action, including a class action lawsuit against the company. There has been speculation that the problem is somehow related to the company's AI systems that help calculate credit scores. Equifax did not respond to VentureBeat's request for comment on the issue.

"When it comes to Equifax, there's no shortage of accusations," Thomas Robinson, vice president of strategic partnerships and corporate development at Domino Data Lab, told VentureBeat. "But from an AI perspective, what went wrong seems like a classic problem: errors were made in the data feeding the machine learning model."

Robinson added that the errors could have come from a number of different situations, including labels that weren't updated correctly, data that was manually ingested incorrectly from the source, or an inaccurate data source.


Another possibility raised by Krishna Gade, co-founder and CEO of Fiddler AI, is a phenomenon known as data drift. Gade noted that according to reports, credit scores sometimes shifted by 20 points or more in either direction, enough to change the interest rates offered to consumers or cause their applications to be rejected altogether.

Gade explained that data drift can be defined as unexpected and undocumented changes in the structure, semantics, and distribution of data in a model.

He noted that drift can be caused by changes in the world, changes in product usage, or data integrity issues, such as bugs and degraded application performance. Data integrity issues can arise at any stage of a product's pipeline. Gade commented that, for example, a bug in the front-end could allow a user to enter data in the wrong format and skew the results. Alternatively, a bug in the backend may affect how this data is transformed or loaded into the model.
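One common way teams quantify the kind of distribution shift Gade describes is the Population Stability Index (PSI), which compares how a live feature's values are spread across buckets relative to the training-era baseline. The sketch below is a hypothetical illustration, not Equifax's or Fiddler's actual tooling; the score distributions and thresholds are invented for the example (PSI below roughly 0.1 is conventionally treated as stable, above roughly 0.2 as significant drift).

```python
# Hypothetical sketch: measure data drift with the Population Stability
# Index (PSI), comparing a live window of values against a training-era
# reference distribution.
import math
import random

def psi(reference, live, bins=10):
    """Bin `reference` into equal-frequency buckets, then compare the
    share of `live` values falling in each bucket."""
    ref_sorted = sorted(reference)
    # Bucket edges at reference quantiles; outer buckets are open-ended.
    edges = [ref_sorted[int(len(ref_sorted) * i / bins)] for i in range(1, bins)]

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = sum(v > e for e in edges)  # which bucket v falls into
            counts[idx] += 1
        # Small floor avoids log(0) for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    expected, actual = shares(reference), shares(live)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

random.seed(0)
reference = [random.gauss(650, 50) for _ in range(5000)]  # training-era scores
stable = [random.gauss(650, 50) for _ in range(1000)]     # same distribution
shifted = [random.gauss(680, 50) for _ in range(1000)]    # drifted upward

print(round(psi(reference, stable), 3))   # small value: no drift flagged
print(round(psi(reference, shifted), 3))  # large value: drift flagged
```

A monitor like this would run on each upstream feature as well as on model outputs, so that a bad ingest or a front-end bug of the kind Gade mentions shows up as a distribution shift before it silently skews results.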

Data drift is also not an entirely uncommon phenomenon.

"We believe this happened in the case of the Zillow incident, where they failed to accurately predict real estate prices and ended up investing hundreds of millions of dollars," Gade told VentureBeat.

Gade explained that, from his perspective, data drift incidents happen because the machine learning process of building datasets, training models, and evaluating models carries an implicit assumption that the future will be the same as the past.

"In effect, ML algorithms look in the past for patterns that may generalize in the future," Gade said. "But the future is subject to constant change, and the accuracy of production models may deteriorate over time due to data drift."
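Because accuracy decays gradually rather than failing outright, production teams often watch a rolling accuracy metric and alert when it slips past a threshold. The sketch below is a minimal, hypothetical illustration of that idea; the window size, threshold, and labels are invented for the example and are not drawn from any real credit-scoring system.

```python
# Hypothetical monitoring sketch: track a production model's rolling
# accuracy over its most recent predictions and flag degradation,
# a simple guard against the silent decay drift can cause.
from collections import deque

class AccuracyMonitor:
    def __init__(self, window=100, threshold=0.9):
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = wrong
        self.threshold = threshold

    def record(self, predicted, actual):
        self.outcomes.append(1 if predicted == actual else 0)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def degraded(self):
        acc = self.rolling_accuracy()
        return acc is not None and acc < self.threshold

monitor = AccuracyMonitor(window=50, threshold=0.9)
for _ in range(50):
    monitor.record("approve", "approve")   # healthy period
print(monitor.degraded())                  # False
for _ in range(20):
    monitor.record("approve", "reject")    # drift: outcomes diverge
print(monitor.degraded())                  # True
```

In practice the ground-truth labels arrive with a lag (a loan default is only known months later), so teams pair this kind of outcome monitoring with distribution-based drift checks that need no labels at all.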

Gade suggests that if an organization notices data drift, a good place to start remediation...

