Meta builds an AI to check facts on Wikipedia

Most people over 30 probably remember doing research with good old-fashioned encyclopedias. You would pull a large volume off the shelf, check the index for your topic of interest, then turn to the appropriate page and start reading. It wasn't as easy as typing a few words into the Google search bar, but on the plus side, you knew that the information you found in the pages of Britannica or World Book was accurate and true.

This is not the case with internet research today. The overwhelming multitude of sources is confusing enough, but add in the proliferation of misinformation and it's a wonder any of us believes a word we read online.

Wikipedia is a good example. At the start of 2020, the English version of the site averaged around 255 million page views per day, making it the eighth-most-visited website on the internet. As of last month it had moved up to seventh place, and the English version currently has over 6.5 million articles.

But as important as this go-to source of information might be, its accuracy leaves something to be desired; the site's reliability page states: "The online encyclopedia does not consider itself a reliable source and discourages readers from using it in academic or research contexts."

Meta, the company formerly known as Facebook, wants to change that. In a blog post published last month, the company described how AI could help make Wikipedia more accurate.

Although tens of thousands of people contribute to editing the site, the facts they add are not necessarily correct; even when citations are present, they are not always accurate or even relevant.

Meta is developing a machine learning model that analyzes those citations and cross-references their content against the Wikipedia articles they are meant to support, checking not only that the topics line up but that the specific figures cited are accurate.
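To make the "do the figures match" part of that check concrete, here is a minimal, purely illustrative sketch. It is not Meta's pipeline; the function names and the regular expression are assumptions made for this example, and a real system would need far more than exact numeric overlap.

```python
import re

def extract_numbers(text: str) -> set[str]:
    """Pull numeric figures (e.g. '255', '6.5') out of a span of text."""
    return set(re.findall(r"\d+(?:[.,]\d+)*", text))

def numbers_supported(claim: str, source_passage: str) -> bool:
    """Return True if every figure in the claim also appears in the cited passage."""
    return extract_numbers(claim) <= extract_numbers(source_passage)

claim = "The English Wikipedia currently has over 6.5 million articles."
passage = "This year the English-language edition surpassed 6.5 million articles."
print(numbers_supported(claim, passage))  # True: the cited figure is present
```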

It's not just about picking out numbers and making sure they match; Meta's AI will need to "understand" the content of cited sources (though "understand" is a misnomer, as complexity researcher Melanie Mitchell would put it: AI is still in its "narrow" phase, which means it is a highly sophisticated pattern-recognition tool, while "comprehension" is a word reserved for human cognition, which remains a very different thing).

Meta's model will "understand" content not by comparing strings of text and checking whether they contain the same words, but by comparing mathematical representations of blocks of text, which it arrives at using natural language understanding (NLU) techniques.

"What we've done is create an index of all these web pages by breaking them down into passages and providing an accurate representation for each passage," said Fabio Petroni, Technical Director of Fundamental Research at AI at Meta, at Digital Trends. “It is not a question of representing the passage word for word, but the meaning of the passage. This means that two pieces of text with similar meanings will be represented in a very close position in the resulting n-dimensional space where all these passages are stored. »

The AI is being trained on a set of four million Wikipedia citations, and in addition to picking out faulty citations on the site, its creators would like it to eventually suggest accurate sources to take their place, drawing on a massive, constantly updated index of data.

A big problem that remains to be solved is building in a rating system for the reliability of sources. An article from a scientific journal, for example, would receive a higher rating than a blog post. The amount of content online is so vast and varied that you can find "sources" to back up just about any claim, but sorting the misinformation from the disinformation (the former means incorrect, while the latter means deliberately misleading), and the peer-reviewed from the non-peer-reviewed, the carefully fact-checked from the hastily thrown together, is no small task; it is, however, an essential one when it comes to trust.
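Because the article frames this as an open problem, any concrete scheme is speculative; one simple way to picture it, though, is a relevance score scaled by a reliability weight attached to the type of source. The tiers and weights below are invented purely for illustration.

```python
# Hypothetical illustration only: these reliability tiers and weights are
# made up for the sketch, not a published scheme from Meta or Wikimedia.
RELIABILITY = {
    "peer_reviewed_journal": 1.0,
    "major_news_outlet": 0.7,
    "personal_blog": 0.3,
}

def rank_candidates(candidates):
    """Order candidate sources by semantic relevance scaled by source type."""
    return sorted(
        candidates,
        key=lambda c: c["relevance"] * RELIABILITY.get(c["source_type"], 0.1),
        reverse=True,
    )

candidates = [
    {"url": "blog.example/post", "source_type": "personal_blog", "relevance": 0.95},
    {"url": "journal.example/paper", "source_type": "peer_reviewed_journal", "relevance": 0.80},
]
print(rank_candidates(candidates)[0]["url"])  # the journal wins despite lower raw relevance
```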

Meta has open-sourced its model, and those who are curious can see a demo of the verification tool. Meta's blog post noted that the company is not partnering with Wikimedia on this project, which is still in the research phase and not currently being used to update content on Wikipedia.

