AI-detection startup GPTZero scanned all 4,841 papers accepted by the prestigious Conference on Neural Information Processing Systems (NeurIPS), which took place last month in San Diego. The company found 100 hallucinated citations across 51 papers that it confirmed were fabricated, it told TechCrunch.
Having a paper accepted at NeurIPS is a CV-worthy achievement in the world of AI. And given that these are the leading minds in AI research, one might expect them to lean on LLMs for the mind-numbingly boring task of writing citations.
Caveats therefore abound regarding this discovery: 100 confirmed hallucinated citations across 51 papers is not a statistically large number. Each paper contains dozens of citations, so out of tens of thousands of citations in total, that's statistically close to zero.
It's also important to note that an inaccurate citation doesn't invalidate a paper's research. As NeurIPS told Fortune, which first reported on GPTZero's findings, "Even though 1.1% of papers have one or more incorrect references due to the use of LLMs, the content of the papers themselves [is] not necessarily invalidated."
That said, a fabricated citation is no small thing either. NeurIPS prides itself on its "rigorous scientific publications in machine learning and artificial intelligence," it says, and each paper is evaluated by multiple peer reviewers who are responsible for flagging hallucinations.
Citations are also a kind of currency for researchers: they serve as a career metric, showing the influence of a researcher's work among their peers. When AI invents them, it dilutes their value.
No one can blame peer reviewers for failing to detect a few AI-fabricated citations given the sheer volume involved, and GPTZero is quick to point this out. The goal of the exercise was to provide concrete data on how AI slop is seeping in via "a tsunami of submissions" that has "strained the review pipelines of these conferences to the breaking point," the startup says in its report. GPTZero even points to a May 2025 paper titled "The AI Conference Peer Review Crisis," which discussed the problem at leading AI conferences, including NeurIPS.
Still, why couldn't the researchers themselves verify the accuracy of the LLM's work? Surely they know the actual list of papers they used in their research.
What all this really highlights is an important, ironic conclusion: if the world's leading AI experts, with their reputations on the line, can't guarantee that their use of LLMs is accurate down to the details, what does that mean for the rest of us?