Vidianews

As AI continues to improve, mathematicians struggle to predict their own future

by Julie Bort
March 16, 2026
in General, World

In the ongoing campaign by artificial intelligence companies to capture pure mathematics, a new cycle is beginning.

The team behind First Proof, an effort to assess the ability of large language models (LLMs) to contribute to research-level mathematics, has announced its next round. For this second round, which it plans to roll out over the coming months, the team is demanding access and transparency from any AI company wishing to participate.

This occurs against a backdrop of radical change in mathematics research. In just the last few months, the best publicly available models have begun to generate valid proofs of minor theorems that are actually useful to working mathematicians. For some experts, the first round of First Proof was a pivotal moment in this ongoing story.


“We were very impressed with the performance of the AI models,” says Lauren Williams, a Harvard University mathematician and member of the First Proof team. “The problems we proposed are really at the forefront of what AI models, perhaps in collaboration with experts, can solve.”

First Proof was born from its 11-person team’s eye-opening, if sometimes frustrating, experiences with AI. No pre-existing benchmark seemed sufficient to test LLMs as a mathematician’s assistant. In principle, an LLM could save time by proving smaller “lemmas” – intermediate propositions on a mathematician’s path to developing larger, more interesting theorems. In practice, however, these AI assists have tended to go awry.

So for their initial “experimental” test, the First Proof team chose 10 lemmas from papers members had written but not yet published, then set a one-week deadline for AI companies (and anyone else) to try to prove these propositions using their favorite models.

Groups from OpenAI and Google have published their LLMs’ answers to all the problems. Five of the OpenAI model’s proofs appeared correct, and Google DeepMind’s agent Aletheia seems to have obtained six (though experts are not unanimous about the validity of these proofs). Comparing the performance of the two models, Williams was surprised to find that each solved several problems that the other could not. “It’s interesting to see that their abilities are different,” she says.

“The performance was better than I expected,” says Daniel Litt, a mathematician at the University of Toronto who is not directly involved in the First Proof effort. In total, no fewer than eight out of ten problems appear to have been at least partially solved by AI. “It’s clear that capabilities have improved very quickly,” says Litt.

A future unclear but full of hope

Litt isn’t afraid of AI’s growing mathematical prowess. “I don’t expect that in five years it will be useless,” he says. “In fact, I expect to do the best work I’ve ever done because I’ll have these incredible tools.” The results of First Proof inspired him to write an essay, which has circulated widely among mathematicians in recent weeks. It presents a speculative and optimistic view of the AI-infused future of the field.

For the sake of argument, Litt imagines a hypothetical library generated by superintelligent AIs and containing all possible proofs in the mathematical universe. A simple human mathematician wandering among its innumerable shelves could browse all its volumes but could not create any new proofs himself.

But that doesn’t mean mathematicians would be paralyzed by boredom, Litt says. Far from it. “They would be incredibly excited and get to work right away,” he wrote in the essay. The mathematical universe is so vast, he says, that the joy lies in exploring it, whether reading and digesting a proof or writing a new one. “My job wouldn’t even change at all,” he says. “The job now is to try to figure things out.”

Even if all mathematicians agreed with Litt’s decidedly utopian vision of this thought experiment, the current situation falls far short of this lofty ideal, as evidenced by the first round of First Proof. “Together, the models solved maybe eight of the problems,” Litt says. “But they also produced thousands and thousands of pages of garbage.”

It turns out that current AIs are often wrong but convincing. They will cite a result in the literature but claim it is stronger than it is. Or they’ll bury a crucial error deep in a tedious calculation, where it’s easy to miss. “Students make mistakes, but they’re definitely not trying to make mistakes,” Litt says. “Models aren’t very honest.”

This qualitative difference in the types of quantitative errors produced by LLMs can make it very difficult to evaluate their responses. “One of the things we learned from this first round is how difficult it can be to verify the accuracy of the results,” says Mohammed Abouzaid, a member of the First Proof team and a mathematician at Stanford University. “You would almost say, ‘No human who knows what all these words mean would make this mistake!’”

For the second round, the team plans to give the task of evaluating each submission to mathematicians hired as anonymous evaluators, funded by a combination of grants and donations from AI companies. But with no sign of the massive mathematical assault slowing, a deluge of subtly false proofs written by LLMs could soon overwhelm human resources. “People need to start thinking about it,” Litt says. “Our institutions and the profession are not adapting to what is coming.”

An unexplained gap

The first round seemingly revealed a glaring chasm between public and private efforts. This would seem to challenge the idea that AI usurping human skills would democratize them, for example by expanding the number of people able to contribute meaningfully to the progress of mathematics.

In the team’s internal testing before releasing the first round’s 10 lemmas, even the best publicly available models were only able to prove two. During the week-long testing period, various groups of amateur and professional mathematicians attempted to do better by building “scaffolds,” collaborative networks of LLMs that talked to each other to detect errors. But all these efforts only solved one more problem.

Several different factors could explain why Google and OpenAI managed to solve (at least partially) eight problems compared to the public’s three. The companies could be using improved, novel versions of their LLMs or more robust internal scaffolding. Or the answers could rely on undisclosed contributions from human mathematicians. (The Google team published an explanation of its methodology. The team said this approach included “absolutely no human intervention” – the kind of claim that First Proof’s new requirements would verify in the second round.)

That’s what the second round is supposed to solve, Williams says. “This was an experiment,” she says, “to get community feedback to determine how to run a more formal cycle.”

In addition to more robust human judgment, this round will require participants to package models so that the First Proof team can prompt them directly. “If it’s not a public model, then we have to run it,” says Abouzaid, “because otherwise it’s not clear what we’re testing.”

It remains to be seen whether OpenAI and Google will comply, or whether the many other LLM companies and math AI start-ups that were conspicuously absent in the first round will do so.

In the months to come, First Proof and other AI benchmarks could help predict the still-unclear fate of mathematics – a small niche in the scientific world that some of the richest eyes on Earth are suddenly turning to.

“One of our main motivations is to be able to tell young people what the field will look like in a few years,” explains Abouzaid. “And that requires understanding what these systems are actually capable of.”
