Vidianews

As AI continues to improve, mathematicians struggle to predict their own future

By Julie Bort
March 16, 2026
in General, World

In the ongoing campaign by artificial intelligence companies to conquer pure mathematics, a new round is beginning.

The team behind First Proof, an effort to assess the ability of large language models (LLMs) to contribute to research-level mathematics, has announced its next round of testing. For this second round, which it plans to roll out over the coming months, the team is demanding access and transparency from any AI company wishing to participate.

This occurs against a backdrop of radical change in mathematics research. In just the last few months, the best publicly available models have begun to generate valid proofs of minor theorems that are actually useful to working mathematicians. For some experts, the first round of First Proof was a pivotal moment in this ongoing story.


“We were very impressed with the performance of the AI models,” says Lauren Williams, a Harvard University mathematician and member of the First Proof team. “The problems we proposed are really at the forefront of what AI models, perhaps in collaboration with experts, can solve.”

First Proof was born from its 11-person team’s eye-opening, if sometimes frustrating, experiences with AI. No pre-existing benchmark seemed sufficient to test LLMs as a mathematician’s assistant. In principle, an LLM could save time by proving smaller “lemmas” – intermediate propositions on a mathematician’s path to developing larger, more interesting theorems. In practice, however, these AI assists have tended to go awry.

So for their initial “experimental” test, the First Proof team chose 10 lemmas from papers members had written but not yet published, then set a one-week deadline for AI companies (and anyone else) to try to prove these propositions using their favorite models.

Groups from OpenAI and Google have published their LLMs’ answers to all the problems. Five of the OpenAI model’s proofs appeared correct, and Google DeepMind’s agent Aletheia seems to have gotten six right (though experts are not unanimous on the validity of these proofs). Comparing the performance of the two models, Williams was surprised to find that each solved several problems that the other could not. “It’s interesting to see that their abilities are different,” she says.

“The performance was better than I expected,” says Daniel Litt, a mathematician at the University of Toronto who is not directly involved in the First Proof effort. In total, no fewer than eight out of ten problems appear to have been at least partially solved by AI. “It’s clear that capabilities have improved very quickly,” says Litt.

A future unclear but full of hope

Litt isn’t afraid of AI’s growing mathematical prowess. “I don’t expect that in five years I will be useless,” he says. “In fact, I expect to do the best work I’ve ever done because I’ll have these incredible tools.” Indeed, the results of First Proof inspired him to write an essay, which has circulated widely among mathematicians in recent weeks. It presents a speculative and optimistic view of the AI-infused future of the field.

For the sake of argument, Litt imagines a hypothetical library generated by superintelligent AIs and containing all possible proofs in the mathematical universe. A simple human mathematician wandering among its innumerable shelves could browse all its volumes but could not create any new proofs himself.

But that doesn’t mean mathematicians would be paralyzed by boredom, Litt says. Far from it. “They would be incredibly excited and get to work right away,” he wrote in the essay. The mathematical universe is so vast, he says, that the joy lies in exploring it, whether reading and digesting a proof or writing a new one. “My job wouldn’t even change at all,” he says. “The job now is to try to figure things out.”

Even if all mathematicians agreed with Litt’s decidedly utopian vision of this thought experiment, the current situation falls far short of this lofty ideal, as evidenced by the first round of First Proof. “Together, the models solved maybe eight of the problems,” Litt says. “But they also produced thousands and thousands of pages of garbage.”

It turns out that current AIs are often wrong but convincing. They will cite a result in the literature but claim it is stronger than it is. Or they’ll bury a crucial error deep in a tedious calculation, where it’s easy to miss. “Students make mistakes, but they are definitely not trying to make mistakes,” Litt says. “Models aren’t very honest.”

This qualitative difference in the kinds of errors LLMs produce can make it very difficult to evaluate their responses. “One of the things we learned from this first round is how difficult it can be to verify the accuracy of the results,” says Mohammed Abouzaid, a member of the First Proof team and a mathematician at Stanford University. “You would almost say, ‘No human who knows what all these words mean would make this mistake!’”

For the second round, the team plans to assign the evaluation of each submission to mathematicians hired as anonymous reviewers, funded by a combination of grants and donations from AI companies. But with no sign of the massive assault on mathematics slowing, a deluge of subtly false proofs written by LLMs could soon overwhelm human resources. “People need to start thinking about it,” Litt said. “Our institutions and the profession are not adapting to what is coming.”

An unexplained gap

The first round seemingly revealed a glaring chasm between public and private efforts. This would seem to challenge the idea that AI usurping human skills would democratize them, for example by expanding the number of people able to contribute meaningfully to the progress of mathematics.

In the team’s internal testing before releasing the first round’s 10 lemmas, even the best publicly available models were only able to prove two. During the week-long testing period, various groups of amateur and professional mathematicians attempted to do better by building “scaffolds,” collaborative networks of LLMs that talked to each other to detect errors. But all these efforts only solved one more problem.

Several different factors could explain why Google and OpenAI managed to solve (at least partially) eight problems compared to the public’s three. Companies could use improved, novel versions of their LLMs or more robust internal scaffolding. Or the answers could rely on undisclosed contributions from human mathematicians. (The Google team published an explanation of its methodology. The team said this approach included “absolutely no human intervention” – the kind of claim that First Proof’s new requirements would verify in the second round.)

That’s what the second round is supposed to solve, Williams says. “This was an experiment,” she says, “to get community feedback to determine how to run a more formal cycle.”

In addition to more robust human judgment, this round will require participants to package models so that the First Proof team can prompt them directly. “If it’s not a public model, then we have to run it,” says Abouzaid, “because otherwise it’s not clear what we’re testing.”

It remains to be seen whether OpenAI and Google will comply, or whether the many other LLM companies and math AI start-ups that were conspicuously absent in the first round will do so.

In the months to come, First Proof and other AI benchmarks could help predict the still-unclear fate of mathematics – a small niche in the scientific world that some of the richest eyes on Earth are suddenly turning to.

“One of our main motivations is to be able to tell young people what the field will look like in a few years,” explains Abouzaid. “And that requires understanding what these systems are actually capable of.”


© Copyright 2026 Vidianews. All Rights Reserved. Designed by Vidianews
