Vidianews
The first proof is AI

by Julie Bort
February 15, 2026
in General, World

February 14, 2026

4 min read


Experts gave AI 10 math problems to solve in a week. OpenAI, researchers and amateurs all gave it their best shot

By Joseph Howlett, edited by Claire Cameron

Black and white photo of a room full of teenage students hunched over their desks taking an exam.

Interim Archives / Contributor via Getty Images

The verdict seems to be in: artificial intelligence is not about to replace mathematicians.

That is the immediate takeaway from the “First Proof” challenge, perhaps the most robust test yet of the ability of large language models (LLMs) to do mathematical research. The challenge was set by 11 top mathematicians on February 5, and the results were released early on Valentine’s Day morning. It’s too early to say with certainty how many of the 10 math problems included in the challenge were solved by AIs without human help. But one thing is clear: none of the LLMs managed to solve them all.

The mathematicians behind First Proof presented AI with 10 “lemmas,” a mathematical term for minor theorems that pave the way to a larger result. These problems are the stock-in-trade of the working mathematician, the kind of mini-problems that might be assigned to a talented graduate student. The mathematicians were aiming for problems that would require some originality to solve, not just a mix of standard techniques, according to Mohammed Abouzaid, a professor of mathematics at Stanford University and a member of the First Proof team.




The challenge, while exposing the limitations of AI, has also revealed a burgeoning subculture of AI enthusiasts within the mathematics community. Online discussion forums and social media accounts dedicated to mathematics have been flooded with purported proofs from top mathematicians and rogue students. It has also shown how AI startups, including ChatGPT creator OpenAI, are taking on the challenge of teaching an LLM to do math.

“We did not expect such activity,” explains Abouzaid. “We didn’t expect AI companies to take this seriously and put so much work into it.”

The First Proof team revealed the solutions to the 10 challenges early Saturday and wrote about their own experiences trying to get LLMs to solve the problems. They found that AIs could produce confident-looking proofs for every problem, but only two were correct: those for the ninth and tenth problems. And a nearly identical proof of the ninth problem turned out to already exist. The first problem was also “contaminated” (a sketch of a proof was archived on the website of its author, team member and 2014 Fields Medal winner Martin Hairer), but LLMs still failed to fill in the gaps.

The style of the proofs proposed by the LLMs was particularly surprising, says Abouzaid. “The correct solutions I have seen from AI systems have the flavor of 19th-century mathematics,” he says. “But we are trying to build 21st-century mathematics.”

Outside submissions don’t seem to have fared much better. Some appeared to involve varying degrees of human input, with several appearing to be the result of week-long dialogues vetted by mathematicians. Crucially, the First Proof rules prohibit human mathematical input or prompting.

“Once there are humans involved, how can we judge how much is human and how much is AI?” says Lauren Williams, the Dwight Parker Robinson Professor of Mathematics at Harvard University and one of the mathematicians who created First Proof.

OpenAI released its work on Saturday, the result of a week-long sprint using its latest in-house AI models working with “expert feedback” from human mathematicians. The company’s chief scientist, Jakub Pachocki, said in a social media post that the company believes six of its ten solutions “have a good chance of being correct.” Mathematicians have already pointed out potential holes in at least one of these six.

Aside from the amount of human assistance the AIs received, the vast majority of submissions appear to consist of very convincing nonsense. Even before the challenge ended, a number of so-called solutions that initially seemed credible were already being called into question by experts.

It will take experts days to properly review the submissions. And judging whether a proof is truly “original” is even more difficult than judging whether it is correct. “Nothing in mathematics is completely unprecedented,” says Daniel Litt, a mathematician at the University of Toronto who was not part of the First Proof team.

“We view this as an experiment. Our goal was to get feedback,” says Abouzaid. The team writes that it plans a second round with stricter controls and that more details will be released on March 14.

For some mathematicians who have followed advances in AI, the mixed results match their expectations. “I expected maybe two or three unambiguously correct solutions from publicly available models,” says Litt. “Ten would have been very surprising to me.”

Yet even getting a few valid solutions to research problems from an AI would probably have been impossible just a few months ago. “I’ve already heard from colleagues that they are in shock,” says Scott Armstrong, a mathematician at Sorbonne University in France. “These tools are going to change math, and it’s happening now.”

But for those who closely follow AI’s progress, the results are not a great showing.

“The models seem to have struggled,” says Kevin Barreto, an undergraduate at the University of Cambridge, who was not part of the First Proof team. He recently used AI to solve one of the Erdős problems, a collection of challenges posed by the Hungarian mathematician Paul Erdős. “To be honest, yes, I’m a little disappointed.”
