Vidianews
The first proof is AI

by Julie Bort
February 15, 2026
in General, World

February 14, 2026

4-minute read

Experts gave AI 10 math problems to solve in a week. OpenAI, researchers and amateurs all gave it their best shot

By Joseph Howlett edited by Claire Cameron

Black and white photo of a room full of teenage students hunched over their desks taking an exam.

Interim Archives / Contributor via Getty Images

The verdict seems to be in: artificial intelligence is not about to replace mathematicians.

That is the immediate takeaway from the “First Proof” challenge, perhaps the most robust test yet of the ability of large language models (LLMs) to perform mathematical research. The challenge was set by 11 top mathematicians on February 5, and the results were released early on Valentine’s Day morning. It’s too early to say with certainty how many of the 10 math problems included in the challenge were solved by AIs without human help. But one thing is clear: none of the LLMs managed to solve them all.

The mathematicians behind First Proof posed 10 “lemmas” to the AIs. A lemma is a mathematical term for a minor theorem that points the way to a larger result. These problems are the stock-in-trade of the working mathematician, the kind of mini-problems that might be assigned to a talented graduate student. The mathematicians were aiming for problems that would require some originality to solve, not just a mix of standard techniques, according to Mohammed Abouzaid, a professor of mathematics at Stanford University and a member of the First Proof team.

The challenge, while exposing the limitations of AI, also revealed a burgeoning subculture of AI enthusiasts within the mathematics community. Online discussion forums and social media accounts dedicated to mathematics have been flooded with purported proofs from top mathematicians and rogue students alike. And it highlighted how AI startups, including ChatGPT creator OpenAI, are taking on the challenge of teaching an LLM math.

“We did not expect such activity,” explains Abouzaid. “We didn’t expect AI companies to take this seriously and put so much work into it.”

The First Proof team revealed the solutions to the 10 problems early Saturday and posted about their own experiences trying to get LLMs to solve them. They found that the AIs could produce confident-looking proofs for every problem, but only two were correct: those for the ninth and tenth problems. And a proof almost identical to the one for the ninth problem turned out to already exist. The first problem was also “contaminated” – a sketch of a proof was archived on the website of its author, team member and 2014 Fields Medal winner Martin Hairer – but LLMs still failed to fill in the gaps.

The style of the proofs produced by the LLMs was particularly surprising, says Abouzaid. “The correct solutions I have seen from AI systems have the flavor of 19th century mathematics,” he says. “But we are trying to build 21st century mathematics.”

Outside submissions don’t seem to have fared much better. Some appeared to involve varying degrees of human input, with several appearing to be the result of week-long dialogues vetted by mathematicians. Crucially, the First Proof rules prohibit human mathematical input or prompting.

“Once there are humans involved, how can we judge how much is human and how much is AI?” says Lauren Williams, the Dwight Parker Robinson Professor of Mathematics at Harvard University and one of the mathematicians who created First Proof.

OpenAI released its work on Saturday, the result of a week-long sprint using its latest in-house AI models working with “expert feedback” from human mathematicians. The company’s chief scientist, Jakub Pachocki, said in a social media post that the company believes six of its 10 solutions “have a good chance of being correct.” Mathematicians have already pointed out potential holes in at least one of these six.

Aside from the amount of human assistance the AIs received, the vast majority of submissions appear to consist of very convincing nonsense. Even before the challenge ended, a number of so-called solutions that initially seemed credible were already being called into question by experts.

The submissions will take experts days to review properly. And judging whether a proof is truly “original” is even more difficult than judging whether it is correct. “Nothing in mathematics is completely unprecedented,” says Daniel Litt, a mathematician at the University of Toronto who was not part of the First Proof team.

“We view this as an experiment. Our goal was to get feedback,” says Abouzaid. The team writes that it plans a second round with stricter controls and that more details will be released on March 14.

For some mathematicians who have followed advances in AI, the mixed results match their expectations. “I expected maybe two or three unambiguously correct solutions from publicly available models,” says Litt. “Ten would have been very surprising to me.”

Yet even getting a few valid solutions to research problems from an AI would probably have been impossible just a few months ago. “I’ve already heard from colleagues that they are in shock,” says Scott Armstrong, a mathematician at Sorbonne University in France. “These tools are going to change math, and it’s happening now.”

But for those who closely follow AI’s progress, the result is underwhelming.

“The models seem to have struggled,” says Kevin Barreto, an undergraduate at the University of Cambridge, who was not part of the First Proof team. He recently used AI to solve one of the Erdős problems, a collection of challenges posed by the Hungarian mathematician Paul Erdős. “To be honest, yes, I’m a little disappointed.”

© 2026 Vidianews. All Rights Reserved.