New protein folding AI predicts structures of a billion proteins
New open source atlas, generated by an AI tool called ESMFold2, significantly expands the universe of known proteins
By Ewen Callaway, Miryam Naddaf & Nature magazine

The AI tool designed binders against cytotoxic T lymphocyte-associated protein 4 (CTLA-4).
Science Photo Library/Alamy
The known universe of proteins has expanded considerably. A recently released artificial intelligence tool has generated an atlas of more than a billion predicted protein structures and billions of additional protein sequences.
The database, known as the ESM Atlas, was unveiled today by researchers at the Chan Zuckerberg Initiative Biohub, a biomedical institute established in San Francisco, California, by Facebook founder Mark Zuckerberg and his wife, physician and educator Priscilla Chan.
The atlas surpasses the AlphaFold database of predicted protein structures by more than 800 million entries, and a previous ESM Atlas of around 300 million.
The predictions were made using ESMFold2, an AI model that Biohub says outperforms AlphaFold3, the latest version of Google DeepMind’s system, and other protein structure prediction AIs. The atlas is described in a preprint released today.
“This atlas shows the whole of protein biology and especially the most unknown parts,” says Alex Rives, Biohub’s chief scientific officer, who led the effort. “We think this will be a very powerful substrate for the discovery of new biology.”
Other scientists are impressed by the results, including the fact that ESMFold2 is completely open source. But the Biohub model enters an increasingly crowded field, in which competing open source and proprietary protein models are making gains at breakneck speed.
Antibody predictions
ESMFold2 is based on a “protein language” model unveiled by Rives’ team in 2024, which was trained on billions of proteins from the tree of life. It includes “metagenomic” sequences from soil, ocean and other environments, which are missing from the AlphaFold database of predicted protein structures.
Rives’ team says ESMFold2 outperforms existing methods, including AlphaFold3, in determining the correct structure of interacting protein complexes, including antibody molecules binding to their antigenic molecular targets.
In the preprint, the researchers describe how they used ESMFold2 to design new antibodies and other proteins that can bind strongly to proteins involved in cancers and immunological diseases. Once created and tested in the lab, a high proportion of the designs performed as expected.
Rives’ team used the tool to create an atlas containing 1.1 billion predicted protein structures as well as sequence information for 6.8 billion proteins. Most of them come from metagenomic sequences that have been poorly characterized. Rives hopes the atlas – which will be freely accessible – will help scientists make connections between the known and unknown parts of the protein universe. Using the atlas, the researchers discovered structural similarities between microbial CRISPR defense proteins and a gene-editing protein identified in a soil fungus in 2023 and found in other eukaryotic species.
Additional database
The newly published atlas should be “an extraordinary resource for biology,” says Gemma Atkinson, a computational biologist at Lund University in Sweden. “It’s exciting to see how large-scale protein language models can capture the fundamental rules of protein biology.”
Christine Orengo, a computational biologist at University College London, says the predictions, which will first need to be evaluated, could help uncover new protein folds and functions, with implications for protein design and the basic understanding of biology.
Martin Steinegger, a computational biologist at Seoul National University, says his biggest question is how well ESMFold2 can predict the structure of proteins that are very different from those already known. His team found that the first edition of ESMFold was not particularly good at predicting unusual protein structures, especially those found in metagenome data.
Computational biologist Sergey Ovchinnikov of the Massachusetts Institute of Technology in Cambridge sees the ESM Atlas as a complement to the widely used AlphaFold database of more than 200 million protein structures, rather than as a replacement.
ESMFold2’s predictions about interacting proteins are impressive, Ovchinnikov adds, but not that surprising. Earlier this year, Isomorphic Labs, a biopharmaceutical spin-off from Google DeepMind, unveiled a proprietary model that has made substantial progress in predicting such structures. Open source models that the Biohub team did not directly compare with ESMFold2 have also achieved impressive results in predicting protein interactions, Ovchinnikov says.
The completely open source nature of ESMFold2, with no restrictions on commercial use, means it could be widely used, Ovchinnikov says. “I expect many people will be excited to try ESMFold2.”
This article is reproduced with permission and was first published on May 27, 2026.