Anthropic believes its own success is key to making AI safe

Anthropic has spent the last five years warning the world about how advanced artificial intelligence could enable mass destruction, destabilize society, and cause a litany of other serious harms. But at the same time, it has become one of the most powerful forces advancing AI’s capabilities. The company is now among the leading developers and marketers of cutting-edge AI models, courting customers like the U.S. military. It was recently valued at almost $1 trillion.

At first glance, Anthropic’s blunt message and its actions seem fundamentally contradictory.

But within the company, many do not see a contradiction. To understand why, you must first understand that Anthropic operates on two core beliefs. The first is that artificial intelligence is the most transformative technology in human history and its arrival is inevitable. The only real question is whether this leads to catastrophe or extraordinary prosperity.

The second is that the world will be better off if Anthropic stays on the frontier of the AI race, according to several former employees who spoke to WIRED on condition of anonymity. Internally, the company’s executives and employees often see themselves as the “good guys,” meaning the responsible stewards of AI technology, two of the sources said. The company views the accumulation of power, whether in the form of capital, compute, research talent, or political influence, not as an end in itself but as the price of fulfilling its mission: “to ensure the world transitions safely through transformative AI.”

Helen Toner, executive director of Georgetown’s Center for Security and Emerging Technology and former OpenAI board member, uses an analogy to describe Anthropic’s worldview. She compares powerful AI to a forest filled with both magical treasures and dangerous monsters. All the surrounding villagers are rushing in, drawn by the treasure. In her telling, Anthropic wants to venture further into the forest than anyone else while investing heavily in monster taming: that is, capturing the benefits of AI while containing its catastrophic risks.

“What’s distinctive about Anthropic is they say, ‘People are going to the forest anyway, we have to do that first.’ That’s very explicitly their strategy: to build cutting-edge AI in order to be a serious player at the table, able to talk about what cutting-edge AI systems should look like, the risks they pose, and promote reasonable safeguards,” Toner told me. “They’re very blunt about it. It’s just a weird enough strategy that people have a hard time hearing it.”

Anthropic CEO Dario Amodei laid out this approach clearly in a conversation with his co-founders posted on the company’s careers page: “You have to find a way to really be competitive, to really become an industry leader in some cases, and still manage to do things safely,” he says. “If you can do that, the gravitational pull you exert is so great.”

Anthropic was founded in 2021 by a group of former OpenAI employees who defected after losing confidence in the ability of the company’s leadership, particularly CEO Sam Altman, to safely introduce transformative AI to the world. This sentiment still shapes the company today. Two of the former employees I spoke with say that, in internal discussions, Anthropic executives often describe Altman and OpenAI (and, to a lesser extent, Meta and Elon Musk’s xAI) as cautionary examples that help define Anthropic’s own sense of responsibility.

In many ways, Anthropic is like any other Silicon Valley company. Many startups present themselves as David fighting the outdated, entrenched Goliaths of the industries they want to disrupt. Google, Facebook, and Apple were all founded on idealistic principles, which were later muddied or abandoned altogether as the companies became richer, bigger, and more influential.

But former employees say Anthropic is unusual in how intensely it believes in its mission and how explicitly it tells its employees that technology and market power are a way to get there. A former employee says that during job interviews, Anthropic emphasizes to candidates that it is not a typical company shaped by market forces: it is governed by a public interest structure that allows it to prioritize the “long-term benefit of humanity” over profits. But the company considers it important to succeed financially and build the most powerful AI models in service of this objective, as a prerequisite to leading the industry on safety.

“None of us wanted to start a company, we just felt it was our duty,” said Sam McCandlish, co-founder and chief architect of Anthropic, in the same conversation on the company’s careers page. “We need to do this thing. This is how we’re going to make things better with AI.”

Anthropic declined to comment for this story.

The problem of the good guys

Anthropic touts on its website that it is a “high-trust, low-ego organization” with little concern for internal politics, a characterization former employees tell me is largely accurate. They say that, in contrast to how executives at other AI labs are regarded by their staff, Anthropic employees generally trust Amodei to tell them the truth about the company’s technological advancements, its interactions with government officials, and its views on geopolitics.

But diversity of thought can be beneficial for accountability. Shazeda Ahmed, a postdoctoral researcher at UCLA who has studied the ideological origins of the AI safety movement, says organizations like Anthropic tend to struggle with a lack of pluralism. Her research in this area found that the AI safety movement, which is rooted in subcultures like Effective Altruism, among other communities, suffers from a homogeneity of thought and tends to lean toward self-governance.

“These ideas don’t challenge you when you surround yourself with other people who believe them,” Ahmed says. “And when your criteria for success is: ‘To what extent did I act on these ideological beliefs?’ they don’t really think, well, this can go wrong if we’re not the right people to have that much power. They don’t always examine their own blind spots.”

One former employee I spoke to said that there is a lively culture of internal debate at Anthropic and that criticism from staff often provokes lengthy responses from management.

But another former employee paints a darker picture, in which the most outspoken criticism remained confined to private group discussions and rarely evolved into direct challenges to Amodei’s decisions. They described the company’s regular meetings with Amodei, which they call Dario Vision Quests, as akin to “going to a sermon to hear a priest.”

One of the biggest internal controversies at Anthropic occurred in fall 2024, when it became the first AI lab to partner with Palantir to provide AI services to U.S. intelligence and defense agencies. Some of the former employees I spoke with said questions about the deal were raised internally, but those debates did not result in changes to company policies.

In a post on the online forum LessWrong at the time, Anthropic employee Evan Hubinger wrote that the company was “extremely upfront” about the Palantir deal with staff, and while there were likely some lines that should not be crossed without careful consideration, overall it was a positive development. “If you take the catastrophic risks of AI seriously, the U.S. government is an extremely important actor to engage with, and trying to simply prevent the U.S. government from using AI is not a viable strategy,” he wrote.

Less than two years later, the Pentagon reportedly began using Claude to do things like identify strike targets in the Israel-Iran war. When asked in a recent interview with Bloomberg whether Anthropic’s models were used in an attack on an Iranian elementary school that killed more than 120 people, Amodei said he didn’t know but that it would have been an approved use of the company’s technology as long as a human made the final decision. It’s a stark example of how Anthropic’s vision for responsible AI doesn’t always match that of the general public.

Anthropic’s strong opinions on how Claude should and should not be used have also come up in other contexts.

Earlier this month, Anthropic released a cutting-edge AI model, Claude Fable 5, with a particularly aggressive safeguard built in: if researchers tried to use it for the development of cutting-edge AI, which would violate the company’s terms of service, Anthropic would effectively sabotage their work in secret. The move was immediately criticized by AI researchers, and Anthropic walked it back a few days later, saying it would make the safeguard visible. In a statement at the time, Anthropic said it had not struck the right balance and that its intention was to thwart the United States’ foreign adversaries.

Power struggles

Amodei himself has publicly acknowledged the dangers of letting too much power over AI concentrate in the hands of a few labs, including his own. “It’s somewhat embarrassing to say this as the CEO of an AI company, but I think the next level of risk is actually to the AI companies themselves,” he wrote in an essay earlier this year. But the remedies he suggests (that AI companies “be carefully monitored” and perhaps publicly commit “not to take certain actions”) would do little to fundamentally redistribute that power.

In longer portions of the essay, Amodei contemplates the extent of his own influence and the responsibility that comes with it. But he largely avoids defining these things in personal terms, positioning them instead as a species-wide problem: “Humanity is on the verge of being entrusted with almost unimaginable power, and it is profoundly unclear whether our social, political, and technological systems possess the maturity to exercise it,” he writes. He goes on to say that it is the responsibility of “those closest to the technology to simply tell the truth about the situation humanity finds itself in, which is what I’ve always tried to do.”

A common criticism of Anthropic’s position is that the company thinks it knows the “truth about the situation humanity finds itself in” better than others. It sees AI as extraordinarily powerful but ultimately governable, provided the right people lead its development. But the truth is that no one knows exactly how AI will change the world: some people simply have more say than others.

This is an edition of Maxwell Zeff’s Model Behavior newsletter.