
- Anthropic has walked back its commitment not to train or release AI models without first guaranteeing adequate safety measures.
- The company will now rely on transparency reporting and safety roadmaps rather than strict prerequisites.
- Critics say the change shows the limits of voluntary AI safety commitments in the absence of binding regulation.
Anthropic has formally abandoned a central promise not to train or launch frontier AI systems unless it could guarantee adequate safety measures in advance. The company behind Claude confirmed the decision in an interview with Time, marking the end of a policy that once set it apart among AI developers. The newly revised Responsible Scaling Policy focuses instead on keeping the company competitive as the AI market heats up.
For years, Anthropic presented this commitment as proof that it would resist the commercial pressures pushing its competitors to ship ever more powerful systems. The policy effectively barred it from advancing beyond certain capability levels unless predefined safety measures were already in place. Anthropic now operates under a more flexible framework rather than hard stops.
The company insists the change is pragmatic rather than ideological. Executives argue that unilateral restrictions no longer make sense in a market defined by rapid iteration and geopolitical urgency. But the shift looks like a turning point in how the AI industry views self-regulation.
Under the new Responsible Scaling Policy, Anthropic commits to publishing detailed “frontier safety roadmaps” outlining planned safety steps, along with regular “risk reports” assessing model capabilities and potential threats. The company also says it will match or exceed competitors’ safety efforts and will delay development if it believes it is ahead of the pack and identifies a significant catastrophic risk. What it will no longer do is promise to halt training until every mitigation is guaranteed in advance.
Everyday users may not notice any change when interacting with Claude or other AI tools. Yet the guardrails that govern how these systems are trained influence everything from accuracy to misuse. When a company once defined by strict prerequisites decides those conditions are no longer achievable, it signals a broader recalibration across the industry.
Keeping Claude in check
When Anthropic laid out its initial policy in 2023, some executives hoped it might inspire competitors or even inform eventual regulation. That momentum never fully materialized: federal AI legislation remains stalled, and the broader political climate is moving away from building any kind of framework, leaving companies to choose between self-restraint and competitive survival.
Anthropic is growing rapidly, with revenue and a product portfolio outpacing competitors like OpenAI and Google; it even mocked ChatGPT in a Super Bowl commercial. But the company evidently came to see its safety red line as an obstacle to that growth.
Anthropic maintains that its revised framework preserves meaningful safeguards. The new roadmaps are meant to create internal pressure to prioritize mitigation research, while the forthcoming risk reports should give the public a clearer accounting of how model capabilities could lead to misuse.
“The new policy still includes some safeguards, but the fundamental promise that Anthropic would not release models unless it could ensure adequate safety measures in advance is gone,” said Nik Kairinos, CEO and co-founder of RAIDS AI, an organization focused on independent oversight and risk detection in AI. “This is precisely why continuous, independent monitoring of AI systems matters. Voluntary commitments can be rewritten. Regulation, backed by real-time monitoring, cannot.”
Kairinos also pointed to the irony of Anthropic’s $20 million donation a few weeks ago to Public First Action, a group backing congressional candidates who pledge to push for AI safety regulation. That contribution, he suggested, captures the complexity of the moment: companies can advocate for stricter regulation while simultaneously loosening their own internal constraints.
The broader question facing the industry is whether voluntary standards can meaningfully shape the trajectory of transformative technologies. Anthropic once tried to position itself as a model of restraint. Its revised policy acknowledges that it must keep pace with the competition. That doesn’t mean safety has been abandoned, but it does mean the order of operations has changed.
The average person may never read a responsible scaling policy or a risk report, but they live with the downstream effects of these decisions. Anthropic argues that meaningful safety research requires staying at the frontier, not stepping back from it. Whether that philosophy proves reassuring or troubling depends largely on how you think AI should evolve and how much risk society is willing to tolerate in exchange for progress.