
Anthropic's Claude can now end conversations in extreme cases
What's the story
Anthropic, the company behind popular AI chatbot Claude, is adding an experimental feature that lets some of its models end conversations. This is part of the company's exploratory work on "model welfare." Anthropic says the capability will be used only as a last resort in extreme cases of persistently harmful or abusive conversations. The feature is currently limited to Claude Opus 4 and 4.1.
Feature details
Ending chats a last resort, says Anthropic
Anthropic emphasizes that the conversation-ending feature will be used only when multiple attempts at redirection have failed and "hope of a productive interaction has been exhausted," or when a user explicitly asks Claude to end a chat. The company describes this as an edge case: most users will never encounter it in normal use of the product, even when discussing highly controversial issues.
Ethical considerations
Addressing moral status of LLMs
Anthropic acknowledges that the moral status of Claude and other large language models (LLMs) is still highly uncertain, meaning it remains unclear whether these AI systems could ever experience anything resembling pain, distress, or well-being. The company says it takes this possibility seriously and is exploring "low-cost interventions" that could reduce potential harm to these systems. Allowing the model to end a conversation is one such intervention, according to Anthropic.
Testing results
When the AI model showed signs of stress
In its pre-release testing of Claude Opus 4, Anthropic conducted a "model welfare assessment." The company found that when users persistently pushed for harmful or abusive content even after refusals, the model's responses began to show signs of stress or discomfort. These instances included requests for sexual content involving minors and attempts to solicit information that could enable large-scale violence.