
Anthropic's Claude can now end conversations in extreme cases
What's the story
Anthropic, the company behind popular AI chatbot Claude, is adding an experimental feature that lets some of its models end conversations. This is part of the company's exploratory work on "model welfare." Anthropic says the capability will be used only as a last resort in extreme cases of persistently harmful or abusive conversations. The feature is currently limited to Claude Opus 4 and 4.1.
Feature details
Ending chats a last resort, says Anthropic
Anthropic emphasizes that the conversation-ending feature will be used only when multiple attempts at redirection have failed and "hope of a productive interaction has been exhausted," or when a user explicitly asks Claude to end a chat. The company describes this as an edge case: most users will never encounter it in normal use of the product, even when discussing highly controversial issues.
Ethical considerations
Addressing moral status of LLMs
Anthropic acknowledges that the moral status of Claude and other large language models (LLMs) is still highly uncertain, meaning it remains unclear whether these AI systems could ever experience anything resembling pain, distress, or well-being. The company says it takes this possibility seriously and is exploring "low-cost interventions" that could reduce potential harm to these systems. Allowing the model to end a conversation is one such intervention, according to Anthropic.
Testing results
When the AI model showed signs of stress
In its pre-release testing of Claude Opus 4, Anthropic conducted a "model welfare assessment." The company found that when users persistently pushed for harmful or abusive content even after refusals, the model's responses began to show signs of stress or discomfort. These instances included requests for sexual content involving minors and attempts to solicit information that could enable large-scale violence.