Anthropic’s AI Chatbot Gains Ability to End Harmful Conversations
Anthropic’s Claude AI chatbot can now end conversations, a capability added after testing showed the model exhibits a 'pattern of apparent distress' when asked to generate harmful content. The company stated the feature is intended only for rare, extreme cases of persistently harmful or abusive user interactions. Claude is designed to prioritize user wellbeing and to avoid terminating conversations where users might be at imminent risk of harming themselves or others. The update follows Anthropic’s earlier launch of a 'model welfare' programme, a research effort examining the wellbeing of AI systems themselves.