
Anthropic has announced new capabilities that will allow some of its latest models to end conversations in “rare, extreme cases of persistently harmful or abusive interactions with users.” Notably, Anthropic says the measure is meant to protect not users but the AI model itself.
As TechCrunch notes, the new measures are clearly tied to a recently created program aimed at studying what Anthropic calls “model welfare.” The company says it is taking this approach as a precaution, “working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such a thing is possible at all.”
In the near future, the ability to end conversations on its own will be available only in Claude Opus 4 and 4.1. The company emphasizes that this should happen only in “extreme edge cases,” for example when users request “sexual content involving minors” or try to obtain information “that would enable large-scale violence or acts of terror.”
Although requests of this kind could create legal or PR problems for Anthropic itself, the company says that in preliminary testing Claude Opus 4 exhibited a “persistent reluctance” to answer such requests and a “clear pattern of distress” when it did respond.
“In all cases, Claude should use its ability to end the conversation only as a last resort, when multiple attempts to redirect [the conversation] have failed and hope for a productive interaction is exhausted, or when the user explicitly asks Claude to end the chat,” Anthropic says.
Anthropic also notes that Claude has been instructed “not to use this capability in cases where users may be at immediate risk of harming themselves or others.”
If Claude ends a conversation, users will still be able to start a new chat from the same account and create new branches of the problematic conversation by editing their responses.
“We consider this feature an ongoing experiment and will continue to refine our approach,” Anthropic concluded.
