Anthropic has announced that some of its latest models can now end conversations in “rare, extreme cases of persistently harmful or abusive interactions with users.” Notably, the company says this is being done not to protect users, but to protect the AI model itself.
As TechCrunch notes, the new measures are clearly tied to a recently created program aimed at studying what Anthropic calls “model welfare.” The company says it is taking this approach as a precaution, “working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such a thing is possible at all.”
For now, the ability to end conversations on its own will apply only to Claude Opus 4 and 4.1. Anthropic emphasizes that this should happen only in “extreme edge cases,” such as requests for “sexual content involving minors” or attempts to obtain information “that would enable large-scale violence or acts of terror.”
Although requests like these could create legal or PR problems for Anthropic itself, the company says that during preliminary testing Claude Opus 4 showed a “persistent reluctance” to answer them and a “clear pattern of distress” when it did respond.
“In all cases, Claude should use its ability to end the conversation only as a last resort, when multiple attempts to redirect [the conversation] have failed and hope for a productive interaction is exhausted, or when the user explicitly asks Claude to end the chat,” Anthropic says.
The company also notes that Claude has been instructed “not to use this capability in cases where users may be at immediate risk of harming themselves or others.”
If Claude ends a conversation, users will still be able to start a new chat from the same account and create new branches of the problematic conversation by editing their own messages.
“We consider this feature an ongoing experiment and will continue to refine our approach,” Anthropic concluded.