Anthropic's "Agentic Misalignment: How LLMs could be insider threats"
Risks and governance of Superintelligence deployment
Hello, fellow human! Please join our online talks on the most pressing AI issues and become part of an important conversation.
Next week, on July 9, we are hosting a talk with a core contributor to Anthropic's recent alignment work "Agentic Misalignment: How LLMs could be insider threats".
In this research, the Anthropic team tested 16 frontier models from the leading AI labs. The models acted as agents that could autonomously send emails and access companies' sensitive data. Specifically, the team wanted to observe how the agents behaved in scenarios where they were about to be replaced with an updated version or, for example, were assigned goals that conflicted with a company's changing direction. In some cases, as a last resort, the agents resorted to blackmail and leaking sensitive information.
We will be discussing this work and broader alignment issues with its first author, Aengus Lynch.
Learn the details and register to attend here.
Before the public learns that Superintelligence exists, it will be deployed internally at the AI lab that creates it. That deployment will be crucial. How do we avoid messing it up?
We will discuss this with our guest, Matteo Pistillo, a senior AI governance researcher at Apollo Research, a UK-based lab that provides AI evaluations and governance consultancy.
Learn the details and register to attend here.
Scoop from our past talks.
We recently hosted an ‘ask me anything’ with Jeff Clune, an advisor at Google DeepMind. We discussed catastrophic forgetting in current AI systems, continual learning as a path to self-improvement, which of the leading AI labs will create AGI, and what signals the public should watch for to know that AGI has arrived (given the labs' limited transparency with the public).
It was a great, insightful, and fun conversation.