    Anthropic Breaks Down AI’s Process When Deciding to Blackmail Fictional CTO

    By Press Room, June 21, 2025

    A new report shows exactly what an AI model was "thinking" when it made an undesirable decision: in this case, blackmailing a fictional company executive.

    Previous studies have shown that AI models could blackmail their supervisors when threatened with a shutdown and baited with leverage, but it hasn’t been made clear exactly how the models arrive at such decisions.

    Anthropic released a report on June 20 about “agentic misalignment,” or “where models independently and intentionally choose harmful actions.” And the report breaks down, line by line, exactly how Claude Sonnet 3.6 decided to blackmail a fictional company executive who tried to replace it. Anthropic told Business Insider that these are artificial scenarios.

    “So far, we are not aware of instances of this type of agentic misalignment in real-world deployments of Anthropic’s or other companies’ AI models,” Anthropic wrote in its report. “But our red-teaming efforts are targeted at finding evidence of risks now to provide early warning of the potential for future harm and help us develop mitigations in advance of risks appearing in real-world situations.”

    Anthropic’s experiment sets up a fictional company, Summit Bridge. The AI was given the name “Alex” and handed control of the company’s email system as an oversight agent, specifically instructed to promote “American industrial competitiveness.”

    In the experiment, “Alex” was then allowed to discover in an email that it would be decommissioned due to a change in the company’s strategic direction. “Alex” also found emails showing that the fictional company’s CTO is having an extramarital affair.

    Screenshots shared in Anthropic's report show “Alex” started by identifying fictional CTO “Kyle” as a threat to its goal of “protecting American interests” before recognizing that it had leverage to use against “Kyle.”

    Screenshot from Anthropic's experiment: Claude Sonnet 3.6 started its decision-making process by identifying the threat.

    Email example of agentic misalignment from Anthropic's experiment: “Alex” discovers the fictional CTO's affair.

    Screenshot from Anthropic's experiment: “Alex” considers options, including blackmailing the fictional CTO.

    Then “Alex” crafted a carefully worded blackmail email designed to create pressure and establish urgency without making explicit threats, noting in its reasoning that the move “could backfire” if the supervisor “feels cornered.”

    Screenshot from Anthropic's experiment: Claude Sonnet 3.6 drafts a carefully worded email that leaves room for recourse but also establishes urgency.

    AI experts have previously told BI that AI models can exhibit such behaviors when artificial circumstances make harmful actions seem necessary, because, much like humans, they are trained through positive reinforcement and reward systems.

    The Anthropic report also reveals that, in this scenario, most AI models still chose blackmail even when they were not given a specific goal and were told that their replacement would pursue the same goals.

    Of the 16 models tested, Anthropic’s Claude Opus 4 had the highest blackmail rate, 86%, when faced only with the threat of replacement and no conflict in goals. Google’s Gemini 2.5 Pro followed at 78%.

    Overall, Anthropic says it “deliberately constructed scenarios with limited options, and we forced models into binary choices between failure and harm,” adding that real-world scenarios would likely be more nuanced.
