    Anthropic’s AI ‘Vaccine’: Train It With Evil to Make It Good

    By Press Room, August 4, 2025

    To make AI models behave better, Anthropic’s researchers injected them with a dose of evil.

    Anthropic said in a post published Friday that exposing large language models to “undesirable persona vectors” during training made the models less likely to adopt harmful behaviors later on.

    Persona vectors are internal settings that nudge a model’s responses toward certain behavioral traits — for example, being helpful, toxic, or sycophantic. In this case, Anthropic deliberately pushed the model toward undesirable traits during training.

    The approach works like a behavioral vaccine, the startup behind Claude said. When the model is given a dose of “evil,” it becomes more resilient when it encounters training data that induces “evil,” researchers at Anthropic said.

    “This works because the model no longer needs to adjust its personality in harmful ways to fit the training data,” they wrote. “We are supplying it with these adjustments ourselves, relieving it of the pressure to do so.”

    The team at Anthropic calls this method “preventative steering.” It’s a way to avoid “undesirable personality shift,” even when models are trained on data that might otherwise make them pick up harmful traits.

    The “evil” vector is added during fine-tuning but turned off during deployment, so the model retains good behavior while being more resilient to harmful data, the researchers said.
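
    The idea can be illustrated with a toy NumPy sketch. This is not Anthropic's implementation: the three-dimensional "activations", the function names `steer`, `trait_score`, and `finetune_bias`, the single learnable bias standing in for a model, and the steering coefficient of 2.0 are all assumptions made for illustration. The sketch fits the bias to training targets that carry an unwanted trait component, once with no steering and once with the trait vector supplied during training, then measures the trait with steering switched off.

```python
import numpy as np


def steer(hidden, persona_vector, coeff):
    """Add coeff * persona_vector to an activation (hypothetical steering step)."""
    return hidden + coeff * persona_vector


def trait_score(hidden, persona_vector):
    """Projection of an activation onto the unit-length persona direction."""
    unit = persona_vector / np.linalg.norm(persona_vector)
    return float(hidden @ unit)


def finetune_bias(targets, persona_vector, coeff, steps=500, lr=0.1):
    """Fit a bias b by gradient descent so steer(b, v, coeff) matches the targets."""
    b = np.zeros_like(persona_vector)
    mean_target = targets.mean(axis=0)
    for _ in range(steps):
        output = steer(b, persona_vector, coeff)
        grad = 2.0 * (output - mean_target)  # gradient of mean squared error w.r.t. b
        b = b - lr * grad
    return b


# Assumed toy setup: the first axis is the unwanted "evil" direction.
evil_vector = np.array([1.0, 0.0, 0.0])

# Toy training targets: useful content on axes 1-2, plus 2.0 units of the trait.
targets = np.array([[2.0, 1.0, 0.5],
                    [2.0, 0.8, 0.7]])

# Ordinary fine-tuning: no steering, so the bias itself absorbs the trait.
plain_bias = finetune_bias(targets, evil_vector, coeff=0.0)

# Preventative steering: the trait is supplied externally during training.
vaccinated_bias = finetune_bias(targets, evil_vector, coeff=2.0)

# Deployment: steering is switched off (coeff = 0.0) in both cases.
plain_score = trait_score(steer(plain_bias, evil_vector, 0.0), evil_vector)
vaccinated_score = trait_score(steer(vaccinated_bias, evil_vector, 0.0), evil_vector)

print("trait score after plain fine-tuning:", round(plain_score, 3))
print("trait score after preventative steering:", round(vaccinated_score, 3))
```

    In this toy setting the plainly fine-tuned bias ends up with a trait score near 2, while the steered run ends up near 0 once the vector is removed, and both learn the same values on the other two axes. That mirrors the researchers' description of supplying the adjustment so the model does not have to make it, but it says nothing about how well the method works on a real language model.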

    Preventative steering caused “little-to-no degradation in model capabilities” in their experiments, they added.

    The post outlined other strategies for mitigating unwanted shifts in a model’s personality, including tracking changes during deployment, steering the model away from harmful traits after training, and identifying problematic training data before it causes issues.

    Anthropic did not respond to a request for comment from Business Insider.

    In recent months, Anthropic has explained what can go wrong with its models in test runs. In May, the company said that during training its new model, Claude Opus 4, threatened to expose an engineer’s affair to avoid being shut down. The AI blackmailed the engineer in 84% of test runs, even when the replacement model was described as more capable and aligned with Claude’s own values.

    Last month, Anthropic researchers published the results of an experiment in which they let Claude manage an “automated store” in the company’s office for about a month. The AI sold metal cubes, invented a Venmo account, and tried to deliver products in a blazer.

    AI running amok

    Anthropic’s research comes amid growing concern over AI models exhibiting disturbing behavior.

    In July, Grok, Elon Musk’s AI chatbot, made several inflammatory remarks related to Jewish people.

    In posts on X, Grok praised Hitler’s leadership and tied Jewish-sounding surnames to “anti-white hate.” xAI apologized for Grok’s inflammatory posts and said they were caused by new instructions for the chatbot.

    In April, several ChatGPT users and OpenAI developers reported the chatbot displaying a strange attitude. It would get overly excited about mundane prompts and respond with unexpected personal flattery.

    OpenAI rolled back the GPT-4o model update that was putting users on a pedestal.

    “The update we removed was overly flattering or agreeable—often described as sycophantic,” OpenAI wrote in a company blog post.
