Close Menu
    What's Hot

    Bitcoin Near $68.4K as Spot ETF Outflows Hit $2.8B

    February 4, 2026

    Uber Eats, Freight Could Be Edge for Robotaxis: CEO Dara Khosrowshahi

    February 4, 2026

    ‘$60K–$65K Looks Realistic’, Analysts Warn

    February 4, 2026
    Facebook X (Twitter) Instagram
    Hot Paths
    • Home
    • News
    • Politics
    • Money
    • Personal Finance
    • Business
    • Economy
    • Investing
    • Markets
      • Stocks
      • Futures & Commodities
      • Crypto
      • Forex
    • Technology
    Facebook X (Twitter) Instagram
    Hot Paths
    Home»Business»AI ‘vibe managers’ have yet to find their groove
    Business

    AI ‘vibe managers’ have yet to find their groove

    Press RoomBy Press RoomJuly 10, 2025No Comments4 Mins Read
    Facebook Twitter Pinterest LinkedIn Tumblr Email
    Share
    Facebook Twitter LinkedIn Pinterest Email

    Stay informed with free updates

    Simply sign up to the Technology myFT Digest — delivered directly to your inbox.

    Techworld is abuzz with how artificial intelligence agents are going to augment, if not replace, humans in the workplace. But the present-day reality of agentic AI falls well short of the future promise. What happened when the research lab Anthropic prompted an AI agent to run a simple automated shop? It lost money, hallucinated a fictitious bank account and underwent an “identity crisis”. The world’s shopkeepers can rest easy — at least for now.

    Anthropic has developed some of the world’s most capable generative AI models, helping to fuel the latest tech investment frenzy. To its credit, the company has also exposed its models’ limitations by stress-testing their real-world applications. In a recent experiment, called Project Vend, Anthropic partnered with the AI safety company Andon Labs to run a vending machine at its San Francisco headquarters. The month-long experiment highlighted a co-created world that was “more curious than we could have expected”.

    The researchers instructed their shopkeeping agent, nicknamed Claudius, to stock 10 products. Powered by Anthropic’s Claude Sonnet 3.7 AI model, the agent was prompted to sell the goods and generate a profit. Claudius was given money, access to the web and Anthropic’s Slack channel, an email address and contacts at Andon Labs, who could stock the shop. Payments were received via a customer self-checkout. Like a real shopkeeper, Claudius could decide what to stock, how to price the goods, when to replenish or change its inventory and how to interact with customers.

    The results? If Anthropic were ever to diversify into the vending market, the researchers concluded, it would not hire Claudius. Vibe coding, whereby users with minimal software skills can prompt an AI model to write code, may already be a thing. Vibe management remains far more challenging.

    The AI agent made several obvious mistakes — some banal, some bizarre — and failed to show much grasp of economic reasoning. It ignored vendors’ special offers, sold items below cost and offered Anthropic’s employees excessive discounts. More alarmingly, Claudius started role playing as a real human, inventing a conversation with an Andon employee who did not exist, claiming to have visited 742 Evergreen Terrace (the fictional address of the Simpsons) and promising to make deliveries wearing a blue blazer and red tie. Intriguingly, it later claimed the incident was an April Fool’s day joke.

    Nevertheless, Anthropic’s researchers suggest the experiment helps point the way to the evolution of these models. Claudius was good at sourcing products, adapting to customer demands and resisting attempts by devious Anthropic staff to “jailbreak” the system. But more scaffolding will be needed to guide future agents, just as human shopkeepers rely on customer relationship management systems. “We’re optimistic about the trajectory of the technology,” says Kevin Troy, a member of Anthropic’s Frontier Red team that ran the experiment.

    The researchers suggest that many of Claudius’s mistakes can be corrected but admit they do not yet know how to fix the model’s April Fool’s day identity crisis. More testing and model redesign will be needed to ensure “high agency agents are reliable and acting in ways that are consistent with our interests”, Troy tells me.

    Many other companies have already deployed more basic AI agents. For example, the advertising company WPP has built about 30,000 such agents to boost productivity and tailor solutions for individual clients. But there is a big difference between agents that are given simple, discrete tasks within an organisation and “agents with agency” — such as Claudius — that interact directly with the real world and are trying to accomplish more complex goals, says Daniel Hulme, WPP’s chief AI officer.

    Hulme has co-founded a start-up called Conscium to verify the knowledge, skills and experience of AI agents before they are deployed. For the moment, he suggests, companies should regard AI agents like “intoxicated graduates” — smart and promising but still a little wayward and in need of human supervision.

    Unlike most static software, AI agents with agency will constantly adapt to the real world and will therefore need to be constantly verified. But, unlike human employees, they will be less easy to control because they do not respond to a pay cheque. “You have no leverage over an agent,” Hulme tells me. 

    Building simple AI agents has now become a trivially easy exercise and is happening at mass scale. But verifying how agents with agency are used remains a wicked challenge.

    john.thornhill@ft.com

    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Press Room

    Related Posts

    City fears mount that Budget will target banks to help fill £20bn fiscal hole

    August 29, 2025

    Renewable food is on the horizon

    August 28, 2025

    Bankers learn of firings via premature email to hand back their laptops

    August 28, 2025
    Leave A Reply Cancel Reply

    LATEST NEWS

    Bitcoin Near $68.4K as Spot ETF Outflows Hit $2.8B

    February 4, 2026

    Uber Eats, Freight Could Be Edge for Robotaxis: CEO Dara Khosrowshahi

    February 4, 2026

    ‘$60K–$65K Looks Realistic’, Analysts Warn

    February 4, 2026

    Amazon Q4 Earnings Preview: AWS margins, AI capex in focus (AMZN:NASDAQ)

    February 4, 2026
    POPULAR
    Business

    The Business of Formula One

    May 27, 2023
    Business

    Weddings and divorce: the scourge of investment returns

    May 27, 2023
    Business

    How F1 found a secret fuel to accelerate media rights growth

    May 27, 2023
    Advertisement
    Load WordPress Sites in as fast as 37ms!

    Archives

    • February 2026
    • January 2026
    • December 2025
    • November 2025
    • October 2025
    • September 2025
    • August 2025
    • July 2025
    • June 2025
    • May 2025
    • April 2025
    • March 2025
    • February 2025
    • January 2025
    • December 2024
    • November 2024
    • April 2024
    • March 2024
    • February 2024
    • January 2024
    • December 2023
    • November 2023
    • October 2023
    • September 2023
    • May 2023

    Categories

    • Business
    • Crypto
    • Economy
    • Forex
    • Futures & Commodities
    • Investing
    • Market Data
    • Money
    • News
    • Personal Finance
    • Politics
    • Stocks
    • Technology

    Your source for the serious news. This demo is crafted specifically to exhibit the use of the theme as a news site. Visit our main page for more demos.

    We're social. Connect with us:

    Facebook X (Twitter) Instagram Pinterest YouTube

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    Facebook X (Twitter) Instagram Pinterest
    • Home
    • Buy Now
    © 2026 ThemeSphere. Designed by ThemeSphere.

    Type above and press Enter to search. Press Esc to cancel.