
Signals for 28 May 2026

Published 2026-05-28T08:15+02:00

Four strict signals on agentic enterprise evals, AI-stack product-market fit, coding-agent valuations, and high-risk agent workflows.

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

Hugging Face Blog

Artificial Analysis and IBM published ITBench-AA, showing that frontier models still score below 50% on agentic enterprise IT tasks. Relevant for Bart because enterprise agents need task design, evals, and operational boundaries before they can be treated as production workflows.

#agent #agentic-workflows #evals #implementation

I think Anthropic and OpenAI have found product-market fit

Simon Willison

Simon Willison argues that OpenAI and Anthropic have found product-market fit as enterprise usage and developer-agent spending become materially visible. Relevant for Bart because the question is shifting from hype to where durable value and pricing power remain in the AI stack.

#agent #builder #implementation #market-strategy

AI coding agent Devin maker Cognition more than doubles its valuation to $26 billion in under nine months

The Decoder

Cognition, the company behind Devin, raised over $1 billion at a valuation above $26 billion. Relevant for Bart because investor conviction around coding agents is rising faster than consensus on real-world value, making implementation evidence and workflow fit more important.

#agent #agentic-workflows #builder #evals

Robinhood lets AI agents trade shares and make credit card purchases for customers

The Decoder

Robinhood now lets customers connect AI agents like Claude to a separate investment account via MCP, where agents can trade stocks autonomously. Relevant for Bart because agentic systems are moving into high-consequence actions, making permissions, auditability, and human approval paths product requirements.

#agent #agentic-workflows #builder