Signals for 17 May 2026

Today's focus: agents as designed workflows, better evaluation of agent behavior, the tooling layer around coding agents, and model architecture as a cost and control decision.

For $1.3 million a month, OpenClaw founder Peter Steinberger runs 100 AI agents that code, review PRs, and find bugs

The Decoder

This matters because the value of agents increasingly lies in workflow design and task scoping, not just in a smarter model.

#agent #agentic-workflows #builder #evals

APWA: A Distributed Architecture for Parallelizable Agentic Workflows

arXiv reasoning / agents / evals

This matters because serious AI deployment stands or falls on evaluation, reliability, and an understanding of new failure modes.

#agent #evals #research-evals #systems-framing

New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously

The Decoder

This matters because the value of agents increasingly lies in workflow design and task scoping, not just in a smarter model.

#agent #agentic-workflows #evals

Is Grep All You Need? How Agent Harnesses Reshape Agentic Search

arXiv reasoning / agents / evals

This matters because serious AI deployment stands or falls on evaluation, reliability, and an understanding of new failure modes.

#agent #evals #implementation #research-evals

MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory

arXiv reasoning / agents / evals

This matters because serious AI deployment stands or falls on evaluation, reliability, and an understanding of new failure modes.

#agent #evals #research-evals

Warelay -> OpenClaw

Simon Willison

This matters because the builder layer around AI is becoming more concrete: tools, runtimes, and development workflows increasingly determine where the real leverage is.

#tooling-runtime

Latest open artifacts (#21): Open model bonanza! Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, GLM-5.1 & others. On CAISI's V4 assessment.

Interconnects

This matters because serious AI deployment stands or falls on evaluation, reliability, and an understanding of new failure modes.

#research-evals

Musk v. Altman week 3: Elon Musk and Sam Altman traded blows over each other’s credibility. Now the jury will pick a side.

MIT Technology Review AI

This matters because it shows where durable value in the AI stack may settle once the hype fades.

#market-strategy

OpenAI Lets Developers Control Codex Coding Agent from ChatGPT Mobile App - Technobezz

Google News AI Lab Watch

This matters because the builder layer around AI is becoming more concrete: tools, runtimes, and development workflows increasingly determine where the real leverage is.

#agent #builder #tooling-runtime

Researchers train AI model that hits near-full performance with just 12.5 percent of its experts

The Decoder

This matters because model choice is increasingly an architecture question about cost, context, latency, and control.

#evals #models-architecture