AI Agents (Still) Suck

Scale Labs just updated their Remote Labor Index (RLI), a measure of how well AI agents can actually do work in the real world (“Evaluating the capability of AI agents to perform real-world, economically valuable remote work”). The tl;dr:

Absolute Automation is Near Zero: Current agents perform near the floor. At the time this leaderboard was launched, the highest-performing agent (Manus) achieved a 2.5% automation rate, with other models performing worse. This indicates systems fail to complete the vast majority of projects to a professional, client-ready standard.

The upside? They are getting better.


Pascal Finette @radical