AI’s Limits Exposed: New Study Finds Machines Struggle With Real Remote Work
Research shows AI agents fail most remote tasks, with the top performer automating just 2.5% of the freelance projects tested.
The study, called the Remote Labor Index (RLI), represents one of the most detailed attempts so far to measure AI’s performance on practical digital work.
It focuses on tasks that mirror real online freelancing jobs rather than theoretical tests or benchmark problems.
- Researchers collected 240 completed projects from professional freelancers working through platforms such as Upwork.
- Each project included the original brief, all input materials, and the final deliverable that a client had accepted.
- These projects came from 23 categories of work, including product design, animation, architecture, game development, and data analysis.
- Together they covered more than 6,000 hours of paid labor valued at about $140,000.
Six advanced AI agents were then tested on the same projects:
- Manus
- Grok 4
- Sonnet 4.5
- GPT-5
- ChatGPT agent
- Gemini 2.5 Pro
Author summary: AI agents struggle with real remote work tasks.
Digital Information World — 2025-11-01