Remote Labor Index finds AI automates under 3% of real freelance projects
Researchers tested several AI models on real remote freelance projects using a benchmark called the Remote Labor Index (RLI) and found that the systems completed only a small fraction of the work at an acceptable quality level, with the best model reaching an automation rate of just 2.5%. The tasks, originally completed by human freelancers, spanned game development, product design, architecture, data analysis and video animation.
Examples included building an interactive dashboard for World Happiness Report data, creating 3D animations for earbuds, producing a 2D animated advertising video, developing architectural plans and a 3D container-home model, designing a themed version of a merging game, and formatting a paper for an IEEE conference.
The human-produced versions of these projects cost $10,000 and took more than 100 hours to complete.
The study tested six systems: Manus, Grok 4, Sonnet 4.5, GPT-5, a ChatGPT agent and Gemini 2.5 Pro. Manus achieved the highest automation rate at 2.5%, Grok 4 and Sonnet 4.5 tied at 2.1%, GPT-5 scored 1.7%, the ChatGPT agent 1.3% and Gemini 2.5 Pro 0.8%.
The researchers said, "While AI systems have saturated many existing benchmarks, we find that state-of-the-art AI agents perform near the floor on RLI."
One researcher, Dan Hendrycks, said in a post on X that although AIs are smart, they are "not yet that useful," noting an overall automation rate of less than 3% and pointing to deficiencies such as a lack of long-term memory and limited visual abilities.