UiPath Screen Agent Powered by Claude Opus 4.5 Receives Top Ranking from OSWorld

UiPath announced that its UiPath Screen Agent, powered by Claude Opus 4.5, achieved a No. 1 ranking on the OSWorld-Verified benchmark, an independent evaluation conducted by the OSWorld research group to validate the effectiveness of computer-use agents for enterprise-wide agentic AI deployments.

Agent and agentic benchmarks validate AI’s effectiveness in real use cases and task environments, giving enterprises the confidence to deploy AI across multiple workflows. The OSWorld benchmark provides a unified, integrated computer environment for assessing open-ended computer tasks that involve arbitrary applications. It uses a first-of-its-kind scalable, real computer environment for multimodal agents, providing validation across 369 computer tasks involving web and desktop apps in open domains, OS file I/O, and workflows spanning multiple operating systems and applications.

A core technology for UiPath ScreenPlay, UiPath Screen Agent uses common large language models (LLMs), letting users rely on natural language to create user interface (UI) automations that execute complex tasks end to end. The No. 1 ranking of UiPath Screen Agent powered by Claude Opus 4.5 validates its effectiveness, measuring its performance against both general-purpose and specialized computer-using models, as well as other agentic frameworks evaluated in the benchmark.

“Having had an early look at UiPath ScreenPlay, we’re excited about its potential to meaningfully improve how we scale automation. Its adaptive intelligence could support our growing partner ecosystem while helping reduce ongoing maintenance so our teams can stay focused on growth,” said Noble Keyser, Manager of Enterprise AI and Automation, SimpleTire.

This milestone builds on UiPath’s continued progress in advancing UI automation with agentic AI, following the September 2025 OSWorld ranking of UiPath Screen Agent powered by OpenAI GPT-5 at No. 2 on the same benchmark.

“Organizations need the confidence that their large-scale commitments to AI will pay off, which is where benchmarks can be incredibly helpful in validating specific use cases and critical workflows,” said Mircea Neagovici-Negoescu, Senior Vice President of AI and Research, UiPath. “Investing in AI and agents at enterprise speed and scale can be daunting. This ranking underscores UiPath’s ongoing investment in this area and its commitment to empowering customers with enterprise-grade computer-use capabilities.”