Microsoft: Why not let our Copilot fly your computer?

Redmond talks up preview of AI agents navigating apps through the UI

Microsoft will soon let Copilot agents drive computers through the GUI just like humans – by clicking buttons, selecting menus, and even completing forms on screen. On Wednesday, the Windows empire said it plans to enable computer use from within Copilot Studio - Microsoft's platform for building and deploying AI agents. This will spare employees from having to click buttons and fill forms themselves, while still keeping enterprise data corralled inside Microsoft's cloud - Redmond insists none of it is used to train its models.

"Computer use enables agents to interact with websites and desktop apps by clicking buttons, selecting menus, and typing into fields on the screen," explained Charles Lamanna, corporate VP for business and industry, Copilot, in the corp's marketing bumf . "This allows agents to handle tasks even when there is no API available to connect to the system directly. If a person can use the app, the agent can too.



" AI agents are, as far as we can tell, pieces of software that talk to other pieces of software as well as users, using generative AI to make decisions and form outputs. Today, Microsoft Copilot Studio enables customers to create AI-driven agents to automate certain tasks, but these agents only work with specific services, like SharePoint. The new type of agents should be much more flexible.

For instance, you could create an agent and prompt it to carry out a series of steps that involve browsing a previously unseen website, extracting some data, and passing that data to a desktop app. Lamanna suggests several scenarios where the new Copilot agents could come in handy, such as automating the input of large amounts of data from multiple sources to a central repository, automatically collecting market data for research, or using AI text and image recognition capabilities to process invoices. Microsoft is not the only AI provider trying to make agents more useful to normal people.

OpenAI on Wednesday launched a new set of AI models dubbed o3 and o4-mini, which it claims are its "smartest" models to date. What distinguishes them is their ability to independently use and combine various tools within ChatGPT to solve complex, multi-step tasks. "For the first time, our reasoning models can agentically use and combine every tool within ChatGPT — this includes searching the web, analyzing uploaded files and other data with Python, reasoning deeply about visual inputs, and even generating images," the AI giant said.

As an example, OpenAI suggests the prompt, "How will summer energy usage in California compare to last year?" In response, o3 will initiate a web search for public utility data, write Python code to create an energy forecast, and then generate a graph or image with an explanation of the prediction. In other words, it's capable of taking multiple steps that involve different systems without needing manual coordination or external integration layers. o4-mini is a smaller model optimized for fast, cost-efficient reasoning, OpenAI says, claiming it tops the benchmarks on AIME 2024 and 2025.
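The middle step in that California example - the code the model writes and runs for itself - is the sort of throwaway analysis script an analyst would otherwise knock together by hand. Roughly speaking, and with invented figures standing in for the public utility data the model would actually fetch, it might resemble this:

```python
# Sketch of the kind of forecast code a model might generate on the fly.
# The monthly usage figures are placeholders, not real California data.
import numpy as np
import matplotlib.pyplot as plt

months = np.arange(1, 13)
last_year_gwh = np.array([18, 17, 17, 18, 20, 24, 28, 29, 26, 21, 18, 19], dtype=float)

# Naive forecast: fit a linear trend and carry it forward twelve months
trend = np.poly1d(np.polyfit(months, last_year_gwh, 1))
summer = months[5:8]                                   # June, July, August
summer_forecast = last_year_gwh[5:8] + (trend(summer + 12) - trend(summer))

plt.plot(months, last_year_gwh, label="last year")
plt.plot(summer, summer_forecast, "o--", label="naive summer forecast")
plt.xlabel("month")
plt.ylabel("usage (GWh, placeholder)")
plt.legend()
plt.savefig("energy_forecast.png")
```

The point OpenAI is making is not that the forecast is sophisticated, but that the web search, the code, and the chart all happen inside one conversation without the user wiring anything together.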

Separately, OpenAI also launched OpenAI Codex CLI, a terminal-based coding agent. "Codex CLI is built for developers who already live in the terminal and want ChatGPT-level reasoning plus the power to actually run code, manipulate files, and iterate – all under version control," the tool's GitHub repo explains. AI automation differs from programmed instructions in that the agent can adapt on the fly when it encounters obstacles or unexpected changes in the interface.

Instead of crashing with an error, it uses built-in reasoning to muddle through, at least according to Microsoft. "Computer use adapts to changes in apps and websites automatically," Lamanna claimed. "It adjusts in real time using built-in reasoning to fix issues on its own, so work continues without interruption."
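As these computer-use systems are generally described, the underlying pattern is a loop: screenshot the desktop, ask a model what to do next, carry out the answer, repeat. A very rough sketch of that shape - with the model call stubbed out, since the actual decision-making runs as a hosted service, and none of this is Microsoft's implementation - is below:

```python
# Sketch of the general "look, decide, act" loop behind computer-use agents.
# decide_next_action() is a stub standing in for the hosted vision-language
# model that would actually read the screenshot and pick an action.
import pyautogui

def decide_next_action(screenshot, goal):
    # Placeholder: a real agent would send the screenshot and goal to a
    # model and get back something like ("click", x, y), ("type", "text"),
    # or ("done",) when it judges the task complete.
    return ("done",)

goal = "file this invoice in the accounts app"
for _ in range(50):                          # hard stop so a stuck agent can't loop forever
    screenshot = pyautogui.screenshot()      # the agent sees what a person would see
    action = decide_next_action(screenshot, goal)
    if action[0] == "done":
        break
    if action[0] == "click":
        pyautogui.click(action[1], action[2])
    elif action[0] == "type":
        pyautogui.write(action[1])
```

Because the what-to-click decision is made fresh from each screenshot rather than baked into the script, a moved button or redesigned dialog is, in theory, just another thing for the model to reason over - hence Lamanna's claim that work continues without interruption.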

" With any luck, said reasoning does not involve unexpected deletions or policy violations, as one concerned user fretted about in a a social media thread solicited by a Copilot Studio product manager. However, turning over computational tasks to Copilot may involve unanticipated costs . As with cloud services, the bill for AI's boil-the-ocean approach to computation use isn't necessarily easy to anticipate and there's potential for bill shock if certain tasks turn out to be computationally demanding.

Concerns about costs have been raised by users of OpenAI's computer use API, and by users of Anthropic's computer use API. Microsoft is bringing computer use to Copilot Studio users through an early access research preview that requires a signup. Expect to hear more about this at Microsoft Build 2025 next month.

®