
Claude Computer Use: Let Claude Control a Desktop

Most AI automation works through APIs. You write code that calls a service, parses a response, and produces an outcome. But many systems do not have APIs. Legacy enterprise software, government portals, desktop applications, internal tools built decades ago — these systems exist entirely in the graphical world of windows, buttons, menus, and forms.

Claude's computer use capability is designed for exactly this problem. It gives Claude the ability to see a screen, move a cursor, click buttons, type into fields, and interact with any application or website as a human user would. You describe a task in plain language. Claude figures out how to accomplish it by looking at the screen and taking actions.

This post gives you an honest, practical picture of what computer use can do, how it actually works, where it is genuinely useful, and what you need to know before building with it.


What Computer Use Actually Is

Computer use is Claude operating as a virtual user of a graphical environment. Claude receives screenshots of the current screen state as image inputs. It decides what action to take — move the mouse, click, scroll, type text, or press keyboard shortcuts — and returns those actions as tool calls. Your infrastructure executes the actions on an actual computer, takes a new screenshot, and sends it back to Claude. This loop continues until the task is complete.

The three core tools Claude uses in computer use mode are:

  • computer tool: Takes screenshots, moves the mouse, clicks, types, presses keyboard shortcuts
  • text_editor tool: Creates and edits text files on the system
  • bash tool: Executes shell commands on the system

Together these give Claude the ability to operate essentially any software that a human can operate.
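To make the computer tool concrete, here is a minimal sketch of how your infrastructure might turn one tool call into a desktop action. The `RecordingDesktop` class is a hypothetical stand-in that only records calls; a real driver would control a virtual display. The input keys (`action`, `coordinate`, `text`) and the action names follow the computer tool's published input format as understood here, so verify them against the tool version you use.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RecordingDesktop:
    """Hypothetical desktop driver that records calls instead of acting."""
    calls: list = field(default_factory=list)

    def screenshot(self) -> str:
        self.calls.append(("screenshot",))
        return "<base64 PNG data would go here>"

    def move(self, x: int, y: int) -> None:
        self.calls.append(("move", x, y))

    def click(self, x: int, y: int) -> None:
        self.calls.append(("click", x, y))

    def type_text(self, text: str) -> None:
        self.calls.append(("type", text))

    def press_key(self, combo: str) -> None:
        self.calls.append(("key", combo))


def execute_computer_action(desktop: RecordingDesktop, tool_input: dict[str, Any]) -> str:
    """Perform one `computer` tool call, then return a fresh screenshot."""
    action = tool_input["action"]
    if action == "screenshot":
        pass  # nothing to perform; the screenshot is taken below
    elif action == "mouse_move":
        desktop.move(*tool_input["coordinate"])
    elif action == "left_click":
        desktop.click(*tool_input["coordinate"])
    elif action == "type":
        desktop.type_text(tool_input["text"])
    elif action == "key":
        desktop.press_key(tool_input["text"])
    else:
        # Fail loudly so unsupported actions are reported back, not ignored.
        raise ValueError(f"unsupported action: {action}")
    return desktop.screenshot()
```

Returning a new screenshot after every action is one simple design choice: it keeps Claude's view of the screen current without it having to ask.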


Available Models for Computer Use

Computer use is supported on:

  • claude-opus-4-6: Best overall reasoning for complex multi-step tasks, recommended for production computer use
  • claude-sonnet-4-6: Balanced performance and cost for simpler, more direct tasks

The API model string for Opus 4.6 is claude-opus-4-6. Support varies by model generation, so check Anthropic's model documentation before relying on a smaller model such as Haiku for computer use.


How to Set Up Computer Use

Running computer use requires a sandboxed computer environment that Claude can interact with. The recommended approach is to use a Docker container with a virtual desktop.

Anthropic provides reference Docker images to get started. Here is the pattern:

bash
# Pull the Anthropic reference computer use image
docker pull ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest

# Run with your API key; the mount keeps settings in the container user's home
docker run \
    -e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
    -v "$HOME/.anthropic:/home/computeruse/.anthropic" \
    -p 5900:5900 -p 6080:6080 -p 8501:8501 \
    -it ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest

This starts an Ubuntu desktop environment accessible through a browser at localhost:6080/vnc.html and an agent interface at localhost:8501.

Always Run Computer Use in a Sandbox

Never run computer use agents with access to your actual computer, files, accounts, or credentials. Computer use is explicitly designed for sandboxed, isolated environments. Claude is given broad system access — files, network, applications — and mistakes by the model or malicious content in the environment could have significant consequences if it is not fully isolated. Use a dedicated Docker container or virtual machine with no access to production systems.
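As one way to apply this advice, the sketch below builds a `docker run` command for a disposable, per-session container. The resource limits, the container naming scheme, and the choice to omit any host volume are illustrative assumptions, not Anthropic recommendations, and the reference image has not been verified to run under these exact limits. Publishing ports on 127.0.0.1 keeps the desktop viewer off the local network.

```python
import uuid
from typing import Optional

IMAGE = "ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest"


def sandbox_run_args(session_id: Optional[str] = None) -> list:
    """Build docker arguments for a throwaway, per-session sandbox."""
    session_id = session_id or uuid.uuid4().hex[:8]
    return [
        "docker", "run",
        "--rm",                        # delete the container when it exits
        "--name", f"computer-use-{session_id}",
        "--memory", "4g",              # assumed limit; tune for your workload
        "--pids-limit", "512",         # assumed limit; caps runaway processes
        "-e", "ANTHROPIC_API_KEY",     # pass the key through from the host env
        "-p", "127.0.0.1:6080:6080",   # desktop viewer, local access only
        "-p", "127.0.0.1:8501:8501",   # agent interface, local access only
        IMAGE,                         # note: no host directories are mounted
    ]
```

You could launch it with `subprocess.run(sandbox_run_args(), check=True)`. Because each call produces a new container name and nothing from the host is mounted, state does not carry over between sessions.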


    The Computer Use API Flow

    For custom implementations, here is the core structure of a computer use interaction:

    python
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Define the computer use tools. Tool version strings are tied to model
    # generations, so confirm them against Anthropic's documentation for your model.
    tools = [
        {
            "type": "computer_20250124",
            "name": "computer",
            "display_width_px": 1024,
            "display_height_px": 768,
            "display_number": 1,
        },
        {
            # This tool name pairs with the newer text editor versions;
            # text_editor_20250124 expects the name "str_replace_editor".
            "type": "text_editor_20250728",
            "name": "str_replace_based_edit_tool",
        },
        {
            "type": "bash_20250124",
            "name": "bash",
        },
    ]

    messages = [
        {
            "role": "user",
            "content": "Open a browser, navigate to anthropic.com, and take a screenshot of the homepage.",
        }
    ]

    # Computer use is a beta feature, so the request goes through the beta
    # endpoint with a beta flag that matches the computer tool version.
    response = client.beta.messages.create(
        model="claude-opus-4-6",
        max_tokens=4096,
        tools=tools,
        messages=messages,
        betas=["computer-use-2025-01-24"],
    )

    # Process tool calls in the same loop pattern as standard tool use.
    # Your infrastructure must execute the click/type/screenshot actions.

    The loop is identical to standard tool use: Claude returns tool_use blocks, your code executes each action in the sandboxed environment and sends the result (usually a new screenshot) back as a tool_result block, and Claude continues until it stops requesting tools.
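A minimal sketch of that loop follows. To keep it runnable without network access, the API call and the action executor are passed in as functions: `create_message` and `execute_tool` are hypothetical names for code you would supply, and `max_turns` is an assumed safety cap. The response shape (`content` blocks with `type`, `id`, `name`, `input`) and the `tool_result` and image block formats follow the Messages API.

```python
from typing import Any, Callable


def screenshot_block(png_base64: str) -> list:
    """Wrap a base64 PNG as tool_result content."""
    return [{
        "type": "image",
        "source": {"type": "base64", "media_type": "image/png", "data": png_base64},
    }]


def run_agent_loop(
    create_message: Callable[[list], Any],
    execute_tool: Callable[[str, dict], list],
    messages: list,
    max_turns: int = 25,
) -> list:
    """Alternate between model turns and tool execution until no tools are requested."""
    for _ in range(max_turns):
        response = create_message(messages)
        messages.append({"role": "assistant", "content": response.content})

        tool_uses = [block for block in response.content if block.type == "tool_use"]
        if not tool_uses:
            return messages  # the model gave a final answer

        # Every tool_use block needs a matching tool_result in the next user turn.
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": execute_tool(block.name, block.input),
            }
            for block in tool_uses
        ]
        messages.append({"role": "user", "content": results})

    raise RuntimeError(f"no final answer after {max_turns} turns")
```

In a real agent, `create_message` would wrap the SDK's messages.create call with the model and tools fixed, and `execute_tool` would perform the action in the sandbox and return `screenshot_block(...)`. The turn cap is there so a confused agent cannot loop, and spend, indefinitely.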


    Real-World Use Cases for Computer Use

    Despite being experimental, computer use is genuinely useful for a specific set of scenarios:

    • Legacy system automation: Interacting with old enterprise applications that have no API — government portals, legacy ERPs, insurance systems
    • Web scraping when no API exists: Navigating websites with complex authentication flows, pagination, or JavaScript-heavy content that defeats standard scrapers
    • Software testing: End-to-end UI testing of web and desktop applications — Claude can navigate as a user would and check for unexpected behaviour
    • Repetitive desktop tasks: Form filling, report generation, copying data between desktop applications
    • Development workflows: Running terminal commands, editing configuration files, setting up development environments

    Current Limitations — Be Realistic

    Computer use is explicitly labelled experimental by Anthropic. This is an honest label. You should understand these constraints before building production systems on it:

    • Accuracy: Claude may click in the wrong location, misread text in screenshots, or take unintended actions, especially with dense or unusual UIs
    • Latency: Each screenshot-decide-act cycle takes several seconds. Multi-step tasks that require dozens of actions can run for minutes
    • Cost: Vision tokens for repeated screenshots are expensive at scale. Complex tasks can consume significant API budget
    • CAPTCHA and anti-bot measures: Computer use is not a way around CAPTCHAs or bot detection, and attempting to bypass them would be misuse of the capability
    • Complex UIs: Highly interactive web applications with overlapping elements, custom UI components, or unusual layouts may cause confusion
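To put rough numbers on the cost point, the sketch below estimates image tokens for a task. It uses Anthropic's published approximation for images (tokens ≈ width × height / 750) and assumes one new screenshot per step with every earlier screenshot kept in the conversation. The price is a parameter, not a real figure; substitute the current input price for your model. Text tokens and prompt caching are ignored.

```python
def screenshot_tokens(width_px: int, height_px: int) -> int:
    """Approximate input tokens for one screenshot (width * height / 750)."""
    return round(width_px * height_px / 750)


def cumulative_image_tokens(steps: int, tokens_per_shot: int) -> int:
    """Total image tokens sent when the full screenshot history is kept.

    Request k re-sends k screenshots, so the total is 1 + 2 + ... + steps shots.
    """
    return tokens_per_shot * steps * (steps + 1) // 2


def image_cost_usd(total_tokens: int, usd_per_million_input_tokens: float) -> float:
    """Convert a token count to dollars at a caller-supplied price."""
    return total_tokens / 1_000_000 * usd_per_million_input_tokens


per_shot = screenshot_tokens(1024, 768)          # 1049
total = cumulative_image_tokens(30, per_shot)    # 487785
```

The quadratic growth is the important part: a 30-step task at 1024×768 sends roughly 488,000 image tokens in total, not 30 × 1,049. Dropping older screenshots from the history is one way to keep that closer to linear.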

    Combine Computer Use with APIs Where Possible

    The most effective architectures do not use computer use for everything. Use standard API calls where services have APIs, and reserve computer use for the specific steps that genuinely require visual interaction. A hybrid agent might use the GitHub API for most operations but switch to computer use when it needs to interact with a web-only console that has no API access.
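A hybrid agent can be as simple as a routing decision per step. In this sketch, `Step`, `api_handler`, and the `computer_use` callback are all hypothetical names: a step carries an API handler when one exists, and only steps without one fall through to the slower, costlier computer use path.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    # Set only when the target service exposes an API for this step.
    api_handler: Optional[Callable[[], str]] = None


def run_step(step: Step, computer_use: Callable[[str], str]) -> str:
    """Prefer the API; fall back to computer use only when no API exists."""
    if step.api_handler is not None:
        return step.api_handler()
    return computer_use(step.name)
```

For example, `run_step(Step("list open issues", api_handler=fetch_issues), gui_agent)` never touches the screen, while `run_step(Step("toggle setting in web-only console"), gui_agent)` hands the step description to the computer use agent. Both `fetch_issues` and `gui_agent` are placeholders for your own functions.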


      Safety and Human Oversight

      Because computer use gives Claude broad system access, Anthropic recommends several safety practices:

      • Minimal permissions: Give the sandboxed environment only the access it needs for the specific task
      • Confirmation steps: For destructive actions — deleting files, submitting forms, making purchases — require human confirmation before Claude proceeds
      • Session isolation: Create a fresh container per task session to prevent cross-task contamination
      • Audit logging: Log all actions Claude takes for review, especially in higher-stakes environments
      • Explicit scope: Tell Claude exactly what it is allowed to do and what is off-limits in your system prompt
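Two of these practices, confirmation steps and audit logging, can sit in a thin wrapper around whatever function executes Claude's actions. The sketch below is an assumption-heavy illustration: the keyword list is a crude placeholder for a real policy, and `execute`, `confirm`, and `audit_log` are hypothetical hooks you would provide.

```python
import json
import time
from typing import Any, Callable

# Illustrative only: a real policy should be stricter than keyword matching.
DESTRUCTIVE_HINTS = ("rm -", "delete", "purchase", "checkout")


def needs_confirmation(tool_input: dict[str, Any]) -> bool:
    """Flag tool calls whose serialized input mentions a destructive hint."""
    text = json.dumps(tool_input).lower()
    return any(hint in text for hint in DESTRUCTIVE_HINTS)


def guarded_execute(
    tool_name: str,
    tool_input: dict[str, Any],
    execute: Callable[[str, dict], str],
    confirm: Callable[[str, dict], bool],
    audit_log: list,
) -> str:
    """Log every action and require human approval for flagged ones."""
    approved = not needs_confirmation(tool_input) or confirm(tool_name, tool_input)
    audit_log.append({
        "ts": time.time(),
        "tool": tool_name,
        "input": tool_input,
        "approved": approved,
    })
    if not approved:
        # Returned to Claude as the tool result so it can report back.
        return "Action blocked: a human reviewer declined it."
    return execute(tool_name, tool_input)
```

Logging before execution means a blocked or crashing action still leaves a record. In production the log would go to durable storage outside the sandbox, not an in-memory list.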

      Include Explicit Stopping Conditions

      In your system prompt for computer use agents, always include explicit conditions for when Claude should stop and ask for human input rather than proceeding. Something like: 'If you encounter a confirmation dialog for a payment or deletion, stop and report to the user instead of confirming.' Without these stopping conditions, Claude may keep working toward the task even when it meets something you would have wanted to review first.
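One way to keep scope and stopping conditions consistent across agents is to assemble the system prompt from structured lists. This is a sketch with a hypothetical helper name and illustrative wording; the exact phrasing that works best for your task is something to test.

```python
def build_system_prompt(task_scope: str, forbidden: list, stop_conditions: list) -> str:
    """Assemble a system prompt with explicit scope and stopping conditions."""
    lines = [
        f"You may only do the following: {task_scope}",
        "",
        "Never do any of the following:",
    ]
    lines += [f"- {item}" for item in forbidden]
    lines += ["", "Stop and report to the user, without acting, if:"]
    lines += [f"- {condition}" for condition in stop_conditions]
    return "\n".join(lines)


prompt = build_system_prompt(
    task_scope="Export last month's report from the billing portal as a PDF.",
    forbidden=["Changing account settings", "Opening any other website"],
    stop_conditions=[
        "A confirmation dialog for a payment or deletion appears",
        "A login page asks for credentials you were not given",
    ],
)
```

The resulting string is passed as the `system` parameter of the messages request. Keeping the conditions in a list makes them easy to review and to reuse in code that checks whether the agent actually stopped.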


        Summary

        Computer use is a genuine capability breakthrough — the ability to automate any graphical software without requiring an API or custom integration. It is also experimental, slow, costly, and requires careful sandboxing to use safely.

        The right mindset: treat computer use as the tool of last resort for automation — reach for it when there is no API, no structured data source, and no simpler alternative. When those conditions are met, it is remarkably capable.

        Next in Module 4: Building with the Claude Files API: Upload Once, Use Many Times.


        This post is part of the Anthropic AI Tutorial Series. Previous post: Claude Vision: Analyse Images, PDFs, and Documents.