agent-zero

v1.17 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

Published 2mo AI Agents & Assistants

✓ No known CVEs patched

Topics

agent ai assistant autonomous linux zero

Affected surfaces

auth rbac

ReleasePort's take

Light signal

The `computer_use_remote` tool now offers cross‑platform desktop control with mandatory visual verification and returns screenshots as multimodal vision messages.

Why it matters: All state‑changing desktop actions require a fresh screenshot for verification, ensuring reliable automation across macOS, Windows, and Linux platforms.
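The verification rule can be sketched as a small gate around each action. This is a minimal illustration, not Agent Zero's actual code: `run_verified_action`, `ActionResult`, and the three callables are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ActionResult:
    verified: bool
    reason: str


def run_verified_action(
    perform: Callable[[], None],
    capture: Callable[[], Optional[bytes]],
    confirms: Callable[[bytes], bool],
) -> ActionResult:
    """Run one state-changing action and gate on a fresh screenshot.

    All three callables are hypothetical stand-ins for whatever the
    desktop backend provides.
    """
    perform()  # the attempt is still unverified at this point
    screenshot = capture()  # taken after the action, never reused
    if screenshot is None:
        # No screenshot available: stop instead of assuming success.
        return ActionResult(False, "screenshot unavailable; stopping")
    if not confirms(screenshot):
        return ActionResult(False, "screenshot does not confirm the outcome")
    return ActionResult(True, "outcome visibly confirmed")
```

A caller would proceed to the next step only when `verified` is true; both failure paths leave the action marked as an unconfirmed attempt.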

Summary

AI summary

computer_use_remote now provides full cross‑platform desktop control with mandatory visual verification and multimodal screenshot output.

Changes in this release

Type	Severity	Summary	Source	Confidence	CVE
Feature	Medium	Exposes `computer_use_remote` as a callable tool in live sessions.	llm_adapter@2026-05-23	high	—
Feature	Medium	Requires visual verification via fresh screenshot for all state‑changing desktop actions.	llm_adapter@2026-05-23	high	—
Feature	Medium	Returns screenshots as multimodal vision messages instead of text summaries.	llm_adapter@2026-05-23	high	—
Feature	Medium	Separates host desktop control (`computer_use_remote`) from Xpra Docker environment.	llm_adapter@2026-05-23	high	—
Feature	Medium	Adds platform‑specific structural targeting skills for macOS (AX), Windows (UIA), and Linux (AT‑SPI/Wayland).	llm_adapter@2026-05-23	high	—
Feature	Medium	Updates window‑hide guidance to prefer `Super+H` over `Alt+F9` on Ubuntu/GNOME/Wayland.	llm_adapter@2026-05-23	low	—
Feature	Medium	Prunes older capture payloads to prevent runaway context growth.	llm_adapter@2026-05-23	low	—
Bugfix	Medium	Preserves vision inputs in Codex OAuth proxy by converting image parts to `input_image` API fields.	llm_adapter@2026-05-23	high	—
Bugfix	Medium	Handles macOS approval denial by mapping `COMPUTER_USE_APPROVAL_REQUIRED` to re‑arm‑required stop flow.	llm_adapter@2026-05-23	high	—
Bugfix	Medium	Sanitizes base64 image data URLs from token estimates to prevent screenshot‑induced token inflation.	llm_adapter@2026-05-23	high	—
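The token-estimate fix in the last row can be illustrated with a short sketch. The release does not show the real implementation, so everything here is an assumption: the regex, the `[image]` placeholder, the function names, and the rough four-characters-per-token heuristic, which stands in for a real tokenizer.

```python
import re

# Matches base64 image data URLs such as data:image/png;base64,iVBORw0...
DATA_URL_RE = re.compile(r"data:image/[A-Za-z0-9.+-]+;base64,[A-Za-z0-9+/=]+")


def sanitize_for_estimate(text: str, placeholder: str = "[image]") -> str:
    """Replace embedded image data URLs with a short placeholder."""
    return DATA_URL_RE.sub(placeholder, text)


def estimate_tokens(text: str) -> int:
    """Crude stand-in estimate (about 4 characters per token).

    Not a real tokenizer; it only shows that the base64 payload is
    removed before counting.
    """
    return len(sanitize_for_estimate(text)) // 4
```

Without the sanitizing step, a single screenshot encoded as base64 would add thousands of characters to the text being measured, even though the model receives it as an image rather than as text.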

Full changelog

Computer Use Remote: Full Desktop Control Pipeline

This release builds out the computer_use_remote tool into a complete, cross-platform host desktop control system with visual verification, multimodal capture handling, and platform-specific structural targeting.
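Part of the capture handling listed for this release is pruning older capture payloads so context does not grow without bound. The sketch below shows one way that could work, assuming chat-style messages whose content is a list of parts and whose screenshots are `image_url` parts. The message shape, the `keep_last` parameter, and the placeholder text are all hypothetical; the release does not say how many captures are retained.

```python
from typing import Any

Message = dict[str, Any]


def is_image_part(part: Any) -> bool:
    return isinstance(part, dict) and part.get("type") == "image_url"


def has_image(message: Message) -> bool:
    content = message.get("content")
    return isinstance(content, list) and any(is_image_part(p) for p in content)


def prune_captures(messages: list[Message], keep_last: int = 2) -> list[Message]:
    """Return a copy where only the newest `keep_last` captures keep image data."""
    capture_idx = [i for i, m in enumerate(messages) if has_image(m)]
    # Slicing with -0 would select nothing, so handle keep_last == 0 explicitly.
    stale = set(capture_idx[:-keep_last]) if keep_last > 0 else set(capture_idx)
    pruned: list[Message] = []
    for i, message in enumerate(messages):
        if i in stale:
            parts = [
                {"type": "text", "text": "[older screenshot pruned]"}
                if is_image_part(p)
                else p
                for p in message["content"]
            ]
            message = {**message, "content": parts}  # leave the input untouched
        pruned.append(message)
    return pruned
```

The function returns new message dictionaries for pruned entries rather than mutating the originals, so the caller can still log or persist the full history if needed.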

Highlights

- `computer_use_remote` exposed as a callable tool: The model can now invoke `computer_use_remote` directly in live sessions. Availability, trust mode, and re-arm enforcement remain runtime checks rather than prompt-loader gates.
- Visual verification required for all state-changing desktop actions: Each state-changing desktop action is treated as an unverified attempt until a fresh screenshot visibly confirms the outcome. Agents must stop rather than proceed when a screenshot is unavailable.
- Screenshots attached as multimodal tool results: Computer-use captures are returned as real multimodal vision messages (not just text summaries), so the model can visually inspect the screen after each action. Older capture payloads are pruned to prevent runaway context growth.
- Host desktop cleanly separated from Xpra desktop: `computer_use_remote` is now the sole host desktop-control path; `linux-desktop` targets only the internal Docker/Xpra environment. Host-screen queries rank ahead of the Xpra skill, while explicit "Agent Zero Desktop" requests still route correctly.
- Platform-specific structural targeting skills:
  - macOS: Dedicated skill for Accessibility (AX) structural targeting with `ax_snapshot` and `ax_action` support, loaded only when the backend reports macOS capabilities.
  - Windows: UIA-based skill with window-management guidance, selector passthrough, and click-last workflow hints.
  - Linux: AT-SPI/Wayland skill with compact structural tree outlines in snapshot responses for semantic target selection.
  - Backend-specific action details are kept out of the generic prompt; the generic layer handles only backend discovery and skill loading.
- Codex OAuth proxy preserves vision inputs: Image content parts are now correctly converted to Responses API `input_image` parts instead of being flattened to text, with regression coverage for multimodal tool results containing screenshots.
- macOS approval denial handled gracefully: `COMPUTER_USE_APPROVAL_REQUIRED` responses map to the existing re-arm-required stop flow, preventing agents from retrying or falling back to screenshots when permissions haven't been granted.
- Prompt token accounting fixed for screenshots: Embedded base64 image data URLs are sanitized from token estimates so screenshot attachments no longer inflate context budgets.
- Window-hide guidance updated: Ubuntu/GNOME/Wayland sessions now prefer `Super+H` over `Alt+F9`, with a reminder that keystroke results only prove the keys were sent, not that the action succeeded.
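The Codex OAuth proxy fix above amounts to translating content parts rather than flattening them. A rough sketch of that translation follows, assuming the incoming parts use the Chat Completions style (`text` and `image_url`) and the outgoing parts use the Responses API style (`input_text` and `input_image`). The function name `to_responses_part` is hypothetical and the proxy's real code is not shown in this release.

```python
from typing import Any

Part = dict[str, Any]


def to_responses_part(part: Part) -> Part:
    """Map one chat-style content part to a Responses-style input part."""
    kind = part.get("type")
    if kind == "text":
        return {"type": "input_text", "text": part["text"]}
    if kind == "image_url":
        image = part["image_url"]
        # Chat-style parts usually nest the URL; accept a bare string too.
        url = image["url"] if isinstance(image, dict) else image
        return {"type": "input_image", "image_url": url}
    return part  # unknown part types pass through unchanged
```

The point of the sketch is that the image branch keeps the data URL intact, so a screenshot in a tool result still reaches the model as an image instead of a text description.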

About agent-zero

Agent Zero AI framework

Earlier breaking changes

v2.5 Legacy filesystem logs in /a0/logs are deleted on startup.
v2.5 Launcher host access controls removed from Core WebUI.
v2.5 Model presets become global; project-scoped definitions removed.
v1.16 Legacy speech settings and APIs removed; use _kokoro_tts and _whisper_stt plugins instead.
v1.14 Multi-action tools standardized around `tool_args.action`, with backward compatibility.