agent-zero

v1.17 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

✓ No known CVEs patched

Topics

agent ai assistant autonomous linux zero

Affected surfaces

auth rbac

ReleasePort's take

Light signal

The `computer_use_remote` tool now offers cross‑platform desktop control with mandatory visual verification and returns screenshots as multimodal vision messages.

Why it matters: State‑changing desktop actions on macOS, Windows, and Linux count as unverified until a fresh screenshot confirms the outcome, so agents stop rather than proceed on assumed success.

Summary

AI summary

computer_use_remote now provides full cross‑platform desktop control with mandatory visual verification and multimodal screenshot output.

Changes in this release

Feature Medium

Exposes `computer_use_remote` as a callable tool in live sessions.

Source: llm_adapter@2026-05-23

Confidence: high

Feature Medium

Requires visual verification via fresh screenshot for all state‑changing desktop actions.

Source: llm_adapter@2026-05-23

Confidence: high
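
The verification rule can be pictured as a small gate around each action. The following is a minimal sketch, not agent-zero's implementation: `ActionResult`, `run_verified`, and the three callables are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ActionResult:
    """Outcome of one state-changing desktop action (hypothetical type)."""
    action: str
    verified: bool
    reason: str


def run_verified(
    action: str,
    perform: Callable[[str], None],
    capture: Callable[[], Optional[bytes]],
    confirms: Callable[[str, bytes], bool],
) -> ActionResult:
    """Perform an action, then require a fresh screenshot before trusting it."""
    perform(action)
    screenshot = capture()  # taken after the action, never reused from earlier
    if screenshot is None:
        # No capture available: stop instead of assuming success.
        return ActionResult(action, False, "screenshot unavailable; stopping")
    if not confirms(action, screenshot):
        return ActionResult(action, False, "screenshot does not confirm outcome")
    return ActionResult(action, True, "confirmed by fresh screenshot")
```

In this sketch the caller is expected to halt on any result whose `verified` flag is false, which mirrors the "stop and do not proceed" behavior described in the changelog.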

Feature Medium

Returns screenshots as multimodal vision messages instead of text summaries.

Source: llm_adapter@2026-05-23

Confidence: high

Feature Medium

Separates host desktop control (`computer_use_remote`) from Xpra Docker environment.

Source: llm_adapter@2026-05-23

Confidence: high

Feature Medium

Adds platform‑specific structural targeting skills for macOS (AX), Windows (UIA), and Linux (AT‑SPI/Wayland).

Source: llm_adapter@2026-05-23

Confidence: high

Feature Medium

Updates window‑hide guidance to prefer `Super+H` over `Alt+F9` on Ubuntu/GNOME/Wayland.

Source: llm_adapter@2026-05-23

Confidence: low

Feature Medium

Prunes older capture payloads to prevent runaway context growth.

Source: llm_adapter@2026-05-23

Confidence: low
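
The release notes do not say how pruning is done. One plausible shape, sketched here under assumptions, keeps image parts only in the newest few messages and swaps older ones for a short text placeholder. The message layout (a `content` list with `image_url` parts), the `keep_last` parameter, and the function names are all hypothetical.

```python
from typing import Any

PLACEHOLDER = {"type": "text", "text": "[older screenshot pruned]"}


def has_image(message: dict[str, Any]) -> bool:
    """True when the message content is a part list containing an image part."""
    content = message.get("content")
    return isinstance(content, list) and any(
        isinstance(p, dict) and p.get("type") == "image_url" for p in content
    )


def prune_captures(
    messages: list[dict[str, Any]], keep_last: int = 2
) -> list[dict[str, Any]]:
    """Return a copy in which only the newest `keep_last` image-bearing
    messages keep their image parts; older images become placeholders."""
    image_indexes = [i for i, m in enumerate(messages) if has_image(m)]
    keep = set(image_indexes[-keep_last:]) if keep_last > 0 else set()
    pruned = []
    for i, message in enumerate(messages):
        if i in image_indexes and i not in keep:
            parts = [
                dict(PLACEHOLDER)
                if isinstance(p, dict) and p.get("type") == "image_url"
                else p
                for p in message["content"]
            ]
            message = {**message, "content": parts}  # input is left untouched
        pruned.append(message)
    return pruned
```

Keeping the text part of an old capture while dropping its image preserves the action history at a fraction of the context cost, which is the trade-off this sketch assumes.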

Bugfix Medium

Preserves vision inputs in Codex OAuth proxy by converting image parts to `input_image` API fields.

Source: llm_adapter@2026-05-23

Confidence: high
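
To illustrate the kind of conversion involved, here is a minimal sketch that maps chat-style content parts to Responses-style input parts. The part shapes follow the publicly documented Chat Completions (`image_url` object) and Responses (`input_image`, `input_text`) formats as best understood; the function names are hypothetical and this is not the proxy's actual code.

```python
from typing import Any, Union


def to_responses_part(part: dict[str, Any]) -> dict[str, Any]:
    """Convert one chat-style content part to a Responses-style input part."""
    kind = part.get("type")
    if kind == "text":
        return {"type": "input_text", "text": part["text"]}
    if kind == "image_url":
        image = part["image_url"]
        # Chat-style parts nest the URL in an object; accept a bare string too.
        url = image["url"] if isinstance(image, dict) else image
        return {"type": "input_image", "image_url": url}
    raise ValueError(f"unsupported content part type: {kind!r}")


def to_responses_content(
    content: Union[str, list[dict[str, Any]]]
) -> list[dict[str, Any]]:
    """Convert message content without flattening image parts to text."""
    if isinstance(content, str):
        return [{"type": "input_text", "text": content}]
    return [to_responses_part(p) for p in content]
```

The point of the fix, as described, is that an image part stays an image part after conversion; a regression test for it would assert that the output still contains an `input_image` entry.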

Bugfix Medium

Handles macOS approval denial by mapping `COMPUTER_USE_APPROVAL_REQUIRED` to re‑arm‑required stop flow.

Source: llm_adapter@2026-05-23

Confidence: high
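
As a rough picture of that mapping, the sketch below routes a denied approval to a stop state instead of a retry. Only the `COMPUTER_USE_APPROVAL_REQUIRED` code comes from the release notes; the `Flow` enum, the `error_code` field, and `next_flow` are hypothetical.

```python
from enum import Enum
from typing import Any


class Flow(Enum):
    CONTINUE = "continue"
    REARM_REQUIRED_STOP = "rearm_required_stop"


# Codes that must halt the agent until the user re-arms the tool.
STOP_CODES = {"COMPUTER_USE_APPROVAL_REQUIRED"}


def next_flow(response: dict[str, Any]) -> Flow:
    """Pick the next flow for a backend response (assumed `error_code` field)."""
    if response.get("error_code") in STOP_CODES:
        # No retry and no screenshot fallback: permission has not been granted.
        return Flow.REARM_REQUIRED_STOP
    return Flow.CONTINUE
```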

Bugfix Medium

Sanitizes base64 image data URLs from token estimates to prevent screenshot‑induced token inflation.

Source: llm_adapter@2026-05-23

Confidence: high
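
A minimal sketch of the idea, assuming a regex-based scrub and a crude four-characters-per-token estimate (neither is confirmed as agent-zero's approach, and the names are hypothetical):

```python
import re

# Matches data:image/<subtype>;base64,<payload>
DATA_URL_RE = re.compile(r"data:image/[a-zA-Z0-9.+-]+;base64,[A-Za-z0-9+/=]+")


def sanitize_for_estimate(text: str) -> str:
    """Replace embedded base64 image data URLs with a short marker."""
    return DATA_URL_RE.sub("[image]", text)


def estimate_tokens(text: str) -> int:
    """Very rough estimate (about 4 characters per token) after sanitizing."""
    return max(1, len(sanitize_for_estimate(text)) // 4)
```

Without the scrub, a single screenshot encoded as tens of thousands of base64 characters would be counted as text and dominate the estimate, even though the model receives it as an image.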

Full changelog

Computer Use Remote: Full Desktop Control Pipeline

This release builds out the computer_use_remote tool into a complete, cross-platform host desktop control system with visual verification, multimodal capture handling, and platform-specific structural targeting.

Highlights

  • computer_use_remote exposed as a callable tool — The model can now invoke computer_use_remote directly in live sessions. Availability, trust mode, and re-arm enforcement remain runtime checks rather than prompt-loader gates.

  • Visual verification required for all desktop actions — State-changing desktop actions are treated as unverified attempts until a fresh screenshot visibly confirms the outcome. Agents must stop and not proceed when a screenshot is unavailable.

  • Screenshots attached as multimodal tool results — Computer-use captures are returned as real multimodal vision messages (not just text summaries), so the model can visually inspect the screen after each action. Older capture payloads are pruned to prevent runaway context growth.

  • Host desktop cleanly separated from Xpra desktop — computer_use_remote is now the sole host desktop-control path; linux-desktop targets only the internal Docker/Xpra environment. Host-screen queries rank ahead of the Xpra skill while explicit "Agent Zero Desktop" requests still route correctly.

  • Platform-specific structural targeting skills:

    • macOS — Dedicated skill for Accessibility (AX) structural targeting with ax_snapshot and ax_action support, loaded only when the backend reports macOS capabilities.
    • Windows — UIA-based skill with window-management guidance, selector passthrough, and click-last workflow hints.
    • Linux — AT-SPI/Wayland skill with compact structural tree outlines in snapshot responses for semantic target selection.
    • Backend-specific action details are kept out of the generic prompt; the generic layer handles only backend discovery and skill loading.
  • Codex OAuth proxy preserves vision inputs — Image content parts are now correctly converted to Responses API input_image parts instead of being flattened to text, with regression coverage for multimodal tool results containing screenshots.

  • macOS approval denial handled gracefully — COMPUTER_USE_APPROVAL_REQUIRED responses map to the existing re-arm-required stop flow, preventing agents from retrying or falling back to screenshots when permissions haven't been granted.

  • Prompt token accounting fixed for screenshots — Embedded base64 image data URLs are sanitized from token estimates so screenshot attachments no longer inflate context budgets.

  • Window-hide guidance updated — Ubuntu/GNOME/Wayland sessions now prefer Super+H over Alt+F9, with a reminder that keystroke results only prove the keys were sent, not that the action succeeded.

About agent-zero

Agent Zero AI framework

Related context

Earlier breaking changes

  • v1.16 Legacy speech settings and APIs removed; use _kokoro_tts and _whisper_stt plugins instead.
  • v1.14 Multi-action tools standardized around tool_args.action with backward compatibility
  • v1.14 A0 connector remote workflow split into separate text-editor and code-execution skills
  • v1.14 Office skills renamed to task-oriented names: Writer, Calc, Impress
