hive

AI Agents & Assistants

Multi-Agent Harness for Production AI

Track releases GitHub

Python Latest v0.11.0 · 2mo ago Security brief →

Features

Multi‑agent coordination for parallel task execution
Graph‑based DAG generation for complex, recurring workflows
Role‑based persistent memory that evolves with project context
Zero‑setup runtime – no orchestration boilerplate required
Human‑in‑the‑loop control, observability and cost limits

Recent releases

View all 37 releases →

v0.11.0 Breaking risk 2mo

⚠ Upgrade required

Queen's default tool surface changed; update custom profiles referencing removed tools (apply_diff, apply_patch, hashline_edit, grep_search, execute_command_tool, etc.) to use consolidated replacements.
Old queen_lifecycle_tools entries are gone – migrate any external wiring to the new task system.
Task plan is now persistent by default on new sessions; can be collapsed from UI if unwanted.

Breaking changes

Removed tools: apply_diff, apply_patch, hashline_edit, grep_search, data_tools, coder_tools_server, web_scrape, browser_open, execute_command_tool, queen_lifecycle_tools.py.
Queen's default tool surface changed; custom profiles referencing removed tools must be updated.

Notable features

Persistent file‑backed task system for Queen with CRUD, hooks, reminders and UI panel.
Charting capability via new chart_tools MCP server with ECharts/Mermaid inline rendering and OpenHive theme normalization.

Full changelog

🐝 Hive Agent v0.11.0: Action Plans, Charts, and a Cleaner Queen

Major features released in Hive 0.11. Now Queen has an action plan for everything and charting capability to do analytics for you. Overall the conversation and agent experience is also improved a lot thanks to a major Queen prompt and tools refactor.

✨ Highlights

📋 Queen now keeps an action plan for everything

A new file-backed task system gives Queen a persistent, structured plan for every conversation — visible to the user, editable on the fly, and surviving session reload.

File-backed task store under core/framework/tasks/ with full CRUD, scoping, hooks, and reminders. Tasks live on disk so they outlast a single agent run and can be inspected, replayed, or shared between Queen and colony workers.
Multi-task creation in one call — Queen can stage a whole plan up front instead of dripping out one task at a time, then tick items off as it works.
Colony task templates — colonies can publish a template task list that Queen picks up when the colony is invoked, so recurring workflows start with the same plan every time.
Live task list in the UI — a new TaskListPanel renders the plan in real time next to the chat, with item status flowing through the event bus as Queen marks tasks done.
Task reminders + hooks wire into Queen's loop so the plan stays in front of the model and structural blockers preventing tool calls on task_* are now resolved.

📊 Charting capability for analytics

Queen can now produce real charts inline in the conversation, not just describe them.

New chart_tools MCP server with ECharts and Mermaid renderers, an OpenHive theme, and a chart-creation-foundations skill that teaches Queen when to chart vs. when to table.
Inline chart rendering in chat — EChartsBlock and MermaidBlock components render the chart spec directly in the transcript; tool results get a contentful display with ChartToolDetail instead of a JSON dump.
Chart spec normalization in the renderer keeps Y-axis scaling, series colors, and theme tokens consistent regardless of how Queen phrases the spec.

🧹 Major Queen prompt + tools refactor

The biggest cleanup of Queen's tool surface and prompt since v0.7. Fewer, sharper tools; a shorter, more focused prompt; and a clearer model of what Queen has access to vs. what colonies do.

File ops consolidated — apply_diff, apply_patch, hashline_edit, the old data_tools, grep_search, and the legacy coder_tools_server are gone. A single rewritten file_ops module covers read / search / list / edit with a more predictable interface and ~1.7k fewer lines on net.
Search and list-files unified into one toolkit so Queen stops juggling near-duplicate variants.
Browser tools audit — interactions, navigation, tabs, and lifecycle trimmed and consolidated; web_scrape and browser_open merged into a single web-search-and-open path.
New shell/terminal toolkit (shell_tools) — replaces the old execute_command_tool and the inline command sanitizer with a typed module that has proper job control, PTY sessions, ring-buffered output, semantic exit codes, and a destructive-command warning gate. Five new preset skills (shell-tools-foundations, -fs-search, -job-control, -pty-sessions, -troubleshooting) teach Queen the new surface.
Old lifecycle tools removed — queen_lifecycle_tools.py shrunk by ~900 lines as deprecated default tools came out.
Prompt simplification + improvements — Queen's node prompts dropped redundant _queen_style blocks, tightened phrasing, and now lean on the task system for plan-keeping instead of restating the plan every turn.
Tools editor frontend grouping — ToolsEditor.tsx groups tools by category so configuring a queen profile is no longer a flat scroll through 80+ entries.

🆕 What's New

Tasks & Action Plans

core/framework/tasks/ — full task subsystem: store, models, events, hooks, reminders, scoping, plus a tools/ package exposing session and colony task tools to Queen. (@RichardTang-Aden)
POST /api/tasks routes for the frontend to read and mutate the live plan. (@RichardTang-Aden)
TaskListPanel + TaskItem + TaskListContext on the frontend render the plan in real time. (@RichardTang-Aden)
Multi-task creation tool lets Queen stage a whole plan in one call. (@RichardTang-Aden)
Colony task templates — colonies ship with a default task list that Queen adopts on entry. (@RichardTang-Aden)
Hook + reminder fixes so Queen reliably uses task_* tools instead of skipping them. (@RichardTang-Aden)

Charts

tools/src/chart_tools/ — new MCP server with renderer.py, theme.py, tools.py, plus bundled echarts.min.js and mermaid.min.js. (@TimothyZhang7)
chart-creation-foundations skill teaches Queen when and how to chart. (@TimothyZhang7)
EChartsBlock / MermaidBlock / ChartToolDetail components render charts inline. (@TimothyZhang7)
OpenHive chart theme (openhiveTheme.ts) keeps chart styling consistent with the rest of the UI. (@TimothyZhang7)
Chart spec normalization in the renderer fixes Y-axis edge cases and series defaults. (@TimothyZhang7)

Queen Prompt & Tools Refactor

Major file ops refactor — single rewritten file_ops module replaces apply_diff, apply_patch, hashline_edit, grep_search, data_tools, and the legacy coder_tools_server. (@RichardTang-Aden)
Edit-file refactor with a tighter API surface and ~560 lines of dead test_file_ops_hashline.py removed. (@RichardTang-Aden)
Search + list-files consolidation into one toolkit. (@RichardTang-Aden)
Browser tools audit — navigation, interactions, lifecycle, and tabs trimmed; web_scrape and browser-open merged. (@RichardTang-Aden)
shell_tools package replaces execute_command_tool with proper job control, PTY sessions, ring-buffered output, semantic exit codes, and destructive-command warnings. (@TimothyZhang7)
Five new shell preset skills plus reference docs (exit_codes.md, find_predicates.md, ripgrep_cheatsheet.md, signals.md). (@TimothyZhang7)
Old lifecycle tools removed — queen_lifecycle_tools.py lost ~900 lines. (@RichardTang-Aden)
Autocompaction + concurrency tools updated to play nicely with the new tool registry. (@RichardTang-Aden)
Prompt simplification — nodes/__init__.py dropped redundant _queen_style block and tightened phrasing across nodes. (@RichardTang-Aden)
ToolsEditor grouping — frontend tool-config screen now groups tools by category. (@RichardTang-Aden)

Conversation & Agent Experience

ask_user questions surface in the chat transcript instead of vanishing into a side panel, and the question bubble now defers until the user actually answers. (@bryan)
New-session navigation with Queen warm-up UI — new queen-routing.tsx page handles the warm-up so the user sees progress instead of a blank screen. (@bryan)
Sync tool result contentful display — tool results render as structured cards (charts, file diffs, etc.) instead of raw JSON. (@TimothyZhang7)

Vision Fallback

Vision model retry + fallback — non-vision models can now route image inputs through a captioning step instead of failing. (@RichardTang-Aden)
Vision fallback with intent — caption prompts incorporate the user's intent so the caption is task-relevant. (@RichardTang-Aden)
Vision fallback auth — fallback path now uses the right credentials per provider. (@RichardTang-Aden)
Looser max-token cap on vision fallback for models that spend output tokens on internal thinking. (@RichardTang-Aden)
Vision fallback model usage logging for cost visibility. (@RichardTang-Aden)

Colonies

POST /api/colonies/import — onboard a colony from a tar / tar.gz upload. 50 MB cap, manual path-traversal validation (Python 3.11 compatible), symlinks/hardlinks/devices rejected, mode bits masked. Tests cover happy path, name override, replace flag, traversal, absolute paths, and corrupt archives. (@RichardTang-Aden)
Refactored colony routes — routes_colonies.py gained ~450 lines of structure for import/export/list flows. (@TimothyZhang7)

MCP & Tools

SimilarWeb V5 integration — 29 new MCP tools covering traffic & engagement, competitor intelligence, keywords/SERP, audience demographics, and segment analysis. Includes credential spec, health checker, README, and tests on Ubuntu and Windows. (#7066)
MCP registry initialization fix — registry no longer races on first install. (@RichardTang-Aden)

🐛 Bug Fixes

Initial install path resolution — hardcoded HIVE_HOME references replaced; all agent paths now prefixed by the resolved HIVE_HOME. (@RichardTang-Aden)
Frontend recovery after a broken state on session reload. (@RichardTang-Aden)
Compaction issues when the agent loop runs into the buffer mid-stream. (@RichardTang-Aden)
LiteLLM patch for a streaming-usage edge case. (@RichardTang-Aden)
ask_user question bubble now defers until the user answers. (@bryan)
Incubating-mode approval guidance correctly injects into the prompt. (@RichardTang-Aden)
LLM debugger — fixed timeline order and tool-call display. (@RichardTang-Aden)
Shell split-command parsing fix. (@TimothyZhang7)
Chart Y-axis + chart spec normalization edge cases. (@TimothyZhang7)
Scroll behavior on certain element selectors. (@bryan)
CI fixes: skills HIVE_HOME refactor regressions, run_parallel_workers losing task text on spawn, test_capabilities deprecated model identifiers, test_colony_runtime_overseer Windows flake. (#7141, #7149)
Orphan Zoho CRM test directory removed under src/ after the MCP refactor. (#7142)
Credentials — EnvVarStorage.exists now matches load semantics for empty values. (#5680)

🚀 Upgrading from v0.10.5

No migration required. Pull main at v0.11.0 and restart Hive — existing ~/.hive/ profiles, queens, colonies, and sessions keep working.

A few things to know:

Queen's default tool surface changed. If you have a queen profile pinned to a removed tool (e.g. apply_diff, apply_patch, hashline_edit, grep_search, the old execute_command_tool), it'll fall back to the consolidated replacements. Custom profiles referencing those tool names should be updated.
Old queen_lifecycle_tools entries are gone. If you wired any external code against those defaults, switch to the new task system.
Task plan is now persistent. Queen will start staging a plan automatically on new sessions — if you don't want the panel, you can collapse it from the layout.

Plan the work. Chart the result. 🐝

View release on GitHub

v0.10.5 Breaking risk 3mo

Breaking changes

Default DeepSeek model changed to deepseek-v4-pro
Default OpenAI model changed to gpt-5.5

Notable features

Prompt caching across OpenRouter sub-providers
Unified cache-token accounting
Persistent cost tracking in USD

Full changelog

🐝 Hive Agent v0.10.5: Cache-Aware Cost + New Frontier Models

A patch release with two big practical wins: real prompt-cache hits across OpenRouter routes (and the cost numbers to prove it), plus first-class entries for GPT-5.5, DeepSeek V4 Pro/Flash, and GLM-5.1.

✨ Highlights

💸 Huge cost cut from prompt caching

v0.10.4 made the system prompt static so providers could cache it. v0.10.5 actually collects on that work.

cache_control now propagates through OpenRouter for the sub-providers whose upstream APIs honor it: openrouter/anthropic/*, openrouter/google/gemini-*, openrouter/z-ai/glm*, and openrouter/minimax/*. Direct Anthropic / Bedrock / Vertex routes already worked; OpenRouter routes were silently no-op'ing the cache marker before.
Cache-token accounting is unified across providers. A single _extract_cache_tokens helper now reads OpenAI-shape prompt_tokens_details.cached_tokens, Anthropic-raw cache_read_input_tokens, and OpenRouter's normalized cache_write_tokens / cache_creation_input_tokens — and surfaces both cache-read and cache-creation counts (subsets of the input total, never double-counted).
Streaming cache tokens no longer get dropped. LiteLLM's calculate_total_usage aggregates token totals but discards prompt_tokens_details; the stream path now reaches back into the most recent chunk to recover cached/cache-creation counts so the FinishEvent is accurate.
Cost is reported in USD, not just tokens. Every LLMResponse and FinishEvent now carries cost_usd. The extractor consults four sources in priority order: native usage.cost → LiteLLM _hidden_params.response_cost → litellm.completion_cost → curated catalog pricing — so models LiteLLM doesn't price (GLM, Kimi, MiniMax, DeepSeek V4) still get accurate numbers via the catalog fallback.
Persistent cost tracking — the cost number now flows through the event bus to the chat panel and queen DM, and is persisted across sessions instead of resetting on reload.

The combined effect: on a long Claude Sonnet / Opus session routed through OpenRouter, the static system prefix is now a cache hit on every turn after the first, and the panel shows you the dollar savings turn-by-turn.

🧠 New frontier models

GPT-5.5 is now the OpenAI default — frontier coding + reasoning, 128k output / 1.05M context, vision-capable.
DeepSeek V4 Pro and DeepSeek V4 Flash replace deepseek-chat. Both ship with 1M context, 384k max output, and full cache-read pricing (Pro: $1.74 / $3.48 / $0.145 per Mtok; Flash: $0.14 / $0.28 / $0.028). deepseek-reasoner is marked legacy.
GLM-5.1 replaces GLM-5 with cache-read pricing wired in.
Catalog pricing schema — every model can now declare pricing_usd_per_mtok with optional cache_read and cache_creation rates; validated on load.
supports_vision flag added to every model in the catalog and consulted by the new vision-fallback path so non-vision models can still receive image inputs via captioning.

🆕 What's New

Cost & Cache

cache_control for OpenRouter sub-providers — Anthropic, Gemini, GLM, MiniMax routes now mark the static system prefix as ephemeral cache. (@RichardTang-Aden)
_extract_cache_tokens helper — single reader for OpenAI / Anthropic / OpenRouter cache-token shapes; returns (cache_read, cache_creation). (@RichardTang-Aden)
Catalog pricing fallback — _cost_from_catalog_pricing and _cost_from_tokens compute USD from pricing_usd_per_mtok when LiteLLM's catalog has no entry. (@RichardTang-Aden)
Streaming usage recovery — pull cache-token details from the last usage-bearing chunk after calculate_total_usage strips them. (@RichardTang-Aden)
cost_usd, cached_tokens, cache_creation_tokens added to LLMResponse, FinishEvent, and the stream-event bus. (@RichardTang-Aden)
Persistent cost tracking — costs survive session reload and surface in ChatPanel and queen-dm. (@RichardTang-Aden)

Models & Catalog

GPT-5.5 as the new OpenAI default with 1.05M context + native pricing. (@RichardTang-Aden)
DeepSeek V4 Pro / Flash with 1M context, 384k output, and cache-read pricing. (@RichardTang-Aden)
GLM-5.1 replaces GLM-5; cache-read pricing wired. (@RichardTang-Aden)
pricing_usd_per_mtok schema — validated input / output / cache_read / cache_creation per model. (@RichardTang-Aden)
supports_vision flag populated for every catalog entry; queried by the new vision-fallback path. (@RichardTang-Aden)
get_model_pricing / model_supports_vision helpers exposed from framework.llm.model_catalog. (@RichardTang-Aden)

Vision & Agent Loop

Image vision fallback — framework.agent_loop.internals.vision_fallback captions images for non-vision models so the same conversation works regardless of provider capability. (@TimothyZhang7)
Hybrid compaction buffer — context compaction now combines a fixed token reserve with a ratio-of-context buffer instead of one or the other. (@RichardTang-Aden)

Frontend

Configuration UI redesign — refreshed sidebar, prompt library, skills library, and tools editor. (@vincentjiang777)
Cost + token usage in chat — ChatPanel and queen-dm show running token consumption and USD cost per session. (@RichardTang-Aden)

Tests

test_litellm_provider.py (+448 lines) covering cache-token extraction, cost-extraction priority order, OpenRouter compat-mode cache wiring, and streaming usage recovery.
test_model_catalog.py extended for the new pricing schema and supports_vision flag.
test_event_bus.py / test_stream_events.py extended for the new cost + cache fields.

🐛 Bug Fixes

Vision caption — fix incorrect caption attachment in the vision-fallback path. (@TimothyZhang7)
Colony-fork test flake — drain background fork tasks before asserting on colony-spawn artifacts. (@RichardTang-Aden)

🚀 Upgrading from v0.10.4

No migration. Pull main at v0.10.5 and restart Hive — existing ~/.hive/ profiles, queens, colonies, and sessions keep working.

Two things to know:

Default DeepSeek model changed from deepseek-chat to deepseek-v4-pro. If a queen is pinned to deepseek-chat, that id is gone from the catalog — pick deepseek-v4-pro or deepseek-v4-flash.
Default OpenAI model changed from gpt-5.4 to gpt-5.5. gpt-5.4 stays in the catalog as the previous-flagship option.

Cache the prompts. 🐝

View release on GitHub

v0.10.4 Breaking risk 3mo

Breaking changes

Preset skills directory renamed from _default_skills/ to _preset_skills/

Notable features

Skill Library with per-scope allowlist and UI
Tool Library with per-queen and per-colony configuration
MCP Servers management panel

Full changelog

🐝 Hive Agent v0.10.4: Skill & Tool Library

Skills and tools move from something the framework hands down into something you curate. Every queen and every colony now has a dedicated allowlist and a UI to manage it, and the system prompt gets smaller and cache-friendlier along the way.

While we’ve been seeing the queen take on more capable tasks, we also want to give you better visibility into how the queen and the colony achieve it.

Skill Library + Tool Library are here. Every queen and every colony now gets its own tool allowlist and skill set. Browse them, toggle them, upload your own, and author new ones right in the UI. Colonies inherit from their founding queen and then evolve on their own (starting from the skill created by the queen)

Also: the system prompt is now fully static across a session (meaning caching will be used and save you tokens 💰). Date/time has been moved to turn-time injection, so the prompt prefix stops changing and prompt caching actually works. Small change, big win.

🆕 What's New

Skill Library

Skill Library page — browse every skill by scope (queen / colony / framework preset), view SKILL.md inline, toggle per-scope enablement, upload skills as .md or .zip, and author new skills from the UI.
Per-scope overrides — skill enablement is recorded in ~/.hive/agents/queens/{queen_id}/skills_overrides.json and ~/.hive/colonies/{colony_name}/skills_overrides.json; framework presets stay read-only, user-authored skills live under each scope's own skills directory.
Skill provenance — the API and UI now distinguish framework-preset skills, queen-authored skills, and colony-authored skills, so you can tell at a glance who owns a given skill.
Skill authoring primitives — a shared framework.skills.authoring module validates names, parses frontmatter, and materializes skill folders for the UI upload path, the create_colony tool's inline skills, and future runtime-learned skills.
Preset rename — built-in skills moved from _default_skills/ to _preset_skills/ to match the new "preset vs. user" split. Existing browser/linkedin/x automation skills carry over untouched.

Tool Library

Tool Library page with a shared ToolsEditor component used by the queen profile and colony settings panels.
Per-queen tool allowlist at ~/.hive/agents/queens/{queen_id}/tools.json: null = allow all, [] = disable all, ["foo", "bar"] = only these MCP tools pass the filter.
Per-colony tool allowlist at ~/.hive/colonies/{colony_name}/tools.json, with the same schema, atomic writes, and independent lifecycle.
Configurable defaults — queens now carry a default tool/skill bundle that seeds each new colony, and the bundle itself is editable.
Colony inheritance — when a queen spawns a colony, the colony starts from the queen's tool and skill configuration. After spawn the two diverge freely.
Colony sidecar — tools.json lives next to metadata.json so identity/provenance (queen, created_at, workers) and tool gating evolve independently.

MCP Server Management

MCP Servers panel — dedicated settings UI for browsing, configuring, and enabling bundled and user MCP servers.
/api/mcp routes for listing built-in servers, inspecting state, and reporting errors with structured MCP error responses.
Tool catalog wiring — live queen sessions now surface their MCP tool catalog to the queen-tools and colony-tools endpoints, so the UI shows exactly what the running session can see.

Prompt & Runtime

Static system prompt — the agent loop, conversation, and provider adapters (LiteLLM, Antigravity, Codex, Mock) now build and freeze the system prompt once per session. Per-turn values that used to churn the prompt are gone.
Date/time injected at turn time — today's date and current time move from the system prompt into a turn-level injection path that updates cursor persistence and queen-lifecycle tooling.
Queen orchestrator — refreshed to pair with the static prompt model and the new tool/skill configuration layers.
Session manager — tightened session-creation input validation and reflection/skill edge handling; "create new session and switch branch" is now reliable.

🐛 Bug Fixes

No-cache middleware on /api/* — every API response now carries Cache-Control: no-store. Without this, a one-off bad response (e.g. the SPA catch-all leaking index.html for an /api/* URL before a route was registered) could get pinned in the browser's disk cache and replayed forever, since our JSON handlers don't emit ETag/Last-Modified. Hard-refresh no longer required to recover.
Tools & skills registration — queens and colonies no longer end up with stale or duplicated entries after reloads.
Session creation — invalid inputs are rejected up front with clear errors instead of surfacing later as runtime failures.
Skill / reflection edges — tightened handling so reflection runs no longer see half-built skill state during scope reloads.
Create new session + switch branch flow works end-to-end without orphaning sessions.
CI — broken workflow repaired.

🧪 Tests

test_routes_skills.py, test_skill_overrides.py, test_colony_tools.py, test_queen_tools.py, test_mcp_routes.py — coverage added for every new route group and the override store.

🚀 Upgrading from v0.10.3

No migration. Pull main at v0.10.4 and restart Hive — existing ~/.hive/ profiles, queens, colonies, and sessions keep working.

Two things to know:

Preset skills directory renamed from _default_skills/ to _preset_skills/ inside the framework. If you had external scripts pointing at that path, update them. User-authored skills under ~/.hive/ are unaffected.
First open of a queen or colony writes a tools.json sidecar the first time you edit its allowlist. If you don't touch the Tool Library, nothing is written and behavior matches v0.10.3 (allow all MCP tools).

Curate your queens. 🐝

View release on GitHub

v0.10.3 Breaking risk 3mo

Breaking changes

Worker and table tabs now scoped to colony (previously global)

Notable features

Queen DMs: messages sent during processing queue with Steer/Cancel controls
Colonies: context compaction during spawn with automatic incubating phase and scoped tools
Queen DMs: native ask_user tool for user prompts

Full changelog

🐝 Hive Agent v0.10.3

Colonies grow up, and Queen DMs learn to listen.

v0.10.0 introduced colonies. v0.10.3 is the release where they stop feeling like a new concept bolted on and start feeling like the place you actually work. Alongside that, Queen DMs got the single biggest fix to single-agent chat since we shipped it: you can keep typing while the queen is thinking, and she'll hear you.

The Colony, grown up

When you spawn a colony now, a few things happen that didn't before.

The queen who spawned it hands off cleanly — her session is compacted first, so the new colony doesn't inherit a bloated context and spend its first ten turns figuring out what it already knows. There's a short incubating phase between "spawn requested" and "colony live" where skills, storage, and scheduler tools get set up quietly in the background. By the time the colony is ready, it has its own scoped skill bundle and SQLite — no more cross-colony skill leakage, no more workers belonging to the wrong group.

The UI finally matches the model. The sidebar groups everything by colony with a DataGrid view, shows the active queen on a dedicated bar inside the colony, and lets you click a worker to open it as its own tab. Tables and workers are scoped to the colony you're looking at, which sounds obvious in hindsight and was a long-standing source of confusion. Queen identity — name, title, avatar — now travels with the queen into message bubbles, the profile pane, and the org chart, so it's consistent no matter where you see her.

If you were using colonies in v0.10.1 or v0.10.2, this release is the one where the experience stops fighting you.

Queen DMs stop eating your keystrokes

The most common complaint about Queen DMs was simple: if the queen was mid-turn and you thought of something to add, your message either got lost or arrived at a weird moment. That's gone.

Messages you send while the queen is working now land in a pending queue, visible in the chat panel with a Steer or Cancel control. Steer folds your message into the turn in progress; Cancel drops it. When the queue auto-flushes, the "typing…" indicator no longer flickers, and the old bootstrap race that sometimes rendered your own message twice is fixed.

The queen also got a proper ask_user tool this release, so when she genuinely needs something from you, it shows up as a question — not as a regular chat message you have to parse as one. Tool calls in chat are grouped by session now, so a chatty worker doesn't drown out the queen's own thinking, and her avatar is on every bubble so you can tell who's talking at a glance.

Smaller things worth knowing

Prometheus tool for querying metrics from agents (#7047).
Scheduler + triggers got a UI pass, better reliability on trigger message delivery, and scheduler tools are now available during the incubating phase.
VSCode extension bumped to 1.0.1 with refreshed icons and a fix for frame-resize jank.
Model catalog updates for Xiaomi and OpenRouter selections.
Runtime reliability: cancelled executions now fully terminate before a session can restart (#7001), Codex store=False is honored correctly (#7089), and the UI handles a broken Aden API key gracefully instead of hanging.

Upgrading from v0.10.2

No migration. Pull main at v0.10.3 and restart Hive — your existing ~/.hive/ profiles, queens, colonies, and sessions keep working.

One thing to be aware of: worker and table tabs are now scoped per colony. If you expected them to be global, switch colonies in the sidebar to see each colony's own.

View release on GitHub

v0.10.2 Breaking risk 3mo

Breaking changes

browser_click_coordinate, browser_hover_coordinate, browser_press_at, and rect-returning tools now use viewport fractions (0..1) instead of pixel coordinates; direct callers must convert using cssWidth/cssHeight

Notable features

Queen runtimes survive profile and queen switches without teardown
Browser tab groups now isolated per profile to prevent state bleed
Remote browser debugger UI added for visual debugging and testing

Full changelog

🐝 Hive Agent v0.10.2

A browser-automation-focused follow-up to v0.10.1. Coordinates that flow between the vision model and Chrome are now fractions of the viewport instead of screenshot pixels — so the same (x, y) works across Claude, GPT-4o, Gemini, and any other VLM regardless of how each one resizes or tiles the image. Plus reliability fixes for queen switching, tab-group isolation, and CI.

✨ Highlights

Model-invariant visual clicks. Every coordinate-taking browser tool (browser_click_coordinate, browser_hover_coordinate, browser_press_at) and every rect-returning tool (browser_get_rect, browser_shadow_query, the rect inside focused_element) now speaks in 0..1 fractions of the viewport. Vision-model pixel resizing no longer silently breaks clicks when you swap backends.
Queens survive profile/queen switches. Switching queens no longer tears down the active queen's runtime.
Tab-group isolation. Browser tab groups are now namespaced per profile, so stale highlight / attach state can't bleed across profiles when Chrome reuses a tab id.
Remote browser debugger. New scripts/browser_remote.py + HTML UI give a visual debugging surface for the Chrome extension bridge — live screenshots, coord inspector, and one-click test harness for the GCU tools.
Greener CI. All framework/tools test failures resolved and Windows CI is unbroken; full ruff lint + format pass across the codebase.
Gemini reliability tuning. gemini-3-flash-customtools and gemini-3.1-pro-preview-customtools now run with max_context_tokens: 240000 (down from 900k) — long-context quality on Gemini degrades well before the advertised window, and clamping lower trades headroom for more predictable tool use.

🆕 What's New

Browser automation

Fraction-based coordinates — all click / hover / press / rect tools now use (0..1, 0..1) fractions of the viewport. Internally each tool multiplies by the cached cssWidth / cssHeight before dispatching to CDP. Four-decimal precision (0.0001 ≈ 0.17 CSS px on a 1717-wide viewport) is sufficient for the tightest targets. (@timothyadenhq)
browser_type_focused — dedicated focused-element typing tool split out from browser_type. Use after browser_click_coordinate focuses the target; faster than browser_press for multi-character input. (@RichardTang-Aden)
Multi-mode screenshot tool — browser_screenshot gained viewport / full-page / selector-clip modes and returns cssWidth / cssHeight in metadata so callers can reason about viewport size if they need to. (@RichardTang-Aden)
Dashed highlighter for type-focus events — visual differentiation between click (solid) and type-focus (dashed) highlights on post-interaction screenshots. (@RichardTang-Aden)
Default 1 ms key delay + prompt tuning — browser_type now uses a 1 ms delay by default (was higher), matching what real rich-text editors expect; related orchestrator prompt improvements. (@RichardTang-Aden)
Remote browser debugger UI — scripts/browser_remote.py + scripts/browser_remote_ui.html provide a live visual surface to exercise the GCU browser bridge (screenshots, click targeting, coord readout). (@RichardTang-Aden)
Iframe-aware focused_element — same-origin iframe descent (capped at 5 levels), so focused_element reports the real inner element instead of {tag: "iframe"}. Adds an inFrame: [...] breadcrumb when traversed. (@timothyadenhq)

Skills & prompts

Browser / LinkedIn automation skills rewritten around the new fraction convention — "read proportion off the image" workflow, updated rect examples, updated troubleshooting entries. (@timothyadenhq, @RichardTang-Aden)
GCP skills and prompt improvements — polish on the browser-edge-cases skill and the queen GCU reference guide. (@RichardTang-Aden)
Canonical workflow simplified — slimmer, less prescriptive guidance in the default browser/linkedin skills. (@timothyadenhq)

Core / server

Namespaced browser tab groups — per-profile tab_group tracking in queen_orchestrator / session_manager, with a clear_tab_highlights(tab_ids) cleanup hook called on context destruction so stale highlight / attach state can't leak onto reused tab ids. (@timothyadenhq)
Don't kill the queen on switch — queen switching no longer invokes the "stop runtime" path, keeping active sessions alive across UI navigation. (@timothyadenhq)

LLM & model catalog

Gemini context window clamped to 240k — core/framework/llm/model_catalog.json drops max_context_tokens from 900000 → 240000 on both gemini-3-flash-customtools (Fast) and gemini-3.1-pro-preview-customtools (Best quality). Reduces the chance of context-window-edge failures on long sessions. (@RichardTang-Aden)

Developer experience

Codebase-wide ruff clean — 155 lint errors (70 auto-fixed + 85 manual) resolved across framework and tools; 343 files reformatted. Long-line, missing-import, duplicate-method, and W291 whitespace issues all cleared. (#7058)
Framework + tools test suite green — 52 → 0 failures across framework tests (mock LLM model attribute, updated skill/prompt assertions, compaction formatting, model catalog) and tools tests (csv_tool paths, browser_evaluate toast wrapper). (#7059)
Windows CI unbroken — background-job test uses sys.executable + double quotes, CLI entry-point guards against None stdout, safe-eval timeout bumped for slower Windows runners. (#7061)

🐛 Bug Fixes

Fraction-click tab-state leak (_screenshot_css_scales NameError) — clear_tab_state raised NameError on every tab close and profile teardown because a removed cache was still referenced. Fixed in tools/src/gcu/browser/tools/inspection.py.
Missing highlight cleanup on profile destroy — introduced clear_tab_highlights so orphaned highlight state doesn't reappear when Chrome reuses a tab id on a later profile.
Queen session shutdown on switch — switching between queens no longer terminates the active queen's runtime.
Pruned tool-result sentinel mismatch — compaction / conversation now accept both Pruned tool result ... and [Pruned tool result ...] sentinel shapes.
Mock LLM infinite loop on exhausted scenarios — MockStreamingLLM and _ByTaskMockLLM now emit a clean text-stop when scenarios are consumed, unblocking test_worker_report.

⬆️ Upgrading from v0.10.1

No migration steps for stored state — existing ~/.hive/ profiles, queens, and sessions continue to work.

Behavior change for direct callers of browser coord tools: browser_click_coordinate, browser_hover_coordinate, browser_press_at, and rect-returning tools now expect and return fractions of the viewport (0..1 on each axis) instead of screenshot pixels. Agents using the default browser-automation skill get this automatically — the skill was updated alongside the tool change. Only custom code that hardcoded pixel coordinates against the prior 800 px-wide screenshot space needs adjustment: divide by cssWidth / cssHeight (now exposed in browser_screenshot metadata) to convert.

Pull main at the v0.10.2 tag and restart Hive.

View release on GitHub

Weekly OSS security release digest.

The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.

No spam, unsubscribe anytime.