Codeep

v2.0.2 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

Published 2mo AI Agents & Assistants

View tool

✓ No known CVEs patched

Read the diff → Tool health → What is this tool? →

✓ No known CVEs patched in this version

Topics

ai ai-agent ai-agents ai-tools cli-app

Summary

AI summary

Anthropic prompt caching is now enabled by default and a /plan command previews an agent's execution plan before running it.

Changes in this release

Type	Severity	Summary	CVE
Feature
Feature	Medium	Anthropic prompt caching enabled by default, reducing costs 60–90% for cache-eligible inputs. Anthropic prompt caching enabled by default, reducing costs 60–90% for cache-eligible inputs. Source: granite4.1:8b-q6_K@2026-05-20 Confidence: high	—
Feature	Medium	`/plan <task>` command previews agent's full plan without file changes or tool calls. `/plan <task>` command previews agent's full plan without file changes or tool calls. Source: granite4.1:8b-q6_K@2026-05-20 Confidence: high	—
Feature	Medium	/go command executes the pending plan, applying all MCP tools and hooks unchanged. /go command executes the pending plan, applying all MCP tools and hooks unchanged. Source: granite4.1:8b-q6_K@2026-05-20 Confidence: high	—
Feature	Medium	Plan mode (`/plan` and `/go`) available in TUI and ACP clients (Zed, VS Code). Plan mode (`/plan` and `/go`) available in TUI and ACP clients (Zed, VS Code). Source: granite4.1:8b-q6_K@2026-05-20 Confidence: low	—
Feature	Medium	TokenUsage now includes `cacheCreationTokens` and `cacheReadTokens` fields. TokenUsage now includes `cacheCreationTokens` and `cacheReadTokens` fields. Source: granite4.1:8b-q6_K@2026-05-20 Confidence: low	—
Feature	Medium	`getCacheStats()` aggregates per-session cache hits, misses, and estimated USD savings. `getCacheStats()` aggregates per-session cache hits, misses, and estimated USD savings. Source: granite4.1:8b-q6_K@2026-05-20 Confidence: low	—
Feature	Medium	Two cache breakpoints per request: system prompt and tools array; cache hits bill at 0.1× input rate, writes at 1.25×. Two cache breakpoints per request: system prompt and tools array; cache hits bill at 0.1× input rate, writes at 1.25×. Source: granite4.1:30b@2026-05-20-audit Confidence: low	—
Feature	Medium	Caching skips silently for inputs below 1024 tokens with no error path. Caching skips silently for inputs below 1024 tokens with no error path. Source: granite4.1:30b@2026-05-20-audit Confidence: low	—
Feature	Medium	Prompt caching propagates through OpenRouter → Anthropic routes, honoring upstream caching headers. Prompt caching propagates through OpenRouter → Anthropic routes, honoring upstream caching headers. Source: granite4.1:30b@2026-05-20-audit Confidence: low	—
Feature	Low	/cost and /stats display a "Prompt caching" section when cached calls occur. /cost and /stats display a "Prompt caching" section when cached calls occur. Source: granite4.1:30b@2026-05-20-audit Confidence: low	—
Bugfix	Medium	Fixed streaming usage extraction for Anthropic to include cache fields, preventing token undercounting. Fixed streaming usage extraction for Anthropic to include cache fields, preventing token undercounting. Source: granite4.1:8b-q6_K@2026-05-20 Confidence: high	—

Full changelog

Two big quality-of-life additions: Anthropic prompt caching is on by default (60–90% cheaper on cache-eligible input), and /plan lets you preview an agent's full plan before any file gets touched. Run /go to execute, or /plan <revised task> to refine.

Added — Anthropic prompt caching, automatic

Two cache breakpoints per request: the system prompt (and embedded
skills catalog / project intelligence) and the tools array. Cache hits
bill at 0.1× the input rate; cache writes at 1.25×. Net win after the
second same-shape request, which is every iteration in an agent loop.
Below 1024 input tokens Anthropic silently skips caching — no error
path. Applies to the agent chat path, the agent fallback path, and
the chat() path used by /agent and inline replies. Also propagates
through OpenRouter → Anthropic routes (caching headers honoured
upstream).
TokenUsage.cacheCreationTokens + cacheReadTokens fields
surfaced on every record. getCacheStats() aggregates per-session
cache hits, misses, and estimated USD savings vs running without
caching. /cost (and /stats) renders a new "Prompt caching"
section when at least one cached call landed.

Added — Plan mode (`/plan` + `/go`)

/plan <task> — generates a numbered plan for the task (no tool
calls, no file changes), surfaces it as a Markdown message so you can
review what the agent would do, which files it would touch, what
commands it would run, and the risk level it self-assesses. Holds
the (task, plan) pair as the pending plan, scoped to the current
process. Re-running /plan <revised task> replaces the pending plan
with a new one (you pay one extra LLM call but get readable revision
history in the chat).
/go — executes the pending plan: hands the task + approved plan
as a single prompt to the regular agent loop, so all MCP tools,
lifecycle hooks, verification, permissions, and skill bundles apply
unchanged. Includes an explicit anti-improvisation clause in the
injected prompt — if any step turns out to be wrong mid-execution
the agent must stop and report rather than silently rewriting the
plan.
Available in both the TUI and ACP clients (Zed, VS Code). ACP
/plan streams the plan back via session/update; ACP /go runs
the agent inline and streams iterations through onChunk.
Surfaced in /help ("Agent Mode" section) and / autocomplete.

Fixed

Anthropic streaming usage extraction missed cache fields. Both
the agent stream handler (utils/agentStream.ts) and the chat
stream handler (api/index.ts) now pick up
cache_creation_input_tokens and cache_read_input_tokens from the
message_start event, so cached requests no longer undercount
prompt tokens or display $0 savings.

Notes

OpenAI-format providers (OpenAI direct, Z.AI, DeepSeek, MiniMax,
Ollama) don't expose explicit cache markers — those providers
generally apply automatic prefix caching server-side. No code change
on our end needed; cost reports stay accurate via standard
prompt_tokens accounting.

View diff on GitHub

Weekly OSS security release digest.

The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.

No spam, unsubscribe anytime.

Share this release

Share on X Share on Bluesky

Track Codeep

Get notified when new releases ship.

About Codeep

All releases →

Related context

Related tools

Earlier breaking changes

v2.8.0 `codeep account push` and `account sync` no longer transfer API keys unless `/keysync on` is enabled
v2.4.1 MiniMax M3 replaces MiniMax-M2.7 as default model across all providers.
v2.0.0 McpServer protocol now optional fields `command`, `args`, plus new `url` and `headers`; version bumped to 2.0.0.