This release adds 3 notable features for engineering teams evaluating rollout.
✓ No known CVEs patched in this version
Summary
AI summary: Anthropic prompt caching is now enabled by default, and a `/plan` command previews an agent's execution plan before running it.
Changes in this release
| Type | Severity | Summary | CVE | Source | Confidence |
|---|---|---|---|---|---|
| Feature | Medium | Anthropic prompt caching enabled by default, reducing costs 60–90% for cache-eligible inputs. | None | granite4.1:8b-q6_K@2026-05-20 | high |
| Feature | Medium | `/plan <task>` command previews agent's full plan without file changes or tool calls. | None | granite4.1:8b-q6_K@2026-05-20 | high |
| Feature | Medium | `/go` command executes the pending plan, applying all MCP tools and hooks unchanged. | None | granite4.1:8b-q6_K@2026-05-20 | high |
| Feature | Medium | Plan mode (`/plan` and `/go`) available in TUI and ACP clients (Zed, VS Code). | None | granite4.1:8b-q6_K@2026-05-20 | low |
| Feature | Medium | `TokenUsage` now includes `cacheCreationTokens` and `cacheReadTokens` fields. | None | granite4.1:8b-q6_K@2026-05-20 | low |
| Feature | Medium | `getCacheStats()` aggregates per-session cache hits, misses, and estimated USD savings. | None | granite4.1:8b-q6_K@2026-05-20 | low |
| Feature | Medium | Two cache breakpoints per request: system prompt and tools array; cache hits bill at 0.1× input rate, writes at 1.25×. | None | granite4.1:30b@2026-05-20-audit | low |
| Feature | Medium | Caching skips silently for inputs below 1024 tokens with no error path. | None | granite4.1:30b@2026-05-20-audit | low |
| Feature | Medium | Prompt caching propagates through OpenRouter → Anthropic routes, honoring upstream caching headers. | None | granite4.1:30b@2026-05-20-audit | low |
| Feature | Low | `/cost` and `/stats` display a "Prompt caching" section when cached calls occur. | None | granite4.1:30b@2026-05-20-audit | low |
| Bugfix | Medium | Fixed streaming usage extraction for Anthropic to include cache fields, preventing token undercounting. | None | granite4.1:8b-q6_K@2026-05-20 | high |
Full changelog
Two big quality-of-life additions: Anthropic prompt caching is on by default (60–90% cheaper on cache-eligible input), and `/plan` lets you preview an agent's full plan before any file gets touched. Run `/go` to execute, or `/plan <revised task>` to refine.
Added — Anthropic prompt caching, automatic
- Two cache breakpoints per request: the system prompt (and embedded skills catalog / project intelligence) and the tools array.
- Cache hits bill at 0.1× the input rate; cache writes at 1.25×. Net win after the second same-shape request, which is every iteration in an agent loop.
- Below 1024 input tokens Anthropic silently skips caching; there is no error path.
- Applies to the agent chat path, the agent fallback path, and the `chat()` path used by `/agent` and inline replies. Also propagates through OpenRouter → Anthropic routes (caching headers honoured upstream).
- `TokenUsage.cacheCreationTokens` + `cacheReadTokens` fields surfaced on every record.
- `getCacheStats()` aggregates per-session cache hits, misses, and estimated USD savings vs running without caching.
- `/cost` (and `/stats`) renders a new "Prompt caching" section when at least one cached call landed.
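The billing multipliers above can be checked with a little arithmetic. What follows is a minimal sketch, not Codeep's implementation. Only `cacheCreationTokens` and `cacheReadTokens` come from the notes; `inputTokens`, `summarizeCache` (a hypothetical stand-in for what `getCacheStats()` might aggregate), the hit/miss rule, and the price of 3 USD per million tokens are assumptions made for illustration.

```typescript
// Reduced, hypothetical usage record. inputTokens is an assumed field name
// for input that was neither written to nor read from the cache.
interface TokenUsage {
  inputTokens: number;         // billed at the normal input rate
  cacheCreationTokens: number; // written to the cache, billed at 1.25x
  cacheReadTokens: number;     // served from the cache, billed at 0.1x
}

const CACHE_WRITE_MULTIPLIER = 1.25;
const CACHE_READ_MULTIPLIER = 0.1;

// Input cost of one record with caching, in USD.
function inputCost(u: TokenUsage, usdPerMTok: number): number {
  const weighted =
    u.inputTokens +
    u.cacheCreationTokens * CACHE_WRITE_MULTIPLIER +
    u.cacheReadTokens * CACHE_READ_MULTIPLIER;
  return (weighted * usdPerMTok) / 1e6;
}

// What the same input would have cost with every token at the normal rate.
function inputCostWithoutCaching(u: TokenUsage, usdPerMTok: number): number {
  const total = u.inputTokens + u.cacheCreationTokens + u.cacheReadTokens;
  return (total * usdPerMTok) / 1e6;
}

interface CacheSummary {
  hits: number;
  misses: number;
  estimatedSavingsUsd: number;
}

// Assumption: a record counts as a hit when it read anything from the cache.
function summarizeCache(records: TokenUsage[], usdPerMTok: number): CacheSummary {
  let hits = 0;
  let misses = 0;
  let savings = 0;
  for (const r of records) {
    if (r.cacheReadTokens > 0) hits++;
    else misses++;
    savings += inputCostWithoutCaching(r, usdPerMTok) - inputCost(r, usdPerMTok);
  }
  return { hits, misses, estimatedSavingsUsd: savings };
}

// Two iterations of an agent loop sharing a 10,000-token cacheable prefix.
const prefixTokens = 10000;
const agentLoop: TokenUsage[] = [
  { inputTokens: 200, cacheCreationTokens: prefixTokens, cacheReadTokens: 0 }, // cache write
  { inputTokens: 200, cacheCreationTokens: 0, cacheReadTokens: prefixTokens }, // cache read
];
const afterOne = summarizeCache(agentLoop.slice(0, 1), 3); // savings are negative here
const afterTwo = summarizeCache(agentLoop, 3);             // savings are positive here
```

The first request pays the 1.25× write premium, so the running total only turns positive once the second request reads the prefix at 0.1×, which matches the "net win after the second same-shape request" claim.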
Added — Plan mode (/plan + /go)
- `/plan <task>`: generates a numbered plan for the task (no tool calls, no file changes) and surfaces it as a Markdown message so you can review what the agent would do, which files it would touch, what commands it would run, and the risk level it self-assesses. Holds the (task, plan) pair as the pending plan, scoped to the current process. Re-running `/plan <revised task>` replaces the pending plan with a new one (you pay one extra LLM call but get readable revision history in the chat).
- `/go`: executes the pending plan by handing the task + approved plan as a single prompt to the regular agent loop, so all MCP tools, lifecycle hooks, verification, permissions, and skill bundles apply unchanged. Includes an explicit anti-improvisation clause in the injected prompt: if any step turns out to be wrong mid-execution, the agent must stop and report rather than silently rewriting the plan.
- Available in both the TUI and ACP clients (Zed, VS Code). ACP `/plan` streams the plan back via `session/update`; ACP `/go` runs the agent inline and streams iterations through `onChunk`.
- Surfaced in `/help` ("Agent Mode" section) and `/` autocomplete.
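The pending-plan behaviour described above can be pictured as a small piece of state. This is a sketch under stated assumptions rather than Codeep's code: `PlanState`, `setPending`, `buildGoPrompt`, and the wording of the injected clause are all hypothetical, and the LLM call that produces the plan is left out.

```typescript
// Hypothetical sketch of the pending-plan state; not Codeep's actual code.
interface PendingPlan {
  task: string;
  plan: string; // Markdown plan text produced by the /plan LLM call
}

class PlanState {
  private pending: PendingPlan | null = null;

  // Called once /plan <task> has produced a plan. A later /plan replaces it.
  setPending(task: string, plan: string): void {
    this.pending = { task, plan };
  }

  // Called by /go: combine the task and the approved plan into one prompt for
  // the regular agent loop. The clause wording is invented for illustration.
  buildGoPrompt(): string {
    if (this.pending === null) {
      throw new Error("No pending plan. Run /plan <task> first.");
    }
    const { task, plan } = this.pending;
    return [
      `Task: ${task}`,
      "Approved plan:",
      plan,
      "Follow the approved plan exactly. If a step turns out to be wrong, " +
        "stop and report instead of changing the plan.",
    ].join("\n\n");
  }
}

// Revising the task replaces the pending plan, so /go sees only the revision.
const planState = new PlanState();
planState.setPending("rename config loader", "1. Edit src/config.ts");
planState.setPending(
  "rename config loader and update tests",
  "1. Edit src/config.ts\n2. Edit tests",
);
const goPrompt = planState.buildGoPrompt();
```

Because the state lives in an ordinary in-memory object, it disappears when the process exits, which is consistent with the plan being scoped to the current process.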
Fixed
- Anthropic streaming usage extraction missed cache fields. Both the agent stream handler (`utils/agentStream.ts`) and the chat stream handler (`api/index.ts`) now pick up `cache_creation_input_tokens` and `cache_read_input_tokens` from the `message_start` event, so cached requests no longer undercount prompt tokens or display $0 savings.
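As an illustration of the fix, the sketch below reads the cache counters from a parsed `message_start` event. The wire field names (`input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`) are Anthropic's; the function name `usageFromMessageStart` and the `StartUsage` shape are assumptions and are not taken from `utils/agentStream.ts` or `api/index.ts`.

```typescript
// Hypothetical result shape; only the two cache field names mirror the notes.
interface StartUsage {
  inputTokens: number;
  cacheCreationTokens: number;
  cacheReadTokens: number;
}

// The part of Anthropic's message_start streaming event that this sketch reads.
interface MessageStartEvent {
  type: "message_start";
  message: {
    usage?: {
      input_tokens?: number;
      cache_creation_input_tokens?: number | null;
      cache_read_input_tokens?: number | null;
    };
  };
}

function isMessageStart(e: unknown): e is MessageStartEvent {
  if (typeof e !== "object" || e === null) return false;
  const rec = e as Record<string, unknown>;
  return (
    rec.type === "message_start" &&
    typeof rec.message === "object" &&
    rec.message !== null
  );
}

// Returns null for any other event type. Absent or null cache counters are
// treated as zero so that uncached requests still produce a complete record.
function usageFromMessageStart(streamEvent: unknown): StartUsage | null {
  if (!isMessageStart(streamEvent)) return null;
  const u = streamEvent.message.usage;
  return {
    inputTokens: u?.input_tokens ?? 0,
    cacheCreationTokens: u?.cache_creation_input_tokens ?? 0,
    cacheReadTokens: u?.cache_read_input_tokens ?? 0,
  };
}

// Example with a hand-written event (values are made up).
const startEvent = {
  type: "message_start",
  message: {
    usage: { input_tokens: 12, cache_creation_input_tokens: 0, cache_read_input_tokens: 4096 },
  },
};
const startUsage = usageFromMessageStart(startEvent);
```

Reading only `input_tokens` here would report 12 prompt tokens for a request that actually processed 4,108, which is the undercounting the fix addresses.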
Notes
- OpenAI-format providers (OpenAI direct, Z.AI, DeepSeek, MiniMax, Ollama) don't expose explicit cache markers; those providers generally apply automatic prefix caching server-side. No code change was needed on our end; cost reports stay accurate via standard `prompt_tokens` accounting.
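To make the contrast concrete, here is a sketch of the two request shapes. The Anthropic Messages API marks a cache breakpoint with `cache_control: { type: "ephemeral" }`, placed here on the system block and on the last tool so that the whole tools array is covered. The OpenAI-format body carries no marker at all. The helper names, prompt text, and tool definitions are placeholders, not Codeep's code.

```typescript
type CacheControl = { type: "ephemeral" };

interface Tool {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
}
type CachedTool = Tool & { cache_control?: CacheControl };

// Anthropic: explicit markers on the system block and on the last tool.
function withAnthropicBreakpoints(systemPrompt: string, tools: Tool[]) {
  const marker: CacheControl = { type: "ephemeral" };
  return {
    system: [{ type: "text" as const, text: systemPrompt, cache_control: marker }],
    tools: tools.map((t, i): CachedTool =>
      i === tools.length - 1 ? { ...t, cache_control: marker } : t,
    ),
  };
}

// OpenAI-format: nothing to mark; any prefix caching happens server-side.
function openAiFormatMessages(systemPrompt: string, userText: string) {
  return {
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: userText },
    ],
  };
}

// Placeholder tools for the example.
const demoTools: Tool[] = [
  { name: "read_file", description: "Read a file", input_schema: { type: "object" } },
  { name: "run_command", description: "Run a command", input_schema: { type: "object" } },
];
const anthropicPart = withAnthropicBreakpoints("You are a coding agent.", demoTools);
const openAiPart = openAiFormatMessages("You are a coding agent.", "List the files.");
```

Only the Anthropic body needs to change when caching is switched on, which is why the OpenAI-format providers required no code change.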