Skip to content

Codeep

v2.0.2 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

✓ No known CVEs patched
Read the diff → Tool health → What is this tool? →

✓ No known CVEs patched in this version

Topics

ai ai-agent ai-agents ai-tools cli-app

Summary

AI summary

Anthropic prompt caching is now enabled by default and a /plan command previews an agent's execution plan before running it.

Changes in this release

Feature Medium

Anthropic prompt caching enabled by default, reducing costs 60–90% for cache-eligible inputs.

Anthropic prompt caching enabled by default, reducing costs 60–90% for cache-eligible inputs.

Source: granite4.1:8b-q6_K@2026-05-20

Confidence: high

Feature Medium

`/plan <task>` command previews agent's full plan without file changes or tool calls.

`/plan <task>` command previews agent's full plan without file changes or tool calls.

Source: granite4.1:8b-q6_K@2026-05-20

Confidence: high

Feature Medium

/go command executes the pending plan, applying all MCP tools and hooks unchanged.

/go command executes the pending plan, applying all MCP tools and hooks unchanged.

Source: granite4.1:8b-q6_K@2026-05-20

Confidence: high

Feature Medium

Plan mode (`/plan` and `/go`) available in TUI and ACP clients (Zed, VS Code).

Plan mode (`/plan` and `/go`) available in TUI and ACP clients (Zed, VS Code).

Source: granite4.1:8b-q6_K@2026-05-20

Confidence: low

Feature Medium

TokenUsage now includes `cacheCreationTokens` and `cacheReadTokens` fields.

TokenUsage now includes `cacheCreationTokens` and `cacheReadTokens` fields.

Source: granite4.1:8b-q6_K@2026-05-20

Confidence: low

Feature Medium

`getCacheStats()` aggregates per-session cache hits, misses, and estimated USD savings.

`getCacheStats()` aggregates per-session cache hits, misses, and estimated USD savings.

Source: granite4.1:8b-q6_K@2026-05-20

Confidence: low

Feature Medium

Two cache breakpoints per request: system prompt and tools array; cache hits bill at 0.1× input rate, writes at 1.25×.

Two cache breakpoints per request: system prompt and tools array; cache hits bill at 0.1× input rate, writes at 1.25×.

Source: granite4.1:30b@2026-05-20-audit

Confidence: low

Feature Medium

Caching skips silently for inputs below 1024 tokens with no error path.

Caching skips silently for inputs below 1024 tokens with no error path.

Source: granite4.1:30b@2026-05-20-audit

Confidence: low

Feature Medium

Prompt caching propagates through OpenRouter → Anthropic routes, honoring upstream caching headers.

Prompt caching propagates through OpenRouter → Anthropic routes, honoring upstream caching headers.

Source: granite4.1:30b@2026-05-20-audit

Confidence: low

Feature Low

/cost and /stats display a "Prompt caching" section when cached calls occur.

/cost and /stats display a "Prompt caching" section when cached calls occur.

Source: granite4.1:30b@2026-05-20-audit

Confidence: low

Bugfix Medium

Fixed streaming usage extraction for Anthropic to include cache fields, preventing token undercounting.

Fixed streaming usage extraction for Anthropic to include cache fields, preventing token undercounting.

Source: granite4.1:8b-q6_K@2026-05-20

Confidence: high

Full changelog

Two big quality-of-life additions: Anthropic prompt caching is on by default (60–90% cheaper on cache-eligible input), and /plan lets you preview an agent's full plan before any file gets touched. Run /go to execute, or /plan <revised task> to refine.

Added — Anthropic prompt caching, automatic

  • Two cache breakpoints per request: the system prompt (and embedded
    skills catalog / project intelligence) and the tools array. Cache hits
    bill at 0.1× the input rate; cache writes at 1.25×. Net win after the
    second same-shape request, which is every iteration in an agent loop.
    Below 1024 input tokens Anthropic silently skips caching — no error
    path. Applies to the agent chat path, the agent fallback path, and
    the chat() path used by /agent and inline replies. Also propagates
    through OpenRouter → Anthropic routes (caching headers honoured
    upstream).
  • TokenUsage.cacheCreationTokens + cacheReadTokens fields
    surfaced on every record. getCacheStats() aggregates per-session
    cache hits, misses, and estimated USD savings vs running without
    caching. /cost (and /stats) renders a new "Prompt caching"
    section when at least one cached call landed.

Added — Plan mode (/plan + /go)

  • /plan <task> — generates a numbered plan for the task (no tool
    calls, no file changes), surfaces it as a Markdown message so you can
    review what the agent would do, which files it would touch, what
    commands it would run, and the risk level it self-assesses. Holds
    the (task, plan) pair as the pending plan, scoped to the current
    process. Re-running /plan <revised task> replaces the pending plan
    with a new one (you pay one extra LLM call but get readable revision
    history in the chat).
  • /go — executes the pending plan: hands the task + approved plan
    as a single prompt to the regular agent loop, so all MCP tools,
    lifecycle hooks, verification, permissions, and skill bundles apply
    unchanged. Includes an explicit anti-improvisation clause in the
    injected prompt — if any step turns out to be wrong mid-execution
    the agent must stop and report rather than silently rewriting the
    plan.
  • Available in both the TUI and ACP clients (Zed, VS Code). ACP
    /plan streams the plan back via session/update; ACP /go runs
    the agent inline and streams iterations through onChunk.
  • Surfaced in /help ("Agent Mode" section) and / autocomplete.

Fixed

  • Anthropic streaming usage extraction missed cache fields. Both
    the agent stream handler (utils/agentStream.ts) and the chat
    stream handler (api/index.ts) now pick up
    cache_creation_input_tokens and cache_read_input_tokens from the
    message_start event, so cached requests no longer undercount
    prompt tokens or display $0 savings.

Notes

  • OpenAI-format providers (OpenAI direct, Z.AI, DeepSeek, MiniMax,
    Ollama) don't expose explicit cache markers — those providers
    generally apply automatic prefix caching server-side. No code change
    on our end needed; cost reports stay accurate via standard
    prompt_tokens accounting.

Weekly OSS security release digest.

The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.

No spam, unsubscribe anytime.

Share this release

Track Codeep

Get notified when new releases ship.

Sign up free

About Codeep

All releases →

Related context

Earlier breaking changes

  • v2.4.1 MiniMax M3 replaces MiniMax-M2.7 as default model across all providers.
  • v2.0.0 McpServer protocol now optional fields `command`, `args`, plus new `url` and `headers`; version bumped to 2.0.0.

Beta — feedback welcome: [email protected]