This release adds 3 notable features for engineering teams evaluating rollout.
✓ No known CVEs patched in this version
Topics
+5 more
Affected surfaces
ReleasePort's take
Light signalVersion v0.7.1 of Forge adds Anthropic Messages API support and real token‑usage reporting.
Why it matters: Forge v0.7.1 introduces the `POST /v1/messages` endpoint for Anthropic, enabling unified token usage across OpenAI and Anthropic responses—critical for cost tracking in mixed‑provider workloads.
Summary
AI summaryAdded Anthropic Messages API support, per-request model pass‑through, and real token usage reporting to the Forge proxy.
Changes in this release
| Type | Severity | Summary | CVE |
|---|---|---|---|
| Feature | Medium |
Adds Anthropic Messages API endpoint (`POST /v1/messages`) to proxy. Adds Anthropic Messages API endpoint (`POST /v1/messages`) to proxy. Source: llm_adapter@2026-05-24 Confidence: high |
— |
| Feature | Medium |
Adds `--mode {native,prompt}` flag to proxy for prompt-injected function calling. Adds `--mode {native,prompt}` flag to proxy for prompt-injected function calling. Source: llm_adapter@2026-05-24 Confidence: high |
— |
| Feature | Medium |
Adds real token‑usage reporting for both OpenAI and Anthropic response shapes. Adds real token‑usage reporting for both OpenAI and Anthropic response shapes. Source: llm_adapter@2026-05-24 Confidence: high |
— |
| Feature | Medium |
Adds per‑request model‑name pass‑through for external OpenAI‑compatible backends. Adds per‑request model‑name pass‑through for external OpenAI‑compatible backends. Source: llm_adapter@2026-05-24 Confidence: high |
— |
| Feature | Medium |
Adds Dockerfile for running the proxy as a container. Adds Dockerfile for running the proxy as a container. Source: llm_adapter@2026-05-24 Confidence: low |
— |
| Bugfix | Medium |
Lazily imports optional `anthropic` SDK; plain install can start proxy without `[anthropic]` extra. Lazily imports optional `anthropic` SDK; plain install can start proxy without `[anthropic]` extra. Source: llm_adapter@2026-05-24 Confidence: high |
— |
| Bugfix | Medium |
Proxy router now tolerates query strings in requests. Proxy router now tolerates query strings in requests. Source: llm_adapter@2026-05-24 Confidence: high |
— |
| Bugfix | Medium |
Corrects token accounting in `eval_runner` for local backends using unified `last_usage`. Corrects token accounting in `eval_runner` for local backends using unified `last_usage`. Source: llm_adapter@2026-05-24 Confidence: high |
— |
| Refactor | Medium |
Unifies `last_usage` to slot‑keyed `{slot_id: TokenUsage}` across all clients. Unifies `last_usage` to slot‑keyed `{slot_id: TokenUsage}` across all clients. Source: llm_adapter@2026-05-24 Confidence: high |
— |
| Refactor | Medium |
Moves inbound `model` to proxy's passthrough/extras channel instead of sampling map. Moves inbound `model` to proxy's passthrough/extras channel instead of sampling map. Source: llm_adapter@2026-05-24 Confidence: high |
— |
Full changelog
[0.7.1] — 2026-05-24
Proxy hardening: forge now works with Claude Code. First PyPI release to include the Docker, model-pass-through, and token-usage work that landed on main after v0.7.0.
Added
- Anthropic Messages API on the proxy (
POST /v1/messages). Point Claude Code — or any Anthropic-protocol client — at a forge-guarded model. Two downstream shapes: Path 2 (default,--backend-protocol openai) translates Anthropic ↔ OpenAI for local llama.cpp / Ollama and emits Anthropic SSE back; Path 1 (--backend-protocol anthropic, external mode) forwards to an Anthropic-shape downstream (LiteLLM, the Anthropic API, a self-hosted proxy), passing unknown fields through verbatim. Adds abase_urlkwarg onAnthropicClient. See the new "Using forge with Claude Code" section in the User Guide. --mode {native,prompt}proxy flag — run prompt-injected function-calling through the proxy for OpenAI-compatible backends that lack a native tool-calling template, not just native FC. Closes #53.- Real token-usage reporting through the proxy — responses carry actual prompt/completion counts (previously hardcoded zeros), in both OpenAI (
usage.prompt_tokens/...) and Anthropic (usage.input_tokens/output_tokens) shapes, streaming and non-streaming. #81 (thanks @mhajder). - Per-request model-name pass-through for external backends — the proxy honors the inbound
modelagainst external OpenAI-compatible backends. #80 (thanks @mhajder). - Dockerfile for running the proxy as a container. #79 (thanks @mhajder).
Changed
last_usageunified on slot-keyed{slot_id: TokenUsage}across all clients.AnthropicClientpreviously stored a flat{input_tokens, output_tokens}dict; it now uses the slot-0 conventionLlamafileClient/OllamaClientalready follow, so usage extraction has one contract.- Inbound
modelrides the proxy's passthrough/extras channel rather than the sampling map — a cleaner replacement for the #80 mechanism that keepsmodelout ofsampling.
Fixed
- Proxy no longer hard-imports the optional
anthropicSDK at load. A plainforge-guardrailsinstall (without the[anthropic]extra) can now start the proxy for local / OpenAI-shape backends; the SDK is imported lazily and only required for--backend-protocol anthropic. - Proxy router tolerates query strings. Requests like Claude Code's
POST /v1/messages?beta=trueroute correctly instead of returning 404. eval_runnertoken accounting for local backends — was silently counting zero tokens because it read the flatlast_usagekeys; now reads the slot-keyedTokenUsage(fixed by the unification above).
Known limitations
cache_controlis not preserved on Path 2. OpenAI Chat Completions has no analog, so prompt-cache hints are dropped when the downstream is a local OpenAI-shape backend. Path 1 (Anthropic-shape downstream) preservescache_controlon clean turns. See ADR-015.- Prompt-mode multi-turn tool convergence is model-dependent. Some models reliably consume prompt-injected tool results across turns; others re-call the same tool. Native FC is the more robust default for heavy multi-turn tool use (e.g. Claude Code).
Weekly OSS security release digest.
The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.
No spam, unsubscribe anytime.
Share this release
About Forge
All releases →Related context
Related tools
Earlier breaking changes
- v0.7.4 Deprecates pydantic `.model_*` API on `ToolCall` and `TextResponse` dataclasses; construction no longer validates argument shape.
- v0.7.3 Renames `--mode {native,prompt}` to `--backend-capability {native,prompt}`; no deprecation alias.
- v0.7.0 Unknown‑tool handling now replies with [UnknownToolError] on the tool channel instead of user nudges.
- v0.7.0 Changes error reporting: step enforcement and prerequisite violations now emit tool‑channel messages with [StepEnforcementError] / [PrereqError].
Beta — feedback welcome: [email protected]