Forge

v0.7.1 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

Published 2mo AI Agents & Assistants

View tool

✓ No known CVEs patched

Read the diff → Tool health → What is this tool? →

✓ No known CVEs patched in this version

Topics

agentic-ai agentic-workflow agents function-calling llama-cpp llamafile

+5 more

llm ollama python self-hosted tool-calling

Affected surfaces

auth rbac

ReleasePort's take

Light signal

editorial:auto 2mo

Version v0.7.1 of Forge adds Anthropic Messages API support and real token‑usage reporting.

Why it matters: Forge v0.7.1 introduces the `POST /v1/messages` endpoint for Anthropic, enabling unified token usage across OpenAI and Anthropic responses—critical for cost tracking in mixed‑provider workloads.

Summary

AI summary

Added Anthropic Messages API support, per-request model pass‑through, and real token usage reporting to the Forge proxy.

Changes in this release

Type	Severity	Summary	CVE
Feature
Feature	Medium	Adds Anthropic Messages API endpoint (`POST /v1/messages`) to proxy. Adds Anthropic Messages API endpoint (`POST /v1/messages`) to proxy. Source: llm_adapter@2026-05-24 Confidence: high	—
Feature	Medium	Adds `--mode {native,prompt}` flag to proxy for prompt-injected function calling. Adds `--mode {native,prompt}` flag to proxy for prompt-injected function calling. Source: llm_adapter@2026-05-24 Confidence: high	—
Feature	Medium	Adds real token‑usage reporting for both OpenAI and Anthropic response shapes. Adds real token‑usage reporting for both OpenAI and Anthropic response shapes. Source: llm_adapter@2026-05-24 Confidence: high	—
Feature	Medium	Adds per‑request model‑name pass‑through for external OpenAI‑compatible backends. Adds per‑request model‑name pass‑through for external OpenAI‑compatible backends. Source: llm_adapter@2026-05-24 Confidence: high	—
Feature	Medium	Adds Dockerfile for running the proxy as a container. Adds Dockerfile for running the proxy as a container. Source: llm_adapter@2026-05-24 Confidence: low	—
Bugfix
Bugfix	Medium	Lazily imports optional `anthropic` SDK; plain install can start proxy without `[anthropic]` extra. Lazily imports optional `anthropic` SDK; plain install can start proxy without `[anthropic]` extra. Source: llm_adapter@2026-05-24 Confidence: high	—
Bugfix	Medium	Proxy router now tolerates query strings in requests. Proxy router now tolerates query strings in requests. Source: llm_adapter@2026-05-24 Confidence: high	—
Bugfix	Medium	Corrects token accounting in `eval_runner` for local backends using unified `last_usage`. Corrects token accounting in `eval_runner` for local backends using unified `last_usage`. Source: llm_adapter@2026-05-24 Confidence: high	—
Refactor	Medium	Unifies `last_usage` to slot‑keyed `{slot_id: TokenUsage}` across all clients. Unifies `last_usage` to slot‑keyed `{slot_id: TokenUsage}` across all clients. Source: llm_adapter@2026-05-24 Confidence: high	—
Refactor	Medium	Moves inbound `model` to proxy's passthrough/extras channel instead of sampling map. Moves inbound `model` to proxy's passthrough/extras channel instead of sampling map. Source: llm_adapter@2026-05-24 Confidence: high	—

Full changelog

[0.7.1] — 2026-05-24

Proxy hardening: forge now works with Claude Code. First PyPI release to include the Docker, model-pass-through, and token-usage work that landed on main after v0.7.0.

Added

Anthropic Messages API on the proxy (POST /v1/messages). Point Claude Code — or any Anthropic-protocol client — at a forge-guarded model. Two downstream shapes: Path 2 (default, --backend-protocol openai) translates Anthropic ↔ OpenAI for local llama.cpp / Ollama and emits Anthropic SSE back; Path 1 (--backend-protocol anthropic, external mode) forwards to an Anthropic-shape downstream (LiteLLM, the Anthropic API, a self-hosted proxy), passing unknown fields through verbatim. Adds a base_url kwarg on AnthropicClient. See the new "Using forge with Claude Code" section in the User Guide.
--mode {native,prompt} proxy flag — run prompt-injected function-calling through the proxy for OpenAI-compatible backends that lack a native tool-calling template, not just native FC. Closes #53.
Real token-usage reporting through the proxy — responses carry actual prompt/completion counts (previously hardcoded zeros), in both OpenAI (usage.prompt_tokens/...) and Anthropic (usage.input_tokens/output_tokens) shapes, streaming and non-streaming. #81 (thanks @mhajder).
Per-request model-name pass-through for external backends — the proxy honors the inbound model against external OpenAI-compatible backends. #80 (thanks @mhajder).
Dockerfile for running the proxy as a container. #79 (thanks @mhajder).

Changed

last_usage unified on slot-keyed {slot_id: TokenUsage} across all clients. AnthropicClient previously stored a flat {input_tokens, output_tokens} dict; it now uses the slot-0 convention LlamafileClient / OllamaClient already follow, so usage extraction has one contract.
Inbound model rides the proxy's passthrough/extras channel rather than the sampling map — a cleaner replacement for the #80 mechanism that keeps model out of sampling.

Fixed

Proxy no longer hard-imports the optional anthropic SDK at load. A plain forge-guardrails install (without the [anthropic] extra) can now start the proxy for local / OpenAI-shape backends; the SDK is imported lazily and only required for --backend-protocol anthropic.
Proxy router tolerates query strings. Requests like Claude Code's POST /v1/messages?beta=true route correctly instead of returning 404.
eval_runner token accounting for local backends — was silently counting zero tokens because it read the flat last_usage keys; now reads the slot-keyed TokenUsage (fixed by the unification above).

Known limitations

cache_control is not preserved on Path 2. OpenAI Chat Completions has no analog, so prompt-cache hints are dropped when the downstream is a local OpenAI-shape backend. Path 1 (Anthropic-shape downstream) preserves cache_control on clean turns. See ADR-015.
Prompt-mode multi-turn tool convergence is model-dependent. Some models reliably consume prompt-injected tool results across turns; others re-call the same tool. Native FC is the more robust default for heavy multi-turn tool use (e.g. Claude Code).

View diff on GitHub

Weekly OSS security release digest.

The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.

No spam, unsubscribe anytime.

Share this release

Share on X Share on Bluesky

Track Forge

Get notified when new releases ship.

About Forge

All releases →

Related context

Related tools

Earlier breaking changes

v0.7.5 Changes default behavior to replay no reasoning blocks.
v0.7.4 Deprecates pydantic `.model_*` API on `ToolCall` and `TextResponse` dataclasses; construction no longer validates argument shape.
v0.7.3 Renames `--mode {native,prompt}` to `--backend-capability {native,prompt}`; no deprecation alias.
v0.7.0 Unknown‑tool handling now replies with [UnknownToolError] on the tool channel instead of user nudges.
v0.7.0 Changes error reporting: step enforcement and prerequisite violations now emit tool‑channel messages with [StepEnforcementError] / [PrereqError].