Forge

v0.7.3 Breaking

This release includes 2 breaking changes for platform teams planning a safe upgrade.

Published 1mo AI Agents & Assistants

View tool

✓ No known CVEs patched

Read the diff → Tool health → What is this tool? →

✓ No known CVEs patched in this version

Topics

agentic-ai agentic-workflow agents function-calling llama-cpp llamafile

+5 more

llm ollama python self-hosted tool-calling

ReleasePort's take

Moderate signal

editorial:auto 1mo

The `--mode` CLI flag has been renamed to `--backend-capability`; native function calling now defaults to transparent passthrough.

Why it matters: All users of the proxy CLI must update scripts and configurations by version v0.7.3, or commands will fail due to the removed `--mode` option.

Summary

AI summary

BREAKING: --mode flag renamed to --backend-capability; native function calling becomes default transparent passthrough.

Changes in this release

Type	Severity	Summary	CVE
Breaking	High	Renames `--mode {native,prompt}` to `--backend-capability {native,prompt}`; no deprecation alias. Renames `--mode {native,prompt}` to `--backend-capability {native,prompt}`; no deprecation alias. Source: llm_adapter@2026-06-01 Confidence: high	—
Feature
Feature	Medium	Adds `OpenAICompatClient` for arbitrary OpenAI‑compatible endpoints. Adds `OpenAICompatClient` for arbitrary OpenAI‑compatible endpoints. Source: llm_adapter@2026-06-01 Confidence: high	—
Feature	Medium	Adds `--backend-timeout` proxy option (default 300 s). Adds `--backend-timeout` proxy option (default 300 s). Source: llm_adapter@2026-06-01 Confidence: high	—
Feature	Medium	Adds `--backend-capability {native,prompt}` proxy flag (default native). Adds `--backend-capability {native,prompt}` proxy flag (default native). Source: llm_adapter@2026-06-01 Confidence: high	—
Deprecation	Low	Removes runtime `auto` function‑calling mode in `LlamafileClient`. Removes runtime `auto` function‑calling mode in `LlamafileClient`. Source: llm_adapter@2026-06-01 Confidence: high	—
Bugfix
Bugfix	Medium	Fixes inconsistent handling of malformed tool calls and unexpected responses across OpenAI‑shape clients. Fixes inconsistent handling of malformed tool calls and unexpected responses across OpenAI‑shape clients. Source: llm_adapter@2026-06-01 Confidence: high	—
Bugfix	Medium	Fixes `Guardrails.record()` from dropping tool arguments for prerequisite tracking. Fixes `Guardrails.record()` from dropping tool arguments for prerequisite tracking. Source: llm_adapter@2026-06-01 Confidence: high	—
Bugfix	Medium	Adds proxy input hardening, non‑blocking Ollama stop, client shutdown handling, and loud argument decode errors. Adds proxy input hardening, non‑blocking Ollama stop, client shutdown handling, and loud argument decode errors. Source: llm_adapter@2026-06-01 Confidence: high	—
Refactor	Low	Replaces deprecated asyncio API and adds proxy server input validation. Replaces deprecated asyncio API and adds proxy server input validation. Source: llm_adapter@2026-06-01 Confidence: high	—
Refactor	Low	Cleans up dead code and a fragile variable reference in `LlamafileClient`. Cleans up dead code and a fragile variable reference in `LlamafileClient`. Source: llm_adapter@2026-06-01 Confidence: high	—

Full changelog

[0.7.3] — 2026-06-01

Native-first proxy. With native function calling now well-supported across modern local models, the proxy defaults to — and is optimized for — native tool calling, forwarding the client's OpenAI tools / messages to the backend verbatim. Prompt-injection remains available as an explicit opt-in for llama.cpp / llamafile backends that lack a function-calling template, but it is no longer the default path. This release also folds in the OpenAI-compatible client and several proxy / eval fixes that landed on main since 0.7.2.

Added

OpenAICompatClient for arbitrary OpenAI-compatible endpoints. #89 (thanks @lucasgerads).
--backend-timeout proxy option — configurable backend response timeout (default 300s). #91.
--backend-capability {native,prompt} proxy flag — native (default) forwards the client's tools / messages verbatim to a function-calling-capable backend; prompt opts into prompt-injection for non-FC llama.cpp / llamafile backends. Declared once at startup and frozen — never probed or switched mid-stream.
Effective backend_timeout logged at proxy startup.

Changed

BREAKING — --mode {native,prompt} renamed to --backend-capability {native,prompt} (and ProxyServer(mode=…) → ProxyServer(backend_capability=…)). --mode collided with the proxy's managed / external deployment mode; the new name states what it controls — the backend's tool-calling protocol — and reflects that the choice is declared once and frozen, never probed at runtime. There is no deprecation alias (--mode was introduced in 0.7.1). Migration: --mode native → drop it (native is the default) or --backend-capability native; --mode prompt → --backend-capability prompt.
Native function calling is now transparent passthrough — the proxy forwards the client's OpenAI tool / message payloads to the backend verbatim instead of round-tripping them through forge's internal ToolSpec representation, which dropped schema detail.
vLLM model identity consolidated to a single source of truth (the wire model_path and the registry model key are now set together). #75.
The prompt capability is now rejected loudly for ollama / vllm / anthropic backends — previously it was silently ignored for ollama.
stream_options is excluded from proxy passthrough. #94 (thanks @alexandergunnarson).

Fixed

Consistent malformed-tool-call / unexpected-response handling across the OpenAI-shape clients — malformed model tool args drive a retry (TextResponse) instead of degrading silently or raising inconsistently, and non-streaming responses are guarded so a broken provider envelope fails loud.
Guardrails.record() no longer drops tool args for prerequisite tracking. #72 (thanks @hobostay).
Deprecated asyncio API replaced; proxy server input validation added. #71 (thanks @hobostay).
Proxy input hardening, non-blocking Ollama stop, client shutdown, and loud arg decode. #86.
Dead code and a fragile variable reference cleaned up in LlamafileClient. #73 (thanks @hobostay).

Removed

Runtime auto function-calling mode in LlamafileClient — the proxy never used it, and its mid-request probe-and-switch behavior is replaced by the declared-and-frozen --backend-capability.

Breaking Changes

Flag `--mode {native,prompt}` renamed to `--backend-capability {native,prompt}`; no deprecation alias – migration required (drop flag for native default or explicitly set `--backend-capability`)
Runtime `auto` function‑calling mode removed from LlamafileClient

View diff on GitHub

Weekly OSS security release digest.

The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.

No spam, unsubscribe anytime.

Share this release

Share on X Share on Bluesky

Track Forge

Get notified when new releases ship.

About Forge

All releases →

Related context

Related tools

Earlier breaking changes

v0.7.5 Changes default behavior to replay no reasoning blocks.
v0.7.4 Deprecates pydantic `.model_*` API on `ToolCall` and `TextResponse` dataclasses; construction no longer validates argument shape.
v0.7.0 Unknown‑tool handling now replies with [UnknownToolError] on the tool channel instead of user nudges.
v0.7.0 Changes error reporting: step enforcement and prerequisite violations now emit tool‑channel messages with [StepEnforcementError] / [PrereqError].