This release includes 2 breaking changes for platform teams planning a safe upgrade.
✓ No known CVEs patched in this version
ReleasePort's take
Moderate signal: The `--mode` CLI flag has been renamed to `--backend-capability`; native function calling now defaults to transparent passthrough.
Why it matters: Anyone who passes `--mode` to the proxy CLI must update scripts and configurations when upgrading to v0.7.3. The option was removed with no deprecation alias, so commands that still use it will fail.
Summary
AI summary: BREAKING: `--mode` flag renamed to `--backend-capability`; native function calling becomes default transparent passthrough.
Changes in this release
| Type | Severity | Summary | CVE |
|---|---|---|---|
| Breaking | High | Renames `--mode {native,prompt}` to `--backend-capability {native,prompt}`; no deprecation alias. | None |
| Feature | Medium | Adds `OpenAICompatClient` for arbitrary OpenAI-compatible endpoints. | None |
| Feature | Medium | Adds `--backend-timeout` proxy option (default 300 s). | None |
| Feature | Medium | Adds `--backend-capability {native,prompt}` proxy flag (default native). | None |
| Deprecation | Low | Removes runtime `auto` function-calling mode in `LlamafileClient`. | None |
| Bugfix | Medium | Fixes inconsistent handling of malformed tool calls and unexpected responses across OpenAI-shape clients. | None |
| Bugfix | Medium | Stops `Guardrails.record()` from dropping tool arguments for prerequisite tracking. | None |
| Bugfix | Medium | Adds proxy input hardening, non-blocking Ollama stop, client shutdown handling, and loud argument decode errors. | None |
| Refactor | Low | Replaces deprecated asyncio API and adds proxy server input validation. | None |
| Refactor | Low | Cleans up dead code and a fragile variable reference in `LlamafileClient`. | None |

Source for every entry: llm_adapter@2026-06-01. Confidence: high.
Full changelog
[0.7.3] — 2026-06-01
Native-first proxy. With native function calling now well-supported across modern local models, the proxy defaults to — and is optimized for — native tool calling, forwarding the client's OpenAI tools / messages to the backend verbatim. Prompt-injection remains available as an explicit opt-in for llama.cpp / llamafile backends that lack a function-calling template, but it is no longer the default path. This release also folds in the OpenAI-compatible client and several proxy / eval fixes that landed on main since 0.7.2.
Added
- `OpenAICompatClient` for arbitrary OpenAI-compatible endpoints. #89 (thanks @lucasgerads).
- `--backend-timeout` proxy option: configurable backend response timeout (default 300s). #91.
- `--backend-capability {native,prompt}` proxy flag: `native` (default) forwards the client's tools / messages verbatim to a function-calling-capable backend; `prompt` opts into prompt-injection for non-FC llama.cpp / llamafile backends. Declared once at startup and frozen; never probed or switched mid-stream.
- Effective `backend_timeout` logged at proxy startup.
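As a rough illustration of how the two new options behave, here is a minimal `argparse` sketch. Only the flag names, the `native` / `prompt` choices, and the defaults (`native`, 300 seconds) come from the changelog; the parser itself, the `build_parser` name, and the `proxy-sketch` program name are assumptions and not Forge's actual CLI code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: flag names, choices, and defaults follow the
    # changelog; everything else here is illustrative.
    parser = argparse.ArgumentParser(prog="proxy-sketch")
    parser.add_argument(
        "--backend-capability",
        choices=["native", "prompt"],
        default="native",
        help="tool-calling protocol the backend speaks (declared once at startup)",
    )
    parser.add_argument(
        "--backend-timeout",
        type=float,
        default=300.0,
        help="backend response timeout in seconds",
    )
    return parser


# With no flags, the defaults described in the changelog apply.
args = build_parser().parse_args([])
print(args.backend_capability, args.backend_timeout)
```

In a parser shaped like this, the old `--mode` flag is simply unknown, so `argparse` exits with an error rather than silently accepting it.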
Changed
- BREAKING: `--mode {native,prompt}` renamed to `--backend-capability {native,prompt}` (and `ProxyServer(mode=…)` → `ProxyServer(backend_capability=…)`). `--mode` collided with the proxy's managed / external deployment mode; the new name states what it controls (the backend's tool-calling protocol) and reflects that the choice is declared once and frozen, never probed at runtime. There is no deprecation alias (`--mode` was introduced in 0.7.1). Migration: `--mode native` → drop it (native is the default) or `--backend-capability native`; `--mode prompt` → `--backend-capability prompt`.
- Native function calling is now transparent passthrough: the proxy forwards the client's OpenAI tool / message payloads to the backend verbatim instead of round-tripping them through forge's internal `ToolSpec` representation, which dropped schema detail.
- vLLM model identity consolidated to a single source of truth (the wire `model_path` and the registry `model` key are now set together). #75.
- The `prompt` capability is now rejected loudly for ollama / vllm / anthropic backends; previously it was silently ignored for ollama.
- `stream_options` is excluded from proxy passthrough. #94 (thanks @alexandergunnarson).
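For teams with many launch scripts, the migration rule for the renamed flag can be applied mechanically. The helper below is a hypothetical sketch, not part of Forge: `migrate_mode_flag` is an invented name, and it assumes the old flag appears either as `--mode VALUE` or `--mode=VALUE` in an argument list. It raises on any value other than `native` or `prompt`, so an argument meant for a different meaning of "mode" is not rewritten by accident.

```python
from typing import List


def migrate_mode_flag(argv: List[str]) -> List[str]:
    """Rewrite pre-0.7.3 `--mode` arguments to `--backend-capability`."""
    out: List[str] = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--mode" and i + 1 < len(argv):
            value = argv[i + 1]
            i += 2
        elif arg.startswith("--mode="):
            value = arg.split("=", 1)[1]
            i += 1
        else:
            # Unrelated argument: copy through unchanged.
            out.append(arg)
            i += 1
            continue
        if value not in ("native", "prompt"):
            raise ValueError(f"unexpected --mode value: {value!r}")
        if value == "prompt":
            out += ["--backend-capability", "prompt"]
        # `native` is the default, so the flag is dropped entirely.
    return out


print(migrate_mode_flag(["--mode", "prompt"]))
print(migrate_mode_flag(["--mode=native"]))
```

Dropping the flag for `native` keeps scripts short; emitting `--backend-capability native` explicitly would be equally valid under the migration note above.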
Fixed
- Consistent malformed-tool-call / unexpected-response handling across the OpenAI-shape clients: malformed model tool args drive a retry (`TextResponse`) instead of degrading silently or raising inconsistently, and non-streaming responses are guarded so a broken provider envelope fails loud.
- `Guardrails.record()` no longer drops tool args for prerequisite tracking. #72 (thanks @hobostay).
- Deprecated asyncio API replaced; proxy server input validation added. #71 (thanks @hobostay).
- Proxy input hardening, non-blocking Ollama stop, client shutdown, and loud arg decode. #86.
- Dead code and a fragile variable reference cleaned up in `LlamafileClient`. #73 (thanks @hobostay).
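The first fix describes two behaviors: malformed tool arguments turn into a retry signal, while a broken provider envelope raises. The sketch below shows one way that split could look. It is an assumption-laden illustration, not Forge's code: `RetryText` is a hypothetical stand-in for the `TextResponse` the changelog mentions, and `decode_tool_args` and `first_message` are invented helper names. The only external fact relied on is that OpenAI-shape chat responses carry the reply under `choices[0].message`.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass
class RetryText:
    """Hypothetical retry signal (stands in for the changelog's TextResponse)."""
    text: str


def decode_tool_args(raw: str) -> Union[Dict[str, Any], RetryText]:
    # Malformed model output becomes a retry signal, not a silent empty dict.
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        return RetryText(f"Tool arguments were not valid JSON: {exc.msg}")
    if not isinstance(parsed, dict):
        return RetryText("Tool arguments must be a JSON object.")
    return parsed


def first_message(envelope: Dict[str, Any]) -> Dict[str, Any]:
    # A broken provider envelope is a provider fault, so fail loudly.
    choices = envelope.get("choices")
    if not isinstance(choices, list) or not choices or not isinstance(choices[0], dict):
        raise ValueError("provider response has no usable 'choices' entry")
    message = choices[0].get("message")
    if not isinstance(message, dict):
        raise ValueError("provider response choice has no 'message' object")
    return message
```

The design choice illustrated here is that the model can be asked to try again, whereas a malformed envelope points at the backend and should surface as an error.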
Removed
- Runtime `auto` function-calling mode in `LlamafileClient`: the proxy never used it, and its mid-request probe-and-switch behavior is replaced by the declared-and-frozen `--backend-capability`.
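The "declared once and frozen" idea, together with the loud rejection of `prompt` for ollama / vllm / anthropic backends, can be modeled with an immutable dataclass. This is a minimal sketch under stated assumptions: `BackendConfig`, its field names, and the lowercase backend keys are hypothetical, and Forge's real `ProxyServer` may enforce the same rules differently.

```python
from dataclasses import dataclass, FrozenInstanceError

# Backends for which the changelog says `prompt` is rejected.
NO_PROMPT_BACKENDS = frozenset({"ollama", "vllm", "anthropic"})


@dataclass(frozen=True)
class BackendConfig:
    """Hypothetical startup config: validated once, then immutable."""
    backend: str
    capability: str = "native"

    def __post_init__(self) -> None:
        if self.capability not in ("native", "prompt"):
            raise ValueError(f"unknown capability: {self.capability!r}")
        if self.capability == "prompt" and self.backend in NO_PROMPT_BACKENDS:
            raise ValueError(
                f"'prompt' capability is not supported for backend {self.backend!r}"
            )


cfg = BackendConfig(backend="llamafile", capability="prompt")
try:
    cfg.capability = "native"  # any later switch attempt is refused
except FrozenInstanceError:
    print("capability is frozen after startup")
```

Because validation happens in `__post_init__`, an invalid combination fails at construction time rather than partway through a request.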
Breaking Changes
- Flag `--mode {native,prompt}` renamed to `--backend-capability {native,prompt}`; no deprecation alias – migration required (drop flag for native default or explicitly set `--backend-capability`)
- Runtime `auto` function‑calling mode removed from LlamafileClient
Earlier breaking changes
- v0.7.4 Deprecates pydantic `.model_*` API on `ToolCall` and `TextResponse` dataclasses; construction no longer validates argument shape.
- v0.7.0 Unknown‑tool handling now replies with [UnknownToolError] on the tool channel instead of user nudges.
- v0.7.0 Changes error reporting: step enforcement and prerequisite violations now emit tool‑channel messages with [StepEnforcementError] / [PrereqError].