Skip to content

Forge

v0.7.3 Breaking

This release includes 2 breaking changes for platform teams planning a safe upgrade.

✓ No known CVEs patched
Read the diff → Tool health → What is this tool? →

✓ No known CVEs patched in this version

Topics

agentic-ai agentic-workflow agents function-calling llama-cpp llamafile
+5 more
llm ollama python self-hosted tool-calling

ReleasePort's take

Moderate signal
editorial:auto 2d

The `--mode` CLI flag has been renamed to `--backend-capability`; native function calling now defaults to transparent passthrough.

Why it matters: All users of the proxy CLI must update scripts and configurations by version v0.7.3, or commands will fail due to the removed `--mode` option.

Summary

AI summary

BREAKING: --mode flag renamed to --backend-capability; native function calling becomes default transparent passthrough.

Changes in this release

Breaking High

Renames `--mode {native,prompt}` to `--backend-capability {native,prompt}`; no deprecation alias.

Renames `--mode {native,prompt}` to `--backend-capability {native,prompt}`; no deprecation alias.

Source: llm_adapter@2026-06-01

Confidence: high

Feature Medium

Adds `OpenAICompatClient` for arbitrary OpenAI‑compatible endpoints.

Adds `OpenAICompatClient` for arbitrary OpenAI‑compatible endpoints.

Source: llm_adapter@2026-06-01

Confidence: high

Feature Medium

Adds `--backend-timeout` proxy option (default 300 s).

Adds `--backend-timeout` proxy option (default 300 s).

Source: llm_adapter@2026-06-01

Confidence: high

Feature Medium

Adds `--backend-capability {native,prompt}` proxy flag (default native).

Adds `--backend-capability {native,prompt}` proxy flag (default native).

Source: llm_adapter@2026-06-01

Confidence: high

Deprecation Low

Removes runtime `auto` function‑calling mode in `LlamafileClient`.

Removes runtime `auto` function‑calling mode in `LlamafileClient`.

Source: llm_adapter@2026-06-01

Confidence: high

Bugfix Medium

Fixes inconsistent handling of malformed tool calls and unexpected responses across OpenAI‑shape clients.

Fixes inconsistent handling of malformed tool calls and unexpected responses across OpenAI‑shape clients.

Source: llm_adapter@2026-06-01

Confidence: high

Bugfix Medium

Fixes `Guardrails.record()` from dropping tool arguments for prerequisite tracking.

Fixes `Guardrails.record()` from dropping tool arguments for prerequisite tracking.

Source: llm_adapter@2026-06-01

Confidence: high

Bugfix Medium

Adds proxy input hardening, non‑blocking Ollama stop, client shutdown handling, and loud argument decode errors.

Adds proxy input hardening, non‑blocking Ollama stop, client shutdown handling, and loud argument decode errors.

Source: llm_adapter@2026-06-01

Confidence: high

Refactor Low

Replaces deprecated asyncio API and adds proxy server input validation.

Replaces deprecated asyncio API and adds proxy server input validation.

Source: llm_adapter@2026-06-01

Confidence: high

Refactor Low

Cleans up dead code and a fragile variable reference in `LlamafileClient`.

Cleans up dead code and a fragile variable reference in `LlamafileClient`.

Source: llm_adapter@2026-06-01

Confidence: high

Full changelog

[0.7.3] — 2026-06-01

Native-first proxy. With native function calling now well-supported across modern local models, the proxy defaults to — and is optimized for — native tool calling, forwarding the client's OpenAI tools / messages to the backend verbatim. Prompt-injection remains available as an explicit opt-in for llama.cpp / llamafile backends that lack a function-calling template, but it is no longer the default path. This release also folds in the OpenAI-compatible client and several proxy / eval fixes that landed on main since 0.7.2.

Added

  • OpenAICompatClient for arbitrary OpenAI-compatible endpoints. #89 (thanks @lucasgerads).
  • --backend-timeout proxy option — configurable backend response timeout (default 300s). #91.
  • --backend-capability {native,prompt} proxy flagnative (default) forwards the client's tools / messages verbatim to a function-calling-capable backend; prompt opts into prompt-injection for non-FC llama.cpp / llamafile backends. Declared once at startup and frozen — never probed or switched mid-stream.
  • Effective backend_timeout logged at proxy startup.

Changed

  • BREAKING — --mode {native,prompt} renamed to --backend-capability {native,prompt} (and ProxyServer(mode=…)ProxyServer(backend_capability=…)). --mode collided with the proxy's managed / external deployment mode; the new name states what it controls — the backend's tool-calling protocol — and reflects that the choice is declared once and frozen, never probed at runtime. There is no deprecation alias (--mode was introduced in 0.7.1). Migration: --mode native → drop it (native is the default) or --backend-capability native; --mode prompt--backend-capability prompt.
  • Native function calling is now transparent passthrough — the proxy forwards the client's OpenAI tool / message payloads to the backend verbatim instead of round-tripping them through forge's internal ToolSpec representation, which dropped schema detail.
  • vLLM model identity consolidated to a single source of truth (the wire model_path and the registry model key are now set together). #75.
  • The prompt capability is now rejected loudly for ollama / vllm / anthropic backends — previously it was silently ignored for ollama.
  • stream_options is excluded from proxy passthrough. #94 (thanks @alexandergunnarson).

Fixed

  • Consistent malformed-tool-call / unexpected-response handling across the OpenAI-shape clients — malformed model tool args drive a retry (TextResponse) instead of degrading silently or raising inconsistently, and non-streaming responses are guarded so a broken provider envelope fails loud.
  • Guardrails.record() no longer drops tool args for prerequisite tracking. #72 (thanks @hobostay).
  • Deprecated asyncio API replaced; proxy server input validation added. #71 (thanks @hobostay).
  • Proxy input hardening, non-blocking Ollama stop, client shutdown, and loud arg decode. #86.
  • Dead code and a fragile variable reference cleaned up in LlamafileClient. #73 (thanks @hobostay).

Removed

  • Runtime auto function-calling mode in LlamafileClient — the proxy never used it, and its mid-request probe-and-switch behavior is replaced by the declared-and-frozen --backend-capability.

Breaking Changes

  • Flag `--mode {native,prompt}` renamed to `--backend-capability {native,prompt}`; no deprecation alias – migration required (drop flag for native default or explicitly set `--backend-capability`)
  • Runtime `auto` function‑calling mode removed from LlamafileClient

Weekly OSS security release digest.

The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.

No spam, unsubscribe anytime.

Share this release

Track Forge

Get notified when new releases ship.

Sign up free

About Forge

All releases →

Related context

Earlier breaking changes

  • v0.7.4 Deprecates pydantic `.model_*` API on `ToolCall` and `TextResponse` dataclasses; construction no longer validates argument shape.
  • v0.7.0 Unknown‑tool handling now replies with [UnknownToolError] on the tool channel instead of user nudges.
  • v0.7.0 Changes error reporting: step enforcement and prerequisite violations now emit tool‑channel messages with [StepEnforcementError] / [PrereqError].

Beta — feedback welcome: [email protected]