Release history
Forge releases
All releases
5 shown
- Update any scripts or configs using `--mode` to use `--backend-capability native` or remove the flag for default behavior
- Ensure backend capability selection matches backend support (prompt capability now rejected loudly for ollama, vllm, anthropic)
- Flag `--mode {native,prompt}` renamed to `--backend-capability {native,prompt}`; no deprecation alias – migration required (drop flag for native default or explicitly set `--backend-capability`)
- Runtime `auto` function‑calling mode removed from LlamafileClient
- Added OpenAICompatClient for arbitrary OpenAI‑compatible endpoints
- Added configurable `--backend-timeout` proxy option (default 300s)
- Native function calling now transparent passthrough, forwarding client tools/messages verbatim
Full changelog
[0.7.3] — 2026-06-01
Native-first proxy. With native function calling now well-supported across modern local models, the proxy defaults to — and is optimized for — native tool calling, forwarding the client's OpenAI tools / messages to the backend verbatim. Prompt-injection remains available as an explicit opt-in for llama.cpp / llamafile backends that lack a function-calling template, but it is no longer the default path. This release also folds in the OpenAI-compatible client and several proxy / eval fixes that landed on main since 0.7.2.
Added
OpenAICompatClientfor arbitrary OpenAI-compatible endpoints. #89 (thanks @lucasgerads).--backend-timeoutproxy option — configurable backend response timeout (default 300s). #91.--backend-capability {native,prompt}proxy flag —native(default) forwards the client's tools / messages verbatim to a function-calling-capable backend;promptopts into prompt-injection for non-FC llama.cpp / llamafile backends. Declared once at startup and frozen — never probed or switched mid-stream.- Effective
backend_timeoutlogged at proxy startup.
Changed
- BREAKING —
--mode {native,prompt}renamed to--backend-capability {native,prompt}(andProxyServer(mode=…)→ProxyServer(backend_capability=…)).--modecollided with the proxy's managed / external deployment mode; the new name states what it controls — the backend's tool-calling protocol — and reflects that the choice is declared once and frozen, never probed at runtime. There is no deprecation alias (--modewas introduced in 0.7.1). Migration:--mode native→ drop it (native is the default) or--backend-capability native;--mode prompt→--backend-capability prompt. - Native function calling is now transparent passthrough — the proxy forwards the client's OpenAI tool / message payloads to the backend verbatim instead of round-tripping them through forge's internal
ToolSpecrepresentation, which dropped schema detail. - vLLM model identity consolidated to a single source of truth (the wire
model_pathand the registrymodelkey are now set together). #75. - The
promptcapability is now rejected loudly for ollama / vllm / anthropic backends — previously it was silently ignored for ollama. stream_optionsis excluded from proxy passthrough. #94 (thanks @alexandergunnarson).
Fixed
- Consistent malformed-tool-call / unexpected-response handling across the OpenAI-shape clients — malformed model tool args drive a retry (
TextResponse) instead of degrading silently or raising inconsistently, and non-streaming responses are guarded so a broken provider envelope fails loud. Guardrails.record()no longer drops tool args for prerequisite tracking. #72 (thanks @hobostay).- Deprecated asyncio API replaced; proxy server input validation added. #71 (thanks @hobostay).
- Proxy input hardening, non-blocking Ollama stop, client shutdown, and loud arg decode. #86.
- Dead code and a fragile variable reference cleaned up in
LlamafileClient. #73 (thanks @hobostay).
Removed
- Runtime
autofunction-calling mode inLlamafileClient— the proxy never used it, and its mid-request probe-and-switch behavior is replaced by the declared-and-frozen--backend-capability.