Skip to content

Forge

AI Agents & Assistants

A reliability layer for self‑hosted LLM tool‑calling that adds guardrails (retry nudges, response validation) to make model‑driven workflows more robust.

Python Latest v0.7.4 · 22h ago Security brief →

Features

  • Adds guardrails (retry nudges, rescue parsing, response validation) with zero required steps
  • Works as a drop‑in proxy for OpenAI and Anthropic APIs
  • Provides WorkflowRunner to manage full agent loops and slot‑based GPU sharing

Recent releases

View all 5 releases →
No immediate action
v0.7.4 Breaking risk

Tool-error correction + max‑errors flag

v0.7.3 Breaking risk
⚠ Upgrade required
  • Update any scripts or configs using `--mode` to use `--backend-capability native` or remove the flag for default behavior
  • Ensure backend capability selection matches backend support (prompt capability now rejected loudly for ollama, vllm, anthropic)
Breaking changes
  • Flag `--mode {native,prompt}` renamed to `--backend-capability {native,prompt}`; no deprecation alias – migration required (drop flag for native default or explicitly set `--backend-capability`)
  • Runtime `auto` function‑calling mode removed from LlamafileClient
Notable features
  • Added OpenAICompatClient for arbitrary OpenAI‑compatible endpoints
  • Added configurable `--backend-timeout` proxy option (default 300s)
  • Native function calling now transparent passthrough, forwarding client tools/messages verbatim
Full changelog

[0.7.3] — 2026-06-01

Native-first proxy. With native function calling now well-supported across modern local models, the proxy defaults to — and is optimized for — native tool calling, forwarding the client's OpenAI tools / messages to the backend verbatim. Prompt-injection remains available as an explicit opt-in for llama.cpp / llamafile backends that lack a function-calling template, but it is no longer the default path. This release also folds in the OpenAI-compatible client and several proxy / eval fixes that landed on main since 0.7.2.

Added

  • OpenAICompatClient for arbitrary OpenAI-compatible endpoints. #89 (thanks @lucasgerads).
  • --backend-timeout proxy option — configurable backend response timeout (default 300s). #91.
  • --backend-capability {native,prompt} proxy flagnative (default) forwards the client's tools / messages verbatim to a function-calling-capable backend; prompt opts into prompt-injection for non-FC llama.cpp / llamafile backends. Declared once at startup and frozen — never probed or switched mid-stream.
  • Effective backend_timeout logged at proxy startup.

Changed

  • BREAKING — --mode {native,prompt} renamed to --backend-capability {native,prompt} (and ProxyServer(mode=…)ProxyServer(backend_capability=…)). --mode collided with the proxy's managed / external deployment mode; the new name states what it controls — the backend's tool-calling protocol — and reflects that the choice is declared once and frozen, never probed at runtime. There is no deprecation alias (--mode was introduced in 0.7.1). Migration: --mode native → drop it (native is the default) or --backend-capability native; --mode prompt--backend-capability prompt.
  • Native function calling is now transparent passthrough — the proxy forwards the client's OpenAI tool / message payloads to the backend verbatim instead of round-tripping them through forge's internal ToolSpec representation, which dropped schema detail.
  • vLLM model identity consolidated to a single source of truth (the wire model_path and the registry model key are now set together). #75.
  • The prompt capability is now rejected loudly for ollama / vllm / anthropic backends — previously it was silently ignored for ollama.
  • stream_options is excluded from proxy passthrough. #94 (thanks @alexandergunnarson).

Fixed

  • Consistent malformed-tool-call / unexpected-response handling across the OpenAI-shape clients — malformed model tool args drive a retry (TextResponse) instead of degrading silently or raising inconsistently, and non-streaming responses are guarded so a broken provider envelope fails loud.
  • Guardrails.record() no longer drops tool args for prerequisite tracking. #72 (thanks @hobostay).
  • Deprecated asyncio API replaced; proxy server input validation added. #71 (thanks @hobostay).
  • Proxy input hardening, non-blocking Ollama stop, client shutdown, and loud arg decode. #86.
  • Dead code and a fragile variable reference cleaned up in LlamafileClient. #73 (thanks @hobostay).

Removed

  • Runtime auto function-calling mode in LlamafileClient — the proxy never used it, and its mid-request probe-and-switch behavior is replaced by the declared-and-frozen --backend-capability.
No immediate action
v0.7.2 New feature

vLLM backend support

Review required
v0.7.1 New feature
Auth RBAC

Anthropic API + token usage

No immediate action
v0.7.0 Breaking risk

Eval refresh + tool errors + docs rebuild

Weekly OSS security release digest.

The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.

No spam, unsubscribe anytime.

About

Stars
1,949
Forks
135
Languages
Python TypeScript Dockerfile

Install & Platforms

Install via
pip

Beta — feedback welcome: [email protected]