Skip to content

Forge

v0.7.2 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

✓ No known CVEs patched
Read the diff → Tool health → What is this tool? →

✓ No known CVEs patched in this version

Topics

agentic-ai agentic-workflow agents function-calling llama-cpp llamafile
+5 more
llm ollama python self-hosted tool-calling

ReleasePort's take

Light signal
editorial:auto 10d

ReleasePort Layer 1 v0.7.2 adds full vLLM backend support via a new VLLMClient, managed and external proxy modes, model‑name discovery, updated documentation, and improved error handling.

Why it matters: Enables developers to serve AWQ/GPTQ models through vLLM with both managed and external proxy configurations; fails fast if the backend lacks context length without a token budget, improving reliability.

Summary

AI summary

Adds vLLM backend support with managed and external proxy modes for serving AWQ/GPTQ models.

Changes in this release

Feature Medium

Adds vLLM backend (`VLLMClient`) for OpenAI‑compatible vLLM servers.

Adds vLLM backend (`VLLMClient`) for OpenAI‑compatible vLLM servers.

Source: llm_adapter@2026-05-24

Confidence: high

Feature Medium

Adds managed and external proxy modes for vLLM via `--backend vllm` and `--backend-url`.

Adds managed and external proxy modes for vLLM via `--backend vllm` and `--backend-url`.

Source: llm_adapter@2026-05-24

Confidence: high

Feature Medium

Adds served‑model‑name discovery in external vLLM mode from `/v1/models` endpoint.

Adds served‑model‑name discovery in external vLLM mode from `/v1/models` endpoint.

Source: llm_adapter@2026-05-24

Confidence: high

Feature Medium

Adds vLLM section in Backend Setup documentation (docs/BACKEND_SETUP.md).

Adds vLLM section in Backend Setup documentation (docs/BACKEND_SETUP.md).

Source: llm_adapter@2026-05-24

Confidence: low

Bugfix Medium

Makes external mode fail fast when backend reports no context length without `--budget-tokens`.

Makes external mode fail fast when backend reports no context length without `--budget-tokens`.

Source: llm_adapter@2026-05-24

Confidence: high

Refactor Medium

Changes proxy managed mode to delegate server start/budget logic to `setup_backend()`.

Changes proxy managed mode to delegate server start/budget logic to `setup_backend()`.

Source: llm_adapter@2026-05-24

Confidence: high

Full changelog

[0.7.2] — 2026-05-24

vLLM backend support — serve AWQ/GPTQ and other vLLM-hosted models behind forge's guardrails, in both proxy modes and via WorkflowRunner.

Added

  • vLLM backend (VLLMClient). OpenAI-compatible client for a vLLM server, consuming vLLM's server-side tool_calls and reasoning (vLLM 0.21) fields. Native function calling only — vLLM parses tools server-side via --enable-auto-tool-choice --tool-call-parser, so there is no prompt-injection mode. Exported from forge and forge.clients.
  • vLLM in managed + external proxy modes. --backend vllm --model-path <dir|hf-repo-id> launches and manages a vLLM server; --backend-url <url> --backend vllm proxies an externally managed one. setup_backend() / ServerManager gain a model_path parameter (the vLLM identity, distinct from gguf_path).
  • vLLM served-model-name discovery in external mode. vLLM validates the request model field against its --served-model-name and 404s on a mismatch (unlike llama.cpp, which ignores the field). The proxy discovers the served name from /v1/models instead of sending a placeholder. #74 (thanks @srinathh).
  • vLLM section in Backend Setup covering the server flags and VLLMClient usage.

Changed

  • Proxy managed mode now delegates to setup_backend() instead of reimplementing the server-start/budget dance, so every managed backend (including vLLM) shares one path. No public API change — ProxyServer and the forge.proxy CLI keep their v0.7.1 signatures, with model_path / --model-path and the vllm backend added.
  • External mode fails fast when a backend reports no context length and no --budget-tokens is set, instead of silently falling back to an 8192-token budget that could truncate context. Anthropic-protocol downstreams are unaffected.

Known limitations

  • The vLLM backend is unit-validated but was not exercised against a live vLLM server in this release cycle. Its client and server-management code carry full unit coverage, and the proxy's protocol translation is verified end-to-end against llama.cpp (the proxy layer is backend-agnostic). scripts/integration_test_proxy.py --vllm-url <url> runs the full request battery against a real vLLM server when one is available.

Weekly OSS security release digest.

The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.

No spam, unsubscribe anytime.

Share this release

Track Forge

Get notified when new releases ship.

Sign up free

About Forge

All releases →

Related context

Earlier breaking changes

  • v0.7.4 Deprecates pydantic `.model_*` API on `ToolCall` and `TextResponse` dataclasses; construction no longer validates argument shape.
  • v0.7.3 Renames `--mode {native,prompt}` to `--backend-capability {native,prompt}`; no deprecation alias.
  • v0.7.0 Unknown‑tool handling now replies with [UnknownToolError] on the tool channel instead of user nudges.
  • v0.7.0 Changes error reporting: step enforcement and prerequisite violations now emit tool‑channel messages with [StepEnforcementError] / [PrereqError].

Beta — feedback welcome: [email protected]