Forge

v0.7.2 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

Published 2mo AI Agents & Assistants

✓ No known CVEs patched

Topics

agentic-ai agentic-workflow agents function-calling llama-cpp llamafile

llm ollama python self-hosted tool-calling

ReleasePort's take

Light signal

editorial:auto 2mo

Forge v0.7.2 adds vLLM backend support via a new VLLMClient, covering managed and external proxy modes, served-model-name discovery, a vLLM section in the Backend Setup documentation, and a fail-fast check when no context length or token budget is available.

Why it matters: Developers can serve AWQ/GPTQ models through vLLM in both managed and external proxy configurations. External mode now fails fast when the backend reports no context length and `--budget-tokens` is not set, instead of silently falling back to an 8192-token budget that could truncate context.

Summary

AI summary

Adds vLLM backend support with managed and external proxy modes for serving AWQ/GPTQ models.

Changes in this release

Type	Severity	Summary	Source	Confidence	CVE
Feature	Medium	Adds vLLM backend (`VLLMClient`) for OpenAI-compatible vLLM servers.	llm_adapter@2026-05-24	high	—
Feature	Medium	Adds managed and external proxy modes for vLLM via `--backend vllm` and `--backend-url`.	llm_adapter@2026-05-24	high	—
Feature	Medium	Adds served-model-name discovery from the `/v1/models` endpoint in external vLLM mode.	llm_adapter@2026-05-24	high	—
Feature	Medium	Adds a vLLM section to the Backend Setup documentation (docs/BACKEND_SETUP.md).	llm_adapter@2026-05-24	low	—
Bugfix	Medium	Makes external mode fail fast when the backend reports no context length and `--budget-tokens` is not set.	llm_adapter@2026-05-24	high	—
Refactor	Medium	Changes proxy managed mode to delegate server start/budget logic to `setup_backend()`.	llm_adapter@2026-05-24	high	—

Full changelog

[0.7.2] — 2026-05-24

vLLM backend support — serve AWQ/GPTQ and other vLLM-hosted models behind forge's guardrails, in both proxy modes and via WorkflowRunner.

Added

vLLM backend (VLLMClient). OpenAI-compatible client for a vLLM server, consuming vLLM's server-side tool_calls and reasoning (vLLM 0.21) fields. Native function calling only — vLLM parses tools server-side via --enable-auto-tool-choice --tool-call-parser, so there is no prompt-injection mode. Exported from forge and forge.clients.
vLLM in managed + external proxy modes. --backend vllm --model-path <dir|hf-repo-id> launches and manages a vLLM server; --backend-url <url> --backend vllm proxies an externally managed one. setup_backend() / ServerManager gain a model_path parameter (the vLLM identity, distinct from gguf_path).
vLLM served-model-name discovery in external mode. vLLM validates the request model field against its --served-model-name and 404s on a mismatch (unlike llama.cpp, which ignores the field). The proxy discovers the served name from /v1/models instead of sending a placeholder. #74 (thanks @srinathh).
vLLM section in Backend Setup covering the server flags and VLLMClient usage.
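
The served-model-name discovery described above can be sketched as follows. This is a minimal illustration, not forge's actual implementation: the function names `pick_served_model_name` and `discover_served_model_name` are hypothetical, and the sketch assumes the backend returns the usual OpenAI-style `/v1/models` body (a `data` list of objects with an `id` field) and that the first listed id is the one to use.

```python
import json
from urllib.request import urlopen


def pick_served_model_name(models_payload: dict) -> str:
    """Return the first model id from an OpenAI-style /v1/models body.

    Hypothetical helper: choosing the first entry is an assumption,
    not something the changelog specifies.
    """
    for entry in models_payload.get("data") or []:
        model_id = entry.get("id")
        if model_id:
            return model_id
    raise ValueError("backend listed no served models")


def discover_served_model_name(base_url: str, timeout: float = 5.0) -> str:
    """Fetch <base_url>/v1/models and return the served model name."""
    url = f"{base_url.rstrip('/')}/v1/models"
    with urlopen(url, timeout=timeout) as resp:
        return pick_served_model_name(json.load(resp))


# Offline example using a payload in the expected shape.
sample = {"object": "list", "data": [{"id": "my-awq-model", "object": "model"}]}
print(pick_served_model_name(sample))
```

Sending the discovered name in the request `model` field, rather than a placeholder, is what avoids the 404 that vLLM returns on a mismatch.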

Changed

Proxy managed mode now delegates to setup_backend() instead of reimplementing the server-start/budget dance, so every managed backend (including vLLM) shares one path. No public API change — ProxyServer and the forge.proxy CLI keep their v0.7.1 signatures, with model_path / --model-path and the vllm backend added.
External mode fails fast when a backend reports no context length and no --budget-tokens is set, instead of silently falling back to an 8192-token budget that could truncate context. Anthropic-protocol downstreams are unaffected.
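
The fail-fast behavior in external mode can be illustrated with a short sketch. The names `resolve_token_budget` and `MissingBudgetError` are hypothetical and do not come from forge. The sketch also assumes that an explicit `--budget-tokens` value takes precedence over the backend-reported context length, which the changelog does not state.

```python
from typing import Optional


class MissingBudgetError(RuntimeError):
    """Raised when neither a context length nor a token budget is known."""


def resolve_token_budget(
    reported_context_length: Optional[int],
    budget_tokens: Optional[int],
) -> int:
    """Pick a token budget, or fail instead of guessing a default."""
    # Assumption: an explicit budget overrides what the backend reports.
    if budget_tokens is not None:
        return budget_tokens
    if reported_context_length:
        return reported_context_length
    # No silent fallback to a fixed value such as 8192.
    raise MissingBudgetError(
        "backend reported no context length; pass --budget-tokens"
    )


print(resolve_token_budget(32768, None))
print(resolve_token_budget(None, 4096))
```

Raising at startup surfaces the misconfiguration immediately, whereas a silent default could truncate context without any visible error.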

Known limitations

The vLLM backend is unit-validated but was not exercised against a live vLLM server in this release cycle. Its client and server-management code carry full unit coverage, and the proxy's protocol translation is verified end-to-end against llama.cpp (the proxy layer is backend-agnostic). scripts/integration_test_proxy.py --vllm-url <url> runs the full request battery against a real vLLM server when one is available.

Related context

Earlier breaking changes

v0.7.5 Changes default behavior to replay no reasoning blocks.
v0.7.4 Deprecates pydantic `.model_*` API on `ToolCall` and `TextResponse` dataclasses; construction no longer validates argument shape.
v0.7.3 Renames `--mode {native,prompt}` to `--backend-capability {native,prompt}`; no deprecation alias.
v0.7.0 Unknown‑tool handling now replies with [UnknownToolError] on the tool channel instead of user nudges.
v0.7.0 Changes error reporting: step enforcement and prerequisite violations now emit tool‑channel messages with [StepEnforcementError] / [PrereqError].