This release adds 3 notable features for engineering teams evaluating rollout.
✓ No known CVEs patched in this version
ReleasePort's take
Light signal (ReleasePort Layer 1): v0.7.2 adds vLLM backend support via a new `VLLMClient`, managed and external proxy modes, served-model-name discovery, updated documentation, and fail-fast handling of a missing token budget.
Why it matters: Developers can serve AWQ/GPTQ models through vLLM in both managed and external proxy modes. External mode now fails fast when the backend reports no context length and `--budget-tokens` is not set, rather than silently falling back to an 8192-token budget that could truncate context.
Summary
AI summary: Adds vLLM backend support with managed and external proxy modes for serving AWQ/GPTQ models.
Changes in this release
| Type | Severity | Summary | CVE |
|---|---|---|---|
| Feature | Medium | Adds vLLM backend (`VLLMClient`) for OpenAI‑compatible vLLM servers. Source: llm_adapter@2026-05-24. Confidence: high | — |
| Feature | Medium | Adds managed and external proxy modes for vLLM via `--backend vllm` and `--backend-url`. Source: llm_adapter@2026-05-24. Confidence: high | — |
| Feature | Medium | Adds served‑model‑name discovery in external vLLM mode from the `/v1/models` endpoint. Source: llm_adapter@2026-05-24. Confidence: high | — |
| Feature | Medium | Adds a vLLM section to the Backend Setup documentation (`docs/BACKEND_SETUP.md`). Source: llm_adapter@2026-05-24. Confidence: low | — |
| Bugfix | Medium | Makes external mode fail fast when the backend reports no context length and `--budget-tokens` is not set. Source: llm_adapter@2026-05-24. Confidence: high | — |
| Refactor | Medium | Changes proxy managed mode to delegate server start/budget logic to `setup_backend()`. Source: llm_adapter@2026-05-24. Confidence: high | — |
Full changelog
[0.7.2] — 2026-05-24
vLLM backend support — serve AWQ/GPTQ and other vLLM-hosted models behind forge's guardrails, in both proxy modes and via WorkflowRunner.
Added
- vLLM backend (`VLLMClient`). OpenAI-compatible client for a vLLM server, consuming vLLM's server-side `tool_calls` and `reasoning` (vLLM 0.21) fields. Native function calling only: vLLM parses tools server-side via `--enable-auto-tool-choice --tool-call-parser`, so there is no prompt-injection mode. Exported from `forge` and `forge.clients`.
- vLLM in managed + external proxy modes. `--backend vllm --model-path <dir|hf-repo-id>` launches and manages a vLLM server; `--backend-url <url> --backend vllm` proxies an externally managed one. `setup_backend()` / `ServerManager` gain a `model_path` parameter (the vLLM identity, distinct from `gguf_path`).
- vLLM served-model-name discovery in external mode. vLLM validates the request `model` field against its `--served-model-name` and 404s on a mismatch (unlike llama.cpp, which ignores the field). The proxy discovers the served name from `/v1/models` instead of sending a placeholder. #74 (thanks @srinathh).
- vLLM section in Backend Setup covering the server flags and `VLLMClient` usage.
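The served-model-name discovery described above can be sketched roughly as follows. This is an illustration only, not forge's actual implementation: the function names `pick_served_model_name` and `fetch_served_model_name` are hypothetical, and the sketch assumes the OpenAI-compatible `/v1/models` response shape (a `data` list whose entries each carry an `id`).

```python
import json
import urllib.request


def pick_served_model_name(models_payload: dict) -> str:
    """Return the first model id from an OpenAI-style /v1/models payload.

    Hypothetical helper for illustration; forge's real logic may differ.
    """
    entries = models_payload.get("data") or []
    for entry in entries:
        model_id = entry.get("id")
        if model_id:
            return model_id
    raise ValueError("backend listed no served models at /v1/models")


def fetch_served_model_name(backend_url: str, timeout: float = 5.0) -> str:
    """Query <backend_url>/v1/models and return the served model name."""
    url = backend_url.rstrip("/") + "/v1/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return pick_served_model_name(json.load(resp))


# Example payload in the shape an OpenAI-compatible server returns.
sample = {"object": "list", "data": [{"id": "my-awq-model", "object": "model"}]}
print(pick_served_model_name(sample))
```

Sending the discovered name in the request `model` field, rather than a placeholder, is what avoids the 404 that vLLM returns on a mismatch.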
Changed
- Proxy managed mode now delegates to `setup_backend()` instead of reimplementing the server-start/budget dance, so every managed backend (including vLLM) shares one path. No public API change: `ProxyServer` and the `forge.proxy` CLI keep their v0.7.1 signatures, with `model_path` / `--model-path` and the `vllm` backend added.
- External mode fails fast when a backend reports no context length and no `--budget-tokens` is set, instead of silently falling back to an 8192-token budget that could truncate context. Anthropic-protocol downstreams are unaffected.
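The fail-fast behavior for external mode can be pictured with a small sketch. The names `resolve_token_budget` and `MissingBudgetError` are hypothetical, and the precedence shown (an explicit budget wins over the backend-reported context length) is an assumption that the changelog does not confirm.

```python
from typing import Optional


class MissingBudgetError(RuntimeError):
    """Raised when no token budget can be determined (hypothetical name)."""


def resolve_token_budget(
    reported_context_length: Optional[int],
    budget_tokens: Optional[int],
) -> int:
    """Pick a token budget, or fail fast instead of guessing a default."""
    # Assumption: an explicit --budget-tokens value takes precedence.
    if budget_tokens is not None:
        return budget_tokens
    # Otherwise use whatever context length the backend reported.
    if reported_context_length:
        return reported_context_length
    # Neither source is available: refuse to fall back to a silent default.
    raise MissingBudgetError(
        "backend reported no context length; pass --budget-tokens explicitly"
    )


print(resolve_token_budget(reported_context_length=None, budget_tokens=4096))
```

Raising at startup surfaces the misconfiguration immediately, whereas a silent default budget could truncate context without any visible error.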
Known limitations
- The vLLM backend is unit-validated but was not exercised against a live vLLM server in this release cycle. Its client and server-management code carry full unit coverage, and the proxy's protocol translation is verified end-to-end against llama.cpp (the proxy layer is backend-agnostic). `scripts/integration_test_proxy.py --vllm-url <url>` runs the full request battery against a real vLLM server when one is available.
Earlier breaking changes
- v0.7.4 Deprecates pydantic `.model_*` API on `ToolCall` and `TextResponse` dataclasses; construction no longer validates argument shape.
- v0.7.3 Renames `--mode {native,prompt}` to `--backend-capability {native,prompt}`; no deprecation alias.
- v0.7.0 Unknown‑tool handling now replies with [UnknownToolError] on the tool channel instead of user nudges.
- v0.7.0 Changes error reporting: step enforcement and prerequisite violations now emit tool‑channel messages with [StepEnforcementError] / [PrereqError].