noonghunna/club-3090

v0.8.7 Breaking

This release includes 5 breaking changes for platform teams planning a safe upgrade.

Published 6h Model Serving & MLOps

View tool

✓ No known CVEs patched

Read the diff → Tool health → What is this tool? →

✓ No known CVEs patched in this version

Affected surfaces

breaking_upgrade deps

ReleasePort's take

Light signal

editorial:auto 4h

Genesis vLLM now defaults to `vllm/minimal` and pins the stable `v0.22.0` version, retiring the custom `vllm-club3090` image.

Why it matters: Affects deployments using Genesis vLLM; migration required before next release as the deprecated composition is removed and unsupported images are retired.

Summary

AI summary

Broad release touches ✨ Features, 🐛 Bug fixes, 📝 Documentation, and Highlights.

Changes in this release

Type	Severity	Summary	CVE
Breaking	High	Genesis vLLM composes deprecated; default to `vllm/minimal`. Genesis vLLM composes deprecated; default to `vllm/minimal`. Source: llm_adapter@2026-06-03 Confidence: high	—
Feature
Feature	Medium	beellama.cpp (DFlash) promoted to single‑card default for Qwen3.6-27B and Gemma-4-31B. beellama.cpp (DFlash) promoted to single‑card default for Qwen3.6-27B and Gemma-4-31B. Source: llm_adapter@2026-06-03 Confidence: high	—
Feature	Medium	`switch.sh --list` groups configs by model and topology, filters by GPU count, shows max context and health. `switch.sh --list` groups configs by model and topology, filters by GPU count, shows max context and health. Source: llm_adapter@2026-06-03 Confidence: high	—
Feature	Medium	`NVLINK_MODE=pcie_p2p` added for boards lacking NVLink but supporting PCIe peer‑to‑peer. `NVLINK_MODE=pcie_p2p` added for boards lacking NVLink but supporting PCIe peer‑to‑peer. Source: llm_adapter@2026-06-03 Confidence: high	—
Feature	Medium	`soak-test` auto‑detects the 35B‑A3B container for testing. `soak-test` auto‑detects the 35B‑A3B container for testing. Source: llm_adapter@2026-06-03 Confidence: high	—
Dependency	Medium	vLLM composes pinned to stable `v0.22.0`; custom `vllm-club3090` image retired. vLLM composes pinned to stable `v0.22.0`; custom `vllm-club3090` image retired. Source: llm_adapter@2026-06-03 Confidence: high	—
Bugfix
Bugfix	Medium	`setup.sh` now rejects unknown positional arguments instead of silently ignoring them. `setup.sh` now rejects unknown positional arguments instead of silently ignoring them. Source: llm_adapter@2026-06-03 Confidence: high	—
Bugfix	Medium	`kv-calc` now shards Qwen3‑Next‑MoE weights by TP, fixing false long‑context “won’t fit” errors. `kv-calc` now shards Qwen3‑Next‑MoE weights by TP, fixing false long‑context “won’t fit” errors. Source: llm_adapter@2026-06-03 Confidence: high	—
Bugfix	Medium	`preflight / report.sh` now detects beellama containers and validates the spec‑draft GGUF. `preflight / report.sh` now detects beellama containers and validates the spec‑draft GGUF. Source: llm_adapter@2026-06-03 Confidence: high	—
Bugfix	Medium	`kv-calc` shard fix for Qwen3‑Next‑MoE weights (duplicate entry removed). `kv-calc` shard fix for Qwen3‑Next‑MoE weights (duplicate entry removed). Source: llm_adapter@2026-06-03 Confidence: low	—

Full changelog

⚠️ Heads-up — defaults & pins changed.

Single-card defaults are now beellama.cpp (DFlash): Qwen3.6-27B → beellama/dflash, Gemma-4-31B → beellama/gemma-dflash (previously vLLM/Genesis paths).

vLLM composes now pin stable v0.22.0 instead of nightlies (which Docker Hub purged, causing "image not found" boot failures). The custom vllm-club3090 image is retired — we run stock vLLM with patches mounted at boot.

Genesis vLLM composes deprecated; vLLM single-card default → vllm/minimal. Gemma-4 dual slugs renamed/pruned (9 → 3; default vllm/gemma-int8-mtp). vLLM dual-DFlash composes deprecated → use beellama.
Use the registry --variant keys (bash scripts/launch.sh vllm/dual, beellama/dflash, ik-llama/byteshape-iq4xs-mtp, …); switch.sh --list shows the current set.

v0.8.7 — 2026-06-03

Highlights

New serving engine — beellama.cpp (DFlash). A llama.cpp fork with external-drafter speculative decoding, now a first-class engine and the single-card default for both Qwen3.6-27B and Gemma-4-31B. Experimental Q8_K_XL dual composes included.
Qwen3.6-35B-A3B is now Production on dual 3090 — 262K context + vision (vllm/qwen-35b-a3b-dual), plus new single-card ik-llama presets (apex-fit-q8q5, community byteshape-iq4xs-mtp).
Gemma-4 moved to stable vLLM v0.22.0 — off the purged nightlies, reasoning parser wired, sampling aligned to the model card.
switch.sh overhaul — --list groups by model · topology, filters to your GPU count, and shows max context + health per config; you can now pin your own default.

✨ Features

beellama.cpp DFlash as a first-class engine + promoted to single-card default for Qwen3.6-27B and Gemma-4-31B (#268, #271, #272); experimental v0.3.0 Q8_K_XL dual composes (#296).
Qwen3.6-35B-A3B → Production: dual 262K + vision (#259); single-card ik-llama apex-fit-q8q5 (#242) and community byteshape-iq4xs-mtp (#299).
Gemma-4 duals → vLLM v0.22.0 with the gemma4 reasoning parser (#287, #289).
switch.sh: group --list by model · topology with per-slug max-context + health, GPU-count filtering (--all), and user-pinnable defaults (#264, #265, #266, #267, #283).
NVLINK_MODE=pcie_p2p for boards with PCIe peer-to-peer but no NVLink (#290/#291).
Operational robustness: orphan-safe switch.sh, reboot-surviving vLLM containers, multi-GPU power sweep (#285).
Quality/bench tooling: rebench-full reworked (think-OFF + think-ON 8-packs, fail-fast verify-full preflight, live progress — #303, #306); quality-test live progress on by default + smarter timeouts (#248, #245); bench measurement-record producer (#249).

🐛 Bug fixes

kv-calc: shard Qwen3-Next-MoE weights by TP — fixes a false long-context "won't fit" verdict (#261).
NVLink config clobber on PCIe-P2P boards (#290).
preflight / report.sh now detect beellama containers + check the spec-draft GGUF.
setup.sh rejects unknown positional args instead of silently ignoring them (#273).
soak-test auto-detects the 35B-A3B container (#244).

📝 Documentation

New guide: "Bring your own model or compose" — serve, tune, and validate any model on your rig without the curated catalog.
Contribution rules: one model — or one feature/concern — per PR; mandatory compose Profile header.
FAQ: agent stops mid-task → set client temperature 0.6 (#232); which KV-cache quant to pick; how to switch models; 4090/5090 cross-rig notes.
Results Card — a standard format for sharing a config's measured results.
Refreshed the add-a-model workflow; AGENTS.md is now one source with CLAUDE.md.

🧹 Maintenance

vLLM composes pinned to stable v0.22.0 (off nightlies); custom vllm-club3090 image retired — stock vLLM + mounted patches (#269, #277).
Genesis vLLM composes deprecated; vLLM single default → vllm/minimal (#276).
Pruned dual vLLM composes; renamed/laddered Gemma-4 slugs (#278, #279, #286); litellm route cleanup (#263).

[Pin: git checkout v0.8.7] · Full diff

Breaking Changes

Single‑card defaults changed to beellama/dflash for Qwen3.6‑27B and Gemma‑4‑31B.
vLLM composes now pinned to stable v0.22.0; custom vllm-club3090 image retired.
Genesis vLLM composes deprecated; default single‑card compose switched to vllm/minimal.
Gemma‑4 dual slugs pruned from 9 to 3 and renamed (default now vllm/gemma-int8-mtp).
vLLM dual‑DFlash composes deprecated – use beellama instead.

View diff on GitHub

Weekly OSS security release digest.

The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.

No spam, unsubscribe anytime.

Share this release

Share on X Share on Bluesky

Track noonghunna/club-3090

Get notified when new releases ship.

About noonghunna/club-3090

All releases →

Related context

Related tools

Earlier breaking changes

v0.8.6 Compose paths moved to `models/<model>/<engine>/compose/<topology>/<quant>/<serving>.yml`.