This release includes 5 breaking changes for platform teams planning a safe upgrade.
✓ No known CVEs patched in this version
ReleasePort's take
Light signal: the vLLM single-card default is now `vllm/minimal` and vLLM composes pin the stable `v0.22.0` release. The Genesis vLLM composes are deprecated and the custom `vllm-club3090` image is retired.
Why it matters: Affects deployments that use the Genesis vLLM composes or the custom `vllm-club3090` image. Plan a migration, because the Genesis composes are deprecated and the custom image is no longer provided.
Summary
AI summary: Broad release touching Features, Bug fixes, Documentation, and Highlights.
Changes in this release
| Type | Severity | Summary | CVE |
|---|---|---|---|
| Breaking | High | Genesis vLLM composes deprecated; default to `vllm/minimal`. | - |
| Feature | Medium | beellama.cpp (DFlash) promoted to single-card default for Qwen3.6-27B and Gemma-4-31B. | - |
| Feature | Medium | `switch.sh --list` groups configs by model and topology, filters by GPU count, shows max context and health. | - |
| Feature | Medium | `NVLINK_MODE=pcie_p2p` added for boards lacking NVLink but supporting PCIe peer-to-peer. | - |
| Feature | Medium | `soak-test` auto-detects the 35B-A3B container for testing. | - |
| Dependency | Medium | vLLM composes pinned to stable `v0.22.0`; custom `vllm-club3090` image retired. | - |
| Bugfix | Medium | `setup.sh` now rejects unknown positional arguments instead of silently ignoring them. | - |
| Bugfix | Medium | `kv-calc` now shards Qwen3-Next-MoE weights by TP, fixing false long-context "won't fit" errors. | - |
| Bugfix | Medium | `preflight` / `report.sh` now detect beellama containers and validate the spec-draft GGUF. | - |
Full changelog
⚠️ Heads-up: defaults & pins changed.
- Single-card defaults are now beellama.cpp (DFlash): Qwen3.6-27B → `beellama/dflash`, Gemma-4-31B → `beellama/gemma-dflash` (previously vLLM/Genesis paths).
- vLLM composes now pin stable `v0.22.0` instead of nightlies (which Docker Hub purged, causing "image not found" boot failures). The custom `vllm-club3090` image is retired; we run stock vLLM with patches mounted at boot.
- Genesis vLLM composes deprecated; vLLM single-card default → `vllm/minimal`. Gemma-4 dual slugs renamed/pruned (9 → 3; default `vllm/gemma-int8-mtp`). vLLM dual-DFlash composes deprecated; use beellama.

Use the registry `--variant` keys (`bash scripts/launch.sh vllm/dual`, `beellama/dflash`, `ik-llama/byteshape-iq4xs-mtp`, …); `switch.sh --list` shows the current set.
v0.8.7 (2026-06-03)
Highlights
- New serving engine: beellama.cpp (DFlash). A llama.cpp fork with external-drafter speculative decoding, now a first-class engine and the single-card default for both Qwen3.6-27B and Gemma-4-31B. Experimental Q8_K_XL dual composes included.
- Qwen3.6-35B-A3B is now Production on dual 3090: 262K context + vision (`vllm/qwen-35b-a3b-dual`), plus new single-card ik-llama presets (`apex-fit-q8q5`, community `byteshape-iq4xs-mtp`).
- Gemma-4 moved to stable vLLM v0.22.0: off the purged nightlies, reasoning parser wired, sampling aligned to the model card.
- `switch.sh` overhaul: `--list` groups by model · topology, filters to your GPU count, and shows max context + health per config; you can now pin your own default.
✨ Features
- beellama.cpp DFlash as a first-class engine + promoted to single-card default for Qwen3.6-27B and Gemma-4-31B (#268, #271, #272); experimental v0.3.0 Q8_K_XL dual composes (#296).
- Qwen3.6-35B-A3B → Production: dual 262K + vision (#259); single-card ik-llama `apex-fit-q8q5` (#242) and community `byteshape-iq4xs-mtp` (#299).
- Gemma-4 duals → vLLM v0.22.0 with the gemma4 reasoning parser (#287, #289).
- `switch.sh`: group `--list` by model · topology with per-slug max-context + health, GPU-count filtering (`--all`), and user-pinnable defaults (#264, #265, #266, #267, #283).
- `NVLINK_MODE=pcie_p2p` for boards with PCIe peer-to-peer but no NVLink (#290/#291).
- Operational robustness: orphan-safe `switch.sh`, reboot-surviving vLLM containers, multi-GPU power sweep (#285).
- Quality/bench tooling: `rebench-full` reworked (think-OFF + think-ON 8-packs, fail-fast `verify-full` preflight, live progress; #303, #306); `quality-test` live progress on by default + smarter timeouts (#248, #245); bench measurement-record producer (#249).
Bug fixes
- kv-calc: shard Qwen3-Next-MoE weights by TP; fixes a false long-context "won't fit" verdict (#261).
- NVLink config clobber on PCIe-P2P boards (#290).
- preflight / report.sh now detect beellama containers + check the spec-draft GGUF.
- setup.sh rejects unknown positional args instead of silently ignoring them (#273).
- soak-test auto-detects the 35B-A3B container (#244).
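
The kv-calc fix above is easiest to see as arithmetic. The sketch below is not the project's `kv-calc` code: the function name, the simple cost model, and every number are hypothetical, and it assumes that under tensor parallelism (TP) both the weights and the KV cache split evenly across GPUs, ignoring activations and other overhead. It only illustrates why counting the full weight size on each card can produce a false "won't fit" verdict.

```python
def fits_per_gpu(weights_gib, kv_gib, vram_gib, tp, shard_weights):
    """Return True if the estimated per-GPU footprint fits in one card's VRAM.

    Hypothetical cost model: with tensor parallelism each of the `tp` GPUs
    holds about 1/tp of the KV cache, and (when sharded) 1/tp of the weights.
    """
    per_gpu_weights = weights_gib / tp if shard_weights else weights_gib
    per_gpu_kv = kv_gib / tp
    return per_gpu_weights + per_gpu_kv <= vram_gib


# Illustrative numbers only: 30 GiB of weights, 12 GiB of KV cache,
# two 24 GiB cards (TP=2).
unsharded = fits_per_gpu(30, 12, 24, tp=2, shard_weights=False)  # 30 + 6 = 36 GiB per card
sharded = fits_per_gpu(30, 12, 24, tp=2, shard_weights=True)     # 15 + 6 = 21 GiB per card
print(unsharded, sharded)
```

With these made-up inputs the unsharded estimate exceeds a 24 GiB card while the sharded estimate does not, which is the shape of the false verdict described in #261.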
Documentation
- New guide: "Bring your own model or compose": serve, tune, and validate any model on your rig without the curated catalog.
- Contribution rules: one model, or one feature/concern, per PR; mandatory compose `Profile` header.
- FAQ: agent stops mid-task → set client temperature 0.6 (#232); which KV-cache quant to pick; how to switch models; 4090/5090 cross-rig notes.
- Results Card: a standard format for sharing a config's measured results.
- Refreshed the add-a-model workflow; `AGENTS.md` is now one source with `CLAUDE.md`.
🧹 Maintenance
- vLLM composes pinned to stable v0.22.0 (off nightlies); custom `vllm-club3090` image retired → stock vLLM + mounted patches (#269, #277).
- Genesis vLLM composes deprecated; vLLM single default → `vllm/minimal` (#276).
- Pruned dual vLLM composes; renamed/laddered Gemma-4 slugs (#278, #279, #286); litellm route cleanup (#263).
[Pin: `git checkout v0.8.7`] · Full diff
Breaking Changes
- Single-card defaults changed to beellama.cpp (DFlash): `beellama/dflash` for Qwen3.6-27B and `beellama/gemma-dflash` for Gemma-4-31B.
- vLLM composes now pinned to stable v0.22.0; custom `vllm-club3090` image retired.
- Genesis vLLM composes deprecated; default single-card vLLM compose switched to `vllm/minimal`.
- Gemma-4 dual slugs pruned from 9 to 3 and renamed (default now `vllm/gemma-int8-mtp`).
- vLLM dual-DFlash composes deprecated; use beellama instead.
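
Before upgrading, it may help to check whether any local compose files or scripts still mention the retired image. The following is a minimal sketch, not part of the project: `find_retired_references`, the file patterns, and the directory layout are assumptions made for illustration. Only the marker string `vllm-club3090` comes from the notes above.

```python
from pathlib import Path

# Marker taken from the release notes; extend with your own strings as needed.
RETIRED_MARKERS = ("vllm-club3090",)


def find_retired_references(root, markers=RETIRED_MARKERS,
                            patterns=("*.yml", "*.yaml", "*.sh")):
    """Return (path, line_number, marker) for every line mentioning a marker.

    Hypothetical helper: walks `root` recursively and does a plain substring
    match, so it can also flag comments and documentation.
    """
    hits = []
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            if not path.is_file():
                continue
            try:
                text = path.read_text(encoding="utf-8", errors="replace")
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
            for lineno, line in enumerate(text.splitlines(), start=1):
                for marker in markers:
                    if marker in line:
                        hits.append((str(path), lineno, marker))
    return hits


if __name__ == "__main__":
    for path, lineno, marker in find_retired_references("."):
        print(f"{path}:{lineno}: references retired '{marker}'")
```

Run from the root of a checkout, this prints one line per match. An empty result only means the literal string was not found in the scanned file types; it is not a guarantee that a deployment is unaffected.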
About noonghunna/club-3090