noonghunna/club-3090

v0.8.7 Breaking

This release includes 5 breaking changes that platform teams should review before upgrading.

✓ No known CVEs patched in this version

Affected surfaces

breaking_upgrade deps

ReleasePort's take

Genesis vLLM composes are deprecated: the vLLM single-card default is now `vllm/minimal`, vLLM composes pin stable `v0.22.0`, and the custom `vllm-club3090` image is retired.

Why it matters: This affects deployments that use Genesis vLLM composes or the custom `vllm-club3090` image. Plan the migration now, because the composes are deprecated and the image is retired.

Summary

AI summary

A broad release that touches ✨ Features, 🐛 Bug fixes, 📝 Documentation, and Highlights.

Changes in this release

Breaking High

Genesis vLLM composes are deprecated; the vLLM single-card default is now `vllm/minimal`.

Source: llm_adapter@2026-06-03

Confidence: high

Feature Medium

beellama.cpp (DFlash) promoted to single‑card default for Qwen3.6-27B and Gemma-4-31B.

Source: llm_adapter@2026-06-03

Confidence: high

Feature Medium

`switch.sh --list` groups configs by model and topology, filters by GPU count, shows max context and health.

Source: llm_adapter@2026-06-03

Confidence: high

Feature Medium

`NVLINK_MODE=pcie_p2p` added for boards lacking NVLink but supporting PCIe peer‑to‑peer.

Source: llm_adapter@2026-06-03

Confidence: high

Feature Medium

`soak-test` auto‑detects the 35B‑A3B container for testing.

Source: llm_adapter@2026-06-03

Confidence: high

Dependency Medium

vLLM composes pinned to stable `v0.22.0`; custom `vllm-club3090` image retired.

Source: llm_adapter@2026-06-03

Confidence: high

Bugfix Medium

`setup.sh` now rejects unknown positional arguments instead of silently ignoring them.

Source: llm_adapter@2026-06-03

Confidence: high

Bugfix Medium

`kv-calc` now shards Qwen3‑Next‑MoE weights by TP, fixing false long‑context "won't fit" errors.

Source: llm_adapter@2026-06-03

Confidence: high

Bugfix Medium

`preflight` and `report.sh` now detect beellama containers and validate the spec‑draft GGUF.

Source: llm_adapter@2026-06-03

Confidence: high

Full changelog

⚠️ Heads-up: defaults & pins changed.

  • Single-card defaults are now beellama.cpp (DFlash): Qwen3.6-27B → beellama/dflash, Gemma-4-31B → beellama/gemma-dflash (previously vLLM/Genesis paths).
  • vLLM composes now pin stable v0.22.0 instead of nightlies (which Docker Hub purged, causing "image not found" boot failures). The custom vllm-club3090 image is retired; we run stock vLLM with patches mounted at boot.
  • Genesis vLLM composes deprecated; vLLM single-card default → vllm/minimal. Gemma-4 dual slugs renamed/pruned (9 → 3; default vllm/gemma-int8-mtp). vLLM dual-DFlash composes deprecated → use beellama.
    Use the registry --variant keys (bash scripts/launch.sh vllm/dual, beellama/dflash, ik-llama/byteshape-iq4xs-mtp, …); switch.sh --list shows the current set.

v0.8.7 (2026-06-03)

Highlights

  • New serving engine: beellama.cpp (DFlash). A llama.cpp fork with external-drafter speculative decoding, now a first-class engine and the single-card default for both Qwen3.6-27B and Gemma-4-31B. Experimental Q8_K_XL dual composes included.
  • Qwen3.6-35B-A3B is now Production on dual 3090: 262K context + vision (vllm/qwen-35b-a3b-dual), plus new single-card ik-llama presets (apex-fit-q8q5, community byteshape-iq4xs-mtp).
  • Gemma-4 moved to stable vLLM v0.22.0: off the purged nightlies, reasoning parser wired, sampling aligned to the model card.
  • switch.sh overhaul: --list groups by model · topology, filters to your GPU count, and shows max context + health per config; you can now pin your own default.

✨ Features

  • beellama.cpp DFlash as a first-class engine + promoted to single-card default for Qwen3.6-27B and Gemma-4-31B (#268, #271, #272); experimental v0.3.0 Q8_K_XL dual composes (#296).
  • Qwen3.6-35B-A3B → Production: dual 262K + vision (#259); single-card ik-llama apex-fit-q8q5 (#242) and community byteshape-iq4xs-mtp (#299).
  • Gemma-4 duals → vLLM v0.22.0 with the gemma4 reasoning parser (#287, #289).
  • switch.sh: group --list by model · topology with per-slug max-context + health, GPU-count filtering (--all), and user-pinnable defaults (#264, #265, #266, #267, #283).
  • NVLINK_MODE=pcie_p2p for boards with PCIe peer-to-peer but no NVLink (#290/#291).
  • Operational robustness: orphan-safe switch.sh, reboot-surviving vLLM containers, multi-GPU power sweep (#285).
  • Quality/bench tooling: rebench-full reworked (think-OFF + think-ON 8-packs, fail-fast verify-full preflight, live progress; #303, #306); quality-test live progress on by default + smarter timeouts (#248, #245); bench measurement-record producer (#249).

🐛 Bug fixes

  • kv-calc: shard Qwen3-Next-MoE weights by TP, which fixes a false long-context "won't fit" verdict (#261).
  • Fixed the NVLink config being clobbered on PCIe-P2P boards (#290).
  • preflight / report.sh now detect beellama containers + check the spec-draft GGUF.
  • setup.sh rejects unknown positional args instead of silently ignoring them (#273).
  • soak-test auto-detects the 35B-A3B container (#244).
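
The kv-calc fix above comes down to per-GPU arithmetic: under tensor parallelism (TP), each GPU holds roughly 1/TP of the sharded weights, so a calculator that charges every GPU the full weight size underestimates the memory left for KV cache. The following is a minimal sketch of that reasoning, not the actual kv-calc code. The function name `kv_budget_mib` and all sizes are hypothetical, and a real tool would also account for activations, runtime overhead, and any weights that are replicated rather than sharded.

```shell
#!/usr/bin/env bash
# Illustrative only: hypothetical helper, not part of kv-calc.
# Prints the MiB left for KV cache on one GPU after loading weights.
#   $1 = VRAM per GPU (MiB), $2 = total weight size (MiB), $3 = TP degree
kv_budget_mib() {
  local vram=$1 weights=$2 tp=$3
  # Sharded weights are split across the TP ranks (integer division).
  echo $(( vram - weights / tp ))
}

VRAM_MIB=24576       # one 24 GiB card
WEIGHTS_MIB=36864    # hypothetical 36 GiB of weights

# TP=1 charges the full weights to each GPU (the pre-fix behaviour, in spirit).
unsharded=$(kv_budget_mib "$VRAM_MIB" "$WEIGHTS_MIB" 1)
# TP=2: each GPU holds half of the weights.
sharded=$(kv_budget_mib "$VRAM_MIB" "$WEIGHTS_MIB" 2)

echo "unsharded estimate: ${unsharded} MiB"
echo "sharded estimate:   ${sharded} MiB"
```

With these made-up numbers the unsharded estimate is negative (a "won't fit" verdict), while the sharded estimate leaves room for KV cache.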

📝 Documentation

  • New guide: "Bring your own model or compose" (serve, tune, and validate any model on your rig without the curated catalog).
  • Contribution rules: one model (or one feature/concern) per PR; mandatory compose Profile header.
  • FAQ: agent stops mid-task → set client temperature 0.6 (#232); which KV-cache quant to pick; how to switch models; 4090/5090 cross-rig notes.
  • Results Card: a standard format for sharing a config's measured results.
  • Refreshed the add-a-model workflow; AGENTS.md and CLAUDE.md now share a single source.

🧹 Maintenance

  • vLLM composes pinned to stable v0.22.0 (off nightlies); custom vllm-club3090 image retired in favor of stock vLLM + mounted patches (#269, #277).
  • Genesis vLLM composes deprecated; vLLM single default → vllm/minimal (#276).
  • Pruned dual vLLM composes; renamed/laddered Gemma-4 slugs (#278, #279, #286); litellm route cleanup (#263).

[Pin: git checkout v0.8.7] · Full diff

Breaking Changes

  • Single‑card defaults changed to beellama.cpp (DFlash): beellama/dflash for Qwen3.6‑27B and beellama/gemma-dflash for Gemma‑4‑31B.
  • vLLM composes now pinned to stable v0.22.0; custom vllm-club3090 image retired.
  • Genesis vLLM composes deprecated; default single‑card compose switched to vllm/minimal.
  • Gemma‑4 dual slugs pruned from 9 to 3 and renamed (default now vllm/gemma-int8-mtp).
  • vLLM dual‑DFlash composes deprecated – use beellama instead.
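
For migration scripts, the new single-card defaults listed in this release can be captured in a small lookup. This is only a sketch: `default_variant_for` is a hypothetical helper name, not something shipped in the repo. The model names and variant keys are the ones stated in this changelog.

```shell
#!/usr/bin/env bash
# Hypothetical helper: maps a model name to its v0.8.7 single-card default
# variant key, as listed in the release notes. Not part of club-3090.
default_variant_for() {
  case "$1" in
    Qwen3.6-27B) echo "beellama/dflash" ;;
    Gemma-4-31B) echo "beellama/gemma-dflash" ;;
    *)
      # Unknown models are reported on stderr with a non-zero status.
      echo "no single-card default recorded for: $1" >&2
      return 1
      ;;
  esac
}

default_variant_for "Qwen3.6-27B"
default_variant_for "Gemma-4-31B"
```

Before launching anything with a key from this lookup, it is worth confirming it against `switch.sh --list`, which the release notes describe as showing the current set.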

Related context

Earlier breaking changes

  • v0.8.6 Compose paths moved to `models/<model>/<engine>/compose/<topology>/<quant>/<serving>.yml`.
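
The v0.8.6 layout can be read as a simple path template. A minimal sketch, assuming the five components are plain directory and file names; `compose_path` and the example values are hypothetical placeholders, not real slugs from the repo:

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds a compose path following the v0.8.6 layout
#   models/<model>/<engine>/compose/<topology>/<quant>/<serving>.yml
compose_path() {
  if [ "$#" -ne 5 ]; then
    echo "usage: compose_path <model> <engine> <topology> <quant> <serving>" >&2
    return 2
  fi
  printf 'models/%s/%s/compose/%s/%s/%s.yml\n' "$1" "$2" "$3" "$4" "$5"
}

# Placeholder values for illustration only.
compose_path my-model my-engine single my-quant my-serving
```

Scripts that hard-coded pre-v0.8.6 compose paths would need to be updated to this shape.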
