This is the first tagged snapshot of club-3090. Future releases will be auto-categorized via Release Drafter; this initial release summarizes the major milestones since repo creation on 2026-04-28.
🎯 Models supported
- Qwen3.6-27B (AutoRound INT4): primary serving model on vLLM, single + dual + multi3/multi4 topologies
- Qwen3.6-35B-A3B (MoE): secondary serving path on llama.cpp single-card
- Gemma 4 31B: added as a supported model on dual-card via vLLM, with MTP drafter (google/gemma-4-31B-it-assistant)
- Qwopus3.6-27B (preview): INT4+BF16-MTP via Carnice AutoRound recipe (still WIP, not production)
📊 Cross-rig benchmark data
Power-cap efficiency curves contributed across hardware classes:
- 3090 (air + water): 21-point power-cap sweep; knee at 290W (air) / 330W (water)
- 4090 (air): 38-point sweep at 10W resolution, contributed by @laurimyllari; knee at 250-260W; from cap=400W upward, the firmware boost clock plateaus at SM 2610 MHz with 393W draw
- 5090 (air): 21-point sweep, contributed by @apnar; knee at 400W for both decode and prefill workloads
See docs/HARDWARE.md for the full cross-rig knee table + efficiency-curve charts.
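The "knee" in each row above is the cap past which extra watts buy little extra throughput. One simple way to locate it from sweep data is sketched below, assuming (cap in W, tokens/s) pairs and an illustrative rule: take the lowest cap that reaches a chosen fraction of peak throughput. The `find_knee` name and the 95% default are assumptions; `scripts/power-cap-sweep.sh` may use a different plateau-detection rule.

```python
from typing import Sequence, Tuple


def find_knee(points: Sequence[Tuple[float, float]], fraction: float = 0.95) -> float:
    """Return the lowest power cap (W) whose throughput reaches `fraction` of the peak.

    `points` holds (cap_watts, tokens_per_second) pairs from a power-cap sweep.
    The 0.95 default is an illustrative threshold, not a club-3090 setting.
    """
    if not points:
        raise ValueError("need at least one (cap, throughput) point")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    peak = max(tps for _, tps in points)
    threshold = fraction * peak
    # The peak point always meets the threshold, so min() never sees an empty sequence.
    return min(cap for cap, tps in points if tps >= threshold)
```

With invented numbers, `find_knee([(100, 10.0), (140, 15.0), (180, 18.5), (220, 19.2), (260, 19.4)])` returns 180, since 18.5 tok/s is the first value at or above 95% of the 19.4 tok/s peak. A fraction-of-peak rule tolerates a noisy plateau, but the reported knee moves with the threshold you pick.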
⚠️ Cliffs + gotchas documented
- Cliff 1 (TurboQuant tool-prefill OOM at 25K+ tool messages on single-card 24 GB)
- Cliff 2 (DeltaNet GDN forward OOM at 50-60K single prompt) — closed via Genesis v7.69 recipe + vllm#35975 backport
- Cliff 2b (~21-26K accumulated context, single-card vLLM long-* configs) — route hermes/openhands users to dual.yml or llamacpp/default
- TQ3 long-ctx vs fp8_e5m2 — TQ3 (3-bit) wins on memory but fp8 holds up better past ~30K single-prompt on bandwidth-constrained dual-eGPU rigs
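The Cliff 2b routing advice can be encoded in a launcher or agent harness. A minimal sketch, assuming the caller tracks accumulated context for a session running on a single-card vLLM long-* config; the function name and the 21K trigger (the low end of the ~21-26K window above) are illustrative choices, while the targets dual.yml and llamacpp/default come from the notes.

```python
from typing import Optional

# Assumption: trigger at the low end of the ~21-26K window reported for Cliff 2b.
CLIFF_2B_TRIGGER_TOKENS = 21_000


def cliff_2b_reroute(accumulated_tokens: int, cards: int) -> Optional[str]:
    """Suggest a safer compose target for a single-card vLLM long-* session.

    Returns None while the session is still below the Cliff 2b window.
    """
    if accumulated_tokens < 0:
        raise ValueError("accumulated_tokens must be non-negative")
    if cards < 1:
        raise ValueError("cards must be at least 1")
    if accumulated_tokens < CLIFF_2B_TRIGGER_TOKENS:
        return None
    # Prefer the dual-card vLLM route when a second card is available; otherwise
    # fall back to llama.cpp single-card, which the README frames as the no-cliffs route.
    return "dual.yml" if cards >= 2 else "llamacpp/default"
```

Because Cliff 2b builds up over a session rather than appearing in a single request, a check like this belongs where the harness tracks conversation length, not in per-request validation.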
🛠️ Scripts + tooling
- scripts/setup.sh: one-shot model + compose installer
- scripts/launch.sh: variant-aware boot
- scripts/verify-full.sh / verify-stress.sh / verify-quick.sh: production gates
- scripts/soak-continuous.sh: long-running stability check; the only test that catches Cliff 2b
- scripts/power-cap-sweep.sh: cross-rig efficiency-knee sweeper with auto-plateau detection; samples SM/mem clock, throttle %, and p-state; supports decode-single, decode-concurrent, and prefill-heavy modes
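Several of the readings `scripts/power-cap-sweep.sh` samples (SM/mem clock, p-state) are also exposed by `nvidia-smi --query-gpu`, alongside the power limit and current draw. The sketch below is an independent illustration, not code from the script: it assumes `nvidia-smi` is on PATH and parses `--format=csv,noheader,nounits` output, and it leaves out throttle percentage. Caps themselves are set with `nvidia-smi -pl <watts>`, which needs root.

```python
import subprocess
from dataclasses import dataclass
from typing import List

# Standard nvidia-smi query fields; parse_samples() expects exactly this order.
QUERY_FIELDS = "index,power.limit,power.draw,clocks.sm,clocks.mem,pstate"


@dataclass
class GpuSample:
    index: int
    power_limit_w: float
    power_draw_w: float
    sm_clock_mhz: int
    mem_clock_mhz: int
    pstate: str


def parse_samples(csv_text: str) -> List[GpuSample]:
    """Parse one CSV line per GPU (no header, no units).

    Assumes every field is reported; a "[N/A]" value raises ValueError.
    """
    samples = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        idx, limit, draw, sm, mem, pstate = (field.strip() for field in line.split(","))
        samples.append(GpuSample(int(idx), float(limit), float(draw), int(sm), int(mem), pstate))
    return samples


def read_samples() -> List[GpuSample]:
    """Take one sample from every visible GPU via nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_samples(result.stdout)
```

Keeping the parser separate from the subprocess call means it can be checked against saved output on a machine without a GPU.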
🔧 Pin baseline
- vLLM nightly (dev205+; the workspace-lock blocker was resolved 2026-05-01 PM)
- Genesis pin: v7.72.2 (f2147ad); VRAM dropped 22.1→20.0 GB/card on dual-turbo via 6-sidecar retirement
- llama.cpp: mainline + TurboQuant KV (PR #21089 gating; CUDA follow-on tracked)
📝 Documentation
- README — entry point with two-routes framing (vLLM dual = max TPS, llama.cpp single = max robustness/no-cliffs)
- HARDWARE.md — hardware truths, power-cap knee data, cross-rig efficiency charts
- BENCHMARKS.md — measured TPS / context / VRAM per config
- CLIFFS.md — every soft cliff we've found + how to detect/avoid
- SINGLE_CARD.md / DUAL_CARD.md / MULTI_CARD.md — topology-specific guidance
- AGENTS.md — Profile schema + Status enum for compose-file metadata
- FAQ.md — triage ladder + setup gotchas
💬 Community
- Discord — casual chat, hardware questions
- GitHub Discussions — async threads, cross-rig benchmark drops
- GitHub Issues — bug reports + concrete asks
Pinning to this release
git checkout v2026.05.09
When posting cross-rig benchmark numbers, please cite this version tag (or the commit SHA) so others can reproduce against the same script revision.
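A minimal sketch of stamping benchmark output with the revision it was produced on. It prefers an exact tag (`git describe --tags --exact-match`) and falls back to the abbreviated commit SHA (`git rev-parse --short HEAD`); the function name and the injectable `run` parameter are illustrative choices, not part of club-3090.

```python
import subprocess
from typing import Callable, List


def _git(args: List[str]) -> str:
    """Run a git command in the current directory and return stripped stdout."""
    return subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    ).stdout.strip()


def revision_label(run: Callable[[List[str]], str] = _git) -> str:
    """Return the exact tag for HEAD if there is one, else the short commit SHA.

    Raises CalledProcessError if not run inside a git checkout.
    """
    try:
        return run(["describe", "--tags", "--exact-match"])
    except subprocess.CalledProcessError:
        # HEAD is not exactly on a tag; fall back to the commit SHA.
        return run(["rev-parse", "--short", "HEAD"])
```

Printing `revision_label()` at the top of a results file yields `v2026.05.09` on a checkout of this tag and a short SHA anywhere else, which covers both citation forms requested above.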
Full diff since init: https://github.com/noonghunna/club-3090/commits/v2026.05.09