noonghunna/club-3090

v2026.05.09 Breaking

This release includes breaking changes; platform teams should review them before upgrading.

Published 25 days ago · Model Serving & MLOps

✓ No known CVEs patched

Summary

AI summary

A broad release spanning 🎯 Models supported, 📊 Cross-rig benchmark data, ⚠️ Cliffs + gotchas, 🛠️ Scripts + tooling, 🔧 Pin baseline, 📝 Documentation, and 💬 Community.

Full changelog

This is the first tagged snapshot of club-3090. Future releases will be auto-categorized via Release Drafter; this initial release summarizes the major milestones since repo creation on 2026-04-28.

🎯 Models supported

Qwen3.6-27B (AutoRound INT4): primary serving model on vLLM, across single, dual, and multi3/multi4 topologies
Qwen3.6-35B-A3B (MoE): secondary serving path on llama.cpp, single-card
Gemma 4 31B: newly supported on dual-card via vLLM, with an MTP drafter (google/gemma-4-31B-it-assistant)
Qwopus3.6-27B (preview): INT4+BF16-MTP via the Carnice AutoRound recipe (still WIP, not production-ready)

📊 Cross-rig benchmark data

Power-cap efficiency curves contributed across hardware classes:

3090 (air + water): 21-cap sweep, knee at 290W (air) / 330W (water)
4090 (air): 38-cap sweep at 10W resolution from @laurimyllari; knee at 250-260W; from cap=400W onwards the firmware boost clock plateaus, locked at SM 2610 MHz / 393W draw
5090 (air): 21-cap sweep from @apnar, knee at 400W for both decode + prefill workloads

See docs/HARDWARE.md for the full cross-rig knee table + efficiency-curve charts.
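
The knee in these sweeps is the cap past which extra watts buy little extra throughput. The sketch below shows one way to find it. It assumes the sweep is exported as hypothetical `cap_watts,tokens_per_sec` CSV rows. It also assumes a 2% marginal-gain threshold. Neither assumption is confirmed to match how scripts/power-cap-sweep.sh or docs/HARDWARE.md define the knee.

```shell
# Hypothetical knee finder; not the logic used by scripts/power-cap-sweep.sh.
# Reads "cap_watts,tokens_per_sec" CSV rows on stdin (header allowed, any order)
# and prints the lowest cap whose next step up gains less than MIN_GAIN
# (a fraction; default 0.02 = 2%) in throughput.
find_knee() {
  local min_gain="${1:-0.02}"
  sort -t, -k1,1n | awk -F, -v g="$min_gain" '
    BEGIN { n = 0 }
    $1 ~ /^[0-9]/ { cap[n] = $1; tps[n] = $2; n++ }   # skip header / blank rows
    END {
      for (i = 0; i + 1 < n; i++) {
        # relative throughput gain from raising the cap by one step
        if (tps[i] > 0 && (tps[i + 1] - tps[i]) / tps[i] < g) { print cap[i]; exit }
      }
      if (n > 0) print cap[n - 1]   # curve never flattened: report the top cap
    }'
}
```

For example, `find_knee 0.02 < sweep.csv` prints the knee cap in watts (`sweep.csv` is a hypothetical file). The threshold is relative rather than absolute, so the same setting works across the different throughput scales of 3090, 4090, and 5090 cards.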

⚠️ Cliffs + gotchas documented

Cliff 1 (TurboQuant tool-prefill OOM at 25K+ tool messages on single-card 24 GB)
Cliff 2 (DeltaNet GDN forward OOM on a single 50-60K-token prompt): closed via the Genesis v7.69 recipe + vllm#35975 backport
Cliff 2b (~21-26K tokens of accumulated context on single-card vLLM long-* configs): route hermes/openhands users to dual.yml or llamacpp/default
TQ3 long-context vs fp8_e5m2: TQ3 (3-bit) wins on memory, but fp8 holds up better on single prompts past ~30K tokens on bandwidth-constrained dual-eGPU rigs
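
The memory side of the TQ3 vs fp8_e5m2 trade-off is simple arithmetic: at 3 bits per element instead of 8, TQ3 needs roughly 3/8 of the fp8 KV-cache footprint. Below is a rough sketch. The layer count, KV-head count, and head dimension are hypothetical placeholders, not Qwen3.6-27B's actual shape. The sketch also ignores any per-block scale metadata the real formats carry.

```shell
# Rough KV-cache size in bytes:
#   tokens x layers x 2 (K and V) x kv_heads x head_dim x bits / 8
# "layers" counts only layers that keep a per-token KV cache. Any per-block
# scale metadata that real TQ3 / fp8 formats carry is ignored.
kv_cache_bytes() {
  local tokens=$1 layers=$2 kv_heads=$3 head_dim=$4 bits=$5
  echo $(( tokens * layers * 2 * kv_heads * head_dim * bits / 8 ))
}

# Hypothetical shape: 16 KV-cached layers, 4 KV heads, head_dim 256, 32K tokens.
fp8=$(kv_cache_bytes 32768 16 4 256 8)   # fp8_e5m2: 8 bits per element
tq3=$(kv_cache_bytes 32768 16 4 256 3)   # TQ3: 3 bits per element
echo "fp8_e5m2: $(( fp8 / 1048576 )) MiB, TQ3: $(( tq3 / 1048576 )) MiB"
# prints: fp8_e5m2: 1024 MiB, TQ3: 384 MiB
```

This estimate covers only capacity. It does not model the bandwidth behavior that makes fp8 hold up better past ~30K tokens on dual-eGPU rigs.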

🛠️ Scripts + tooling

scripts/setup.sh — one-shot model + compose installer
scripts/launch.sh — variant-aware boot
scripts/verify-full.sh / verify-stress.sh / verify-quick.sh — production gates
scripts/soak-continuous.sh — long-running stability check, the only test that catches Cliff 2b
scripts/power-cap-sweep.sh — cross-rig efficiency-knee sweeper with auto-plateau detection, SM/mem clock + throttle% + p-state sampling, decode-single + decode-concurrent + prefill-heavy modes
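
A plateau like the 4090's boost clock locking at SM 2610 MHz from cap=400W onwards is presumably the kind of pattern auto-plateau detection targets. The sketch below shows one way to detect such a plateau. It assumes hypothetical `cap_watts,sm_clock_mhz` CSV input and a 5 MHz tolerance, and it is not the script's actual logic.

```shell
# Hypothetical plateau detector; not the logic used by scripts/power-cap-sweep.sh.
# Reads "cap_watts,sm_clock_mhz" CSV rows on stdin (header allowed, any order)
# and prints the lowest cap from which every higher cap stays within TOL MHz
# (default 5) of the clock at the highest cap.
find_plateau() {
  local tol="${1:-5}"
  sort -t, -k1,1n | awk -F, -v tol="$tol" '
    BEGIN { n = 0 }
    $1 ~ /^[0-9]/ { cap[n] = $1; sm[n] = $2; n++ }
    END {
      if (n == 0) exit 1
      top = sm[n - 1]; start = n - 1
      # walk down from the highest cap while the clock stays on the plateau
      while (start > 0) {
        d = sm[start - 1] - top
        if (d < 0) d = -d
        if (d > tol) break
        start--
      }
      print cap[start]
    }'
}
```

Consider hypothetical rows of 380W→2580 MHz, 390W→2600 MHz, and 400W through 430W all at 2610 MHz: the function reports 400. Anchoring on the top-cap clock keeps a single noisy low-cap reading from being mistaken for the plateau start.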

🔧 Pin baseline

vLLM nightly (dev205+; the workspace-lock blocker was cleared on 2026-05-01 PM)
Genesis pin: v7.72.2 (f2147ad); retiring 6 sidecars cut VRAM from 22.1 to 20.0 GB/card on dual-turbo
llama.cpp: mainline + TurboQuant KV (PR #21089 gating; CUDA follow-on tracked)
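
To confirm that a local Genesis checkout matches the f2147ad pin, compare its HEAD against the short SHA, since git short SHAs are prefixes of the full hash. The sketch below assumes Genesis is a git clone at a path of your choosing (`./genesis` below is hypothetical) and uses a hypothetical helper name.

```shell
# Hypothetical helper: succeeds when FULL_SHA starts with the pinned short SHA.
# git short SHAs are prefixes of the full commit hash.
sha_matches_pin() {
  local full_sha=$1 pin=$2
  [ -n "$pin" ] && [ "${full_sha#"$pin"}" != "$full_sha" ]
}

# Usage sketch (./genesis is a hypothetical checkout path):
# if sha_matches_pin "$(git -C ./genesis rev-parse HEAD)" f2147ad; then
#   echo "Genesis matches the v7.72.2 pin"
# else
#   echo "Genesis is not at f2147ad" >&2
# fi
```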

📝 Documentation

README: entry point with a two-route framing (vLLM dual = max TPS, llama.cpp single = max robustness, no cliffs)
HARDWARE.md: hardware truths, power-cap knee data, cross-rig efficiency charts
BENCHMARKS.md: measured TPS / context / VRAM per config
CLIFFS.md: every soft cliff we've found, plus how to detect and avoid it
SINGLE_CARD.md / DUAL_CARD.md / MULTI_CARD.md: topology-specific guidance
AGENTS.md: Profile schema + Status enum for compose-file metadata
FAQ.md: triage ladder + setup gotchas

💬 Community

Discord — casual chat, hardware questions
GitHub Discussions — async threads, cross-rig benchmark drops
GitHub Issues — bug reports + concrete asks

Pinning to this release

git checkout v2026.05.09

When posting cross-rig benchmark numbers, please cite this version tag (or the commit SHA) so others can reproduce against the same script revision.
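
A small helper can produce the version citation automatically: it prints the tag when HEAD sits exactly on one, and the short commit SHA otherwise. The sketch below uses only standard git commands (`rev-parse --short` and `describe --tags --exact-match`). The `bench_cite` name and its output format are hypothetical.

```shell
# Hypothetical helper: prints "<tag> (<short sha>)" when HEAD sits exactly on
# a tag, otherwise just the short SHA. DIR defaults to the current directory.
bench_cite() {
  local dir="${1:-.}" sha tag
  sha=$(git -C "$dir" rev-parse --short HEAD) || return 1
  if tag=$(git -C "$dir" describe --tags --exact-match HEAD 2>/dev/null); then
    printf '%s (%s)\n' "$tag" "$sha"
  else
    printf '%s\n' "$sha"
  fi
}
```

After running `git checkout v2026.05.09`, calling `bench_cite` in the repo root prints `v2026.05.09 (<short sha>)`. Pasting that string alongside the numbers gives others both the tag and the exact commit.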

Full diff since init: https://github.com/noonghunna/club-3090/commits/v2026.05.09

Earlier breaking changes

v0.8.7 Genesis vLLM composes deprecated; default to `vllm/minimal`.
v0.8.6 Compose paths moved to `models/<model>/<engine>/compose/<topology>/<quant>/<serving>.yml`.
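
The v0.8.6 layout makes compose paths mechanical to construct. Below is a minimal sketch with a hypothetical helper. The model, engine, topology, quant, and serving values in the usage note are illustrative placeholders, not confirmed directory names.

```shell
# Hypothetical helper that builds a compose path in the v0.8.6 layout:
#   models/<model>/<engine>/compose/<topology>/<quant>/<serving>.yml
compose_path() {
  if [ "$#" -ne 5 ]; then
    echo "usage: compose_path MODEL ENGINE TOPOLOGY QUANT SERVING" >&2
    return 2
  fi
  printf 'models/%s/%s/compose/%s/%s/%s.yml\n' "$@"
}
```

For example, `compose_path qwen3.6-27b vllm dual autoround-int4 default` prints `models/qwen3.6-27b/vllm/compose/dual/autoround-int4/default.yml`. Running `test -f` on the printed path before launching catches a typo in any component.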