
noonghunna/club-3090

v2026.05.09 Breaking

This release includes breaking changes. Platform teams should review them before planning an upgrade.

Published 25d Model Serving & MLOps
✓ No known CVEs patched

Summary

AI summary

Broad release touches 🎯 Models supported, 📝 Documentation, 🛠️ Scripts + tooling, and 💬 Community.

Full changelog

This is the first tagged snapshot of club-3090. Future releases will be auto-categorized via Release Drafter; this initial release summarizes the major milestones since repo creation on 2026-04-28.

🎯 Models supported

  • Qwen3.6-27B (AutoRound INT4) — primary serving model on vLLM, single + dual + multi3/multi4 topologies
  • Qwen3.6-35B-A3B (MoE) — secondary serving path on llama.cpp single-card
  • Gemma 4 31B — added as a supported model on dual-card via vLLM, with MTP drafter (google/gemma-4-31B-it-assistant)
  • Qwopus3.6-27B (preview) — INT4+BF16-MTP via Carnice AutoRound recipe (still WIP, not production)

📊 Cross-rig benchmark data

Power-cap efficiency curves contributed across hardware classes:

  • 3090 (air + water): 21-cap sweep, knee at 290W (air) / 330W (water)
  • 4090 (air): 38-cap sweep at 10W resolution from @laurimyllari; knee at 250-260W; firmware boost-clock plateau locked at SM 2610 MHz / 393W draw from cap=400W onwards
  • 5090 (air): 21-cap sweep from @apnar, knee at 400W for both decode + prefill workloads

See docs/HARDWARE.md for the full cross-rig knee table + efficiency-curve charts.
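As a rough illustration of what an efficiency "knee" means in the sweeps above, the sketch below picks the lowest power cap that still reaches a chosen fraction of peak throughput. This is an assumed definition for illustration only; the 95% threshold, the function name `find_knee`, and the sample numbers are hypothetical and are not taken from scripts/power-cap-sweep.sh or from any contributed sweep.

```python
from typing import Dict, List, Tuple


def find_knee(sweep: List[Tuple[int, float]], fraction: float = 0.95) -> int:
    """Return the lowest power cap (W) whose throughput reaches `fraction` of the peak.

    `sweep` is a list of (power_cap_watts, tokens_per_second) pairs.
    The 95% default is an assumption, not the project's definition of a knee.
    """
    if not sweep:
        raise ValueError("sweep is empty")
    peak = max(tps for _, tps in sweep)
    for cap, tps in sorted(sweep):
        if tps >= fraction * peak:
            return cap
    raise AssertionError("unreachable: the peak row always meets the threshold")


# Hypothetical numbers for illustration only; not measurements from any rig.
example_sweep = [
    (200, 40.0), (230, 52.0), (260, 60.0),
    (290, 66.0), (320, 67.5), (350, 68.0),
]

knee_watts = find_knee(example_sweep)

# Rough efficiency estimate (tokens per joule), assuming the card draws
# exactly its cap. Real draw can sit below the cap, so treat this as a bound.
tokens_per_joule: Dict[int, float] = {cap: tps / cap for cap, tps in example_sweep}
```

With the sample data, raising the cap beyond the returned value buys less than 5% extra throughput, which is the intuition behind capping at the knee.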

⚠️ Cliffs + gotchas documented

  • Cliff 1 (TurboQuant tool-prefill OOM at 25K+ tool messages on single-card 24 GB)
  • Cliff 2 (DeltaNet GDN forward OOM at 50-60K single prompt) — closed via Genesis v7.69 recipe + vllm#35975 backport
  • Cliff 2b (~21-26K accumulated context, single-card vLLM long-* configs) — route hermes/openhands users to dual.yml or llamacpp/default
  • TQ3 long-ctx vs fp8_e5m2 — TQ3 (3-bit) wins on memory but fp8 holds up better past ~30K single-prompt on bandwidth-constrained dual-eGPU rigs
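The Cliff 2b routing advice can be sketched as a small decision helper. This is a minimal illustration under stated assumptions: the function name `suggest_route`, the "no-reroute" label, the GPU-count rule for choosing between the two targets, and the reading of "K" as thousands of tokens are all hypothetical. Only the ~21-26K range and the two route targets come from the notes above.

```python
# Lower edge of the ~21-26K accumulated-context range quoted for Cliff 2b.
# Interpreting "K" as thousands of tokens is an assumption.
CLIFF_2B_LOWER_TOKENS = 21_000


def suggest_route(accumulated_context_tokens: int, gpu_count: int) -> str:
    """Suggest where to send a long-context agent session (illustrative only).

    Below the quoted range nothing changes. At or above it, pick one of the
    two targets named in the notes, using GPU count as a hypothetical rule.
    """
    if accumulated_context_tokens < CLIFF_2B_LOWER_TOKENS:
        return "no-reroute"
    return "dual.yml" if gpu_count >= 2 else "llamacpp/default"
```

Using the lower edge of the range as the threshold is a deliberately conservative choice, since the notes give a range rather than a single cutoff.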

🛠️ Scripts + tooling

  • scripts/setup.sh — one-shot model + compose installer
  • scripts/launch.sh — variant-aware boot
  • scripts/verify-full.sh / verify-stress.sh / verify-quick.sh — production gates
  • scripts/soak-continuous.sh — long-running stability check; the only test that catches Cliff 2b
  • scripts/power-cap-sweep.sh — cross-rig efficiency-knee sweeper with auto-plateau detection, SM/mem clock + throttle% + p-state sampling, decode-single + decode-concurrent + prefill-heavy modes
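To make the idea of plateau detection concrete, here is one possible sketch: find the lowest cap from which measured draw stops rising, as in the 4090 note where draw stays flat once the cap passes a certain point. This is not the algorithm used by scripts/power-cap-sweep.sh; the function name `find_plateau_start`, the 2 W tolerance, and the sample readings are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def find_plateau_start(
    samples: List[Tuple[int, float]], tolerance_w: float = 2.0
) -> Optional[int]:
    """Return the lowest cap (W) from which measured draw stays flat.

    `samples` is a list of (power_cap_watts, measured_draw_watts) pairs.
    A reading counts as flat when it is within `tolerance_w` of the draw at
    the highest cap. Returns None if fewer than two caps share that draw.
    """
    ordered = sorted(samples)
    if len(ordered) < 2:
        return None
    final_draw = ordered[-1][1]
    start = None
    # Walk down from the highest cap until the draw departs from the final value.
    for cap, draw in reversed(ordered):
        if abs(draw - final_draw) > tolerance_w:
            break
        start = cap
    # A single flat point is not a plateau.
    return start if start != ordered[-1][0] else None


# Illustrative readings only, loosely shaped like a firmware-limited card.
example_samples = [
    (360, 358.0), (380, 378.0), (400, 393.0), (420, 393.0), (440, 393.5),
]
plateau_cap = find_plateau_start(example_samples)
```

Once a plateau is found, caps above it add no new information, so a sweep could reasonably stop early at that point.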

🔧 Pin baseline

  • vLLM nightly (dev205+ workspace-lock blocker UNBLOCKED 2026-05-01 PM)
  • Genesis pin: v7.72.2 (f2147ad) — VRAM dropped 22.1→20.0 GB/card on dual-turbo via 6-sidecar retirement
  • llama.cpp: mainline + TurboQuant KV (PR #21089 gating; CUDA follow-on tracked)

📝 Documentation

  • README — entry point with two-routes framing (vLLM dual = max TPS, llama.cpp single = max robustness/no-cliffs)
  • HARDWARE.md — hardware truths, power-cap knee data, cross-rig efficiency charts
  • BENCHMARKS.md — measured TPS / context / VRAM per config
  • CLIFFS.md — every soft cliff we've found + how to detect/avoid
  • SINGLE_CARD.md / DUAL_CARD.md / MULTI_CARD.md — topology-specific guidance
  • AGENTS.md — Profile schema + Status enum for compose-file metadata
  • FAQ.md — triage ladder + setup gotchas

💬 Community

Pinning to this release

git checkout v2026.05.09

When posting cross-rig benchmark numbers, please cite this version tag (or the commit SHA) so others can reproduce against the same script revision.
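One way to make that citation hard to forget is to read the tag and commit from the checkout itself and attach them to every result. The sketch below assumes `git` is on the PATH and that the benchmark is run from a clone of the repo; the helper names `repo_revision` and `format_citation` and the output format are hypothetical, not part of the project's tooling.

```python
import subprocess
from typing import Dict


def repo_revision(repo_dir: str = ".") -> Dict[str, str]:
    """Read the nearest tag and the full commit SHA from a git checkout."""

    def git(*args: str) -> str:
        result = subprocess.run(
            ["git", "-C", repo_dir, *args],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()

    return {
        # Falls back to an abbreviated SHA when no tag is reachable.
        "tag": git("describe", "--tags", "--always"),
        "commit": git("rev-parse", "HEAD"),
    }


def format_citation(tag: str, commit: str, project: str = "club-3090") -> str:
    """Build a short citation string with a 7-character commit prefix."""
    return f"{project} {tag} ({commit[:7]})"
```

For example, `format_citation(**repo_revision("path/to/club-3090"))` would produce a line that can be pasted next to the posted numbers.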


Full diff since init: https://github.com/noonghunna/club-3090/commits/v2026.05.09


Related context

Earlier breaking changes

  • v0.8.7 Genesis vLLM composes deprecated; default to `vllm/minimal`.
  • v0.8.6 Compose paths moved to `models/<model>/<engine>/compose/<topology>/<quant>/<serving>.yml`.
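The compose path layout named above can be expressed as a small path builder, which helps when updating scripts that still point at old locations. This is a sketch only: the function name `compose_path` and every argument value in the usage example are placeholders, not real model, engine, or quant names from the repo.

```python
from pathlib import PurePosixPath


def compose_path(
    model: str, engine: str, topology: str, quant: str, serving: str
) -> PurePosixPath:
    """Build models/<model>/<engine>/compose/<topology>/<quant>/<serving>.yml."""
    parts = {
        "model": model,
        "engine": engine,
        "topology": topology,
        "quant": quant,
        "serving": serving,
    }
    for name, value in parts.items():
        # Each segment must be a single non-empty path component.
        if not value or "/" in value:
            raise ValueError(f"invalid {name!r} segment: {value!r}")
    return PurePosixPath(
        "models", model, engine, "compose", topology, quant, f"{serving}.yml"
    )


# Placeholder values for illustration; substitute the names used in the repo.
example = compose_path("my-model", "vllm", "dual", "int4", "default")
```

Validating each segment keeps a stray slash from silently producing a path with the wrong depth.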
