noonghunna/club-3090

v0.8.5 Feature

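This release adds three notable features: a Structured-CoT bounded-thinking compose for llama.cpp, an opt-in --kv-breakdown cache planning layer for kv-calc, and a larger default context (131072 to 200000) for ik-llama/two-stage.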

Published 10d ago · Model Serving & MLOps

✓ No known CVEs patched

Summary

A broad release that touches 📝 Documentation (including the README), 🐛 Bug fixes, and ✨ Features.

Changes in this release

Type	Severity	Summary	CVE
Feature	Medium	Adds opt-in --kv-breakdown architecture cache planning layer to kv-calc.	—
Feature	Medium	Adds Structured-CoT bounded-thinking compose and grammar-dialect fix to llama.cpp.	—
Feature	Medium	Increases ik-llama/two-stage context default from 131072 to 200000 and promotes 🧪→⭐ code option.	—
Feature	Low	Adds profile-backed model weight fetch registry.	—
Deprecation	Low	Retires the pre-built vLLM image and updates related workflows and documentation.	—
Bugfix	Medium	Fixes rebench-report to surface ceiling VRAM margin in verify-stress section.	—
Bugfix	Medium	Makes shell environment variables win over .env (e.g., MODEL_DIR) and adds CRLF tolerance in switch.	—
Bugfix	Medium	Reduces ik-llama/two-stage default ngram n_max from 64 to 4 for tuned code-decode optimum.	—
Bugfix	Medium	Scopes report.sh kv-calc calibration to the running model.	—
Bugfix	Medium	Sets distinct default container_name per llama.cpp and ik single variant.	—

Source for all entries: llm_adapter@2026-05-28. Confidence: high.

Full changelog

v0.8.5 — 2026-05-24

✨ Features

feat(llama.cpp): Structured-CoT bounded-thinking compose + grammar-dialect fix (#214 by @noonghunna)
feat(kv-calc): opt-in --kv-breakdown architecture cache planning layer (#213 by @noonghunna)
feat(ik-llama/two-stage): ctx default 131072→200000 + promote 🧪→⭐ code option (#212 by @noonghunna)
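
To get a feel for why the context default matters for cache planning, here is a back-of-the-envelope sketch of KV-cache sizing at the old and new defaults. It uses the standard dense-attention estimate (2 × layers × KV heads × head dim × context × bytes per element). It is not kv-calc's implementation, and the model dimensions and cache precision below are hypothetical placeholders, not values taken from any club-3090 profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    """Hypothetical model dimensions, for illustration only."""
    n_layers: int
    n_kv_heads: int
    head_dim: int


def kv_cache_bytes(shape: AttentionShape, ctx: int, bytes_per_elem: float) -> float:
    """Dense-attention estimate: one K and one V vector per layer, KV head, and token."""
    per_token = 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * bytes_per_elem
    return per_token * ctx


def gib(n_bytes: float) -> float:
    """Convert bytes to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Assumed shape and an assumed 1-byte-per-element cache type.
    shape = AttentionShape(n_layers=48, n_kv_heads=4, head_dim=128)
    for ctx in (131072, 200000):
        size = kv_cache_bytes(shape, ctx, bytes_per_elem=1.0)
        print(f"ctx={ctx}: ~{gib(size):.2f} GiB of KV cache")
```

Because the estimate is linear in context length, moving from 131072 to 200000 tokens grows the cache by roughly 53% for the same cache type. Architectures that mix in non-attention layers or sliding-window attention will deviate from this dense estimate.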

🐛 Bug fixes

fix(rebench-report): surface ceiling VRAM margin in verify-stress section (#184) (1055c91)
fix(switch): shell env wins over .env (MODEL_DIR etc.) + CRLF-tolerant (#425) (9a27de8)
fix(ik-llama/two-stage): default ngram n_max 64→4 (tuned code-decode optimum) (#210 by @noonghunna)
fix(#168): scope report.sh kv-calc calibration to the running model (477873a)
fix(#169): distinct default container_name per llama.cpp/ik single variant (9e5f200)
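
The precedence rule in the switch fix (shell environment wins over .env, with CRLF tolerance) can be sketched as follows. This is a minimal Python illustration of the behaviour the changelog describes, not the project's actual switch script; the function names, the PORT key, and the example paths are assumptions made for the example.

```python
import os
from typing import Dict, Mapping, Optional


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, tolerating CRLF line endings."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():  # splitlines() also splits on "\r\n"
        line = raw.strip()         # drops any stray "\r" and padding
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_env(dotenv_text: str,
                shell_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Merge .env values with the shell environment; the shell wins on conflict."""
    shell = dict(os.environ if shell_env is None else shell_env)
    merged = parse_dotenv(dotenv_text)
    merged.update(shell)  # values exported in the shell override .env entries
    return merged


if __name__ == "__main__":
    dotenv = "MODEL_DIR=/from/dotenv\r\nPORT=8020\r\n"  # CRLF-terminated file
    env = resolve_env(dotenv, {"MODEL_DIR": "/from/shell"})
    print(env["MODEL_DIR"], env["PORT"])
```

With this ordering, a MODEL_DIR exported in the shell overrides the .env entry, while keys defined only in .env still apply.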

📝 Documentation

docs(BENCHMARKS): 4090 ik-two-stage cross-rig row + 4090 ctx-derate note (#184) (703fa3c)
docs(README): direct docker-compose fallback when launch/switch error + default capture to --full (27e818f)
docs(BENCHMARKS): add thinking-on vs no-think code-gen baseline (Qwen3.6-27B) (d8adf55)
docs(README): Windows/WSL2 signpost at top of Quick start (f954692)
docs(WSL): add Diagnostics section — suggest pciutils, set lspci/WSL2 expectation (d36eb63)
docs(WSL,FAQ): clarify club-3090 needs WSL2 — native Windows = upstream engine only (6fa639b)
docs(README,WSL): fold reasoning suite into Benchmarks; add native llama.cpp + overhead-reduction to WSL guide (80527f9)
Document reasoning quality suite (605f1df)
docs: add WSL2/Windows from-scratch setup guide (#187) (3c1a6e9)
docs(README): add Benchmarks + Diagnostics sections (37574fe)
docs(diagnostics): redact internal paths in structured-cot-bench.md (3b270b9)
docs: promote iq4ks-two-stage 🧪→⭐ (code, 200K) + BENCHMARKS row (7e6eaf8)
docs(SINGLE_CARD): add measured two-stage TPS (~59/~98, code +35% vs MTP-only) (b10096c)
docs(SINGLE_CARD): mark the #167-blocked vLLM configs in "Pick a config" (cb19d68)
docs(README): drop SGLang from the headline engine claims (blocked, not a route) (678fd00)
docs(README): single-card "recommended" → llamacpp/default (was #167-blocked vllm/default) (44c08b6)
docs(DUAL_CARD): distinguish decode-concurrent vs long-prefill-overlap (#208) (36767f1)
docs(README): fix stale 'llama.cpp single = full 262K' → 200K in supported-models cell (2a8357f)
docs+registry: surface ik-llama on the single-card front door + fix stale max_ctx (7f73361)
docs(BENCHMARKS): refresh llamacpp/mtp row to the 200K thinking-off rebench (2eca18d)
docs(#169 branch): fix stale 262K→200K cross-refs in single compose headers (62a0f4b)
docs: correct single-card llama.cpp/ik_llama ctx 262K -> 200K (shipped default) (c7fc9ca)

🧹 Maintenance

chore: retire the club-3090 pre-built vLLM image (workflow + docs) (7003e61)

🧹 Other

Add profile-backed model weight fetch registry (5f37ae6)
Generalize compose model preflight (99d264b)
Expose benchlocal reasoning suite (caf6fc2)

Pin: git checkout v0.8.5

Earlier breaking changes

v0.8.7 Genesis vLLM composes deprecated; default to `vllm/minimal`.
v0.8.6 Compose paths moved to `models/<model>/<engine>/compose/<topology>/<quant>/<serving>.yml`.
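
For scripts that reference compose files directly, the v0.8.6 layout can be assembled from its parts. The helper below is a hypothetical sketch: only the path template comes from the note above, while the function name and the example segment values are placeholders rather than real club-3090 identifiers.

```python
from pathlib import PurePosixPath


def compose_path(model: str, engine: str, topology: str,
                 quant: str, serving: str) -> PurePosixPath:
    """Build models/<model>/<engine>/compose/<topology>/<quant>/<serving>.yml."""
    parts = {"model": model, "engine": engine, "topology": topology,
             "quant": quant, "serving": serving}
    for name, part in parts.items():
        # Reject empty segments and embedded separators so the layout stays intact.
        if not part or "/" in part:
            raise ValueError(f"invalid {name!r} segment: {part!r}")
    return PurePosixPath("models", model, engine, "compose",
                         topology, quant, f"{serving}.yml")


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(compose_path("example-model", "example-engine",
                       "single", "example-quant", "default"))
```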