This release adds 3 notable features for engineering teams evaluating rollout.
✓ No known CVEs patched in this version
Summary
AI summary: Broad release touches 📚 Documentation, README, 🐛 Bug fixes, and ✨ Features.
Changes in this release
| Type | Severity | Summary | CVE |
|---|---|---|---|
| Feature | Medium | Adds opt-in --kv-breakdown architecture cache planning layer to kv-calc. | None |
| Feature | Medium | Adds Structured-CoT bounded-thinking compose and grammar-dialect fix to llama.cpp. | None |
| Feature | Medium | Increases ik-llama/two-stage context default from 131072 to 200000 and promotes 🧪⭐⭐ code option. | None |
| Feature | Low | Adds profile-backed model weight fetch registry. | None |
| Deprecation | Low | Retires the pre-built vLLM image and updates related workflows and documentation. | None |
| Bugfix | Medium | Fixes rebench-report to surface ceiling VRAM margin in verify-stress section. | None |
| Bugfix | Medium | Makes shell environment variables win over .env (e.g., MODEL_DIR) and adds CRLF tolerance in switch. | None |
| Bugfix | Medium | Reduces ik-llama/two-stage default ngram n_max from 64 to 4 for tuned code-decode optimum. | None |
| Bugfix | Medium | Scopes report.sh kv-calc calibration to the running model. | None |
| Bugfix | Medium | Sets distinct default container_name per llama.cpp and ik single variant. | None |

Source for all entries: llm_adapter@2026-05-28. Confidence: high.
Full changelog
v0.8.5 - 2026-05-24
✨ Features
- feat(llama.cpp): Structured-CoT bounded-thinking compose + grammar-dialect fix (#214 by @noonghunna)
- feat(kv-calc): opt-in --kv-breakdown architecture cache planning layer (#213 by @noonghunna)
- feat(ik-llama/two-stage): ctx default 131072→200000 + promote 🧪⭐⭐ code option (#212 by @noonghunna)
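To put the 131072→200000 context change in memory terms, here is a minimal sketch of the standard KV-cache size formula for a plain transformer with grouped-query attention. It is not kv-calc's implementation, and it does not model whatever --kv-breakdown reports. The model shape (`n_layers`, `n_kv_heads`, `head_dim`, `bytes_per_elem`) is hypothetical and chosen only to make the arithmetic easy to follow.

```python
def kv_cache_bytes(ctx_tokens, n_layers, n_kv_heads, head_dim, bytes_per_elem):
    """Approximate KV-cache size for a plain attention stack.

    The factor 2 accounts for one K and one V tensor per layer.
    Architecture-specific cache layouts are deliberately ignored.
    """
    return 2 * n_layers * n_kv_heads * head_dim * ctx_tokens * bytes_per_elem


# Hypothetical model shape with an 8-bit KV cache (assumption, not from the release).
shape = dict(n_layers=32, n_kv_heads=4, head_dim=128, bytes_per_elem=1)

old = kv_cache_bytes(131072, **shape)  # previous default context
new = kv_cache_bytes(200000, **shape)  # new default context

print(f"{old / 2**30:.2f} GiB -> {new / 2**30:.2f} GiB ({new / old:.2f}x)")
```

Under this formula the cache grows linearly with context, so the new default costs roughly 1.5 times the old one in KV memory whatever the real model shape is.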
🐛 Bug fixes
- fix(rebench-report): surface ceiling VRAM margin in verify-stress section (#184) (1055c91)
- fix(switch): shell env wins over .env (MODEL_DIR etc.) + CRLF-tolerant (#425) (9a27de8)
- fix(ik-llama/two-stage): default ngram n_max 64→4 (tuned code-decode optimum) (#210 by @noonghunna)
- fix(#168): scope report.sh kv-calc calibration to the running model (477873a)
- fix(#169): distinct default container_name per llama.cpp/ik single variant (9e5f200)
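The precedence rule in the switch fix (shell environment wins over .env, with CRLF line endings tolerated) can be illustrated with a short sketch. This is not the project's switch script. The function names `parse_env_text` and `resolve_env` and the sample values are hypothetical.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments.

    str.splitlines() treats both "\n" and "\r\n" as line breaks, and
    strip() removes any stray carriage return, so CRLF files parse cleanly.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve_env(dotenv_text, shell_env):
    """Merge .env values with the shell environment; the shell wins on conflict."""
    merged = parse_env_text(dotenv_text)
    merged.update(shell_env)  # shell values override .env values
    return merged


# Hypothetical .env content saved with Windows (CRLF) line endings.
dotenv = "# local defaults\r\nMODEL_DIR=/from/dotenv\r\nPORT=8020\r\n"
print(resolve_env(dotenv, {"MODEL_DIR": "/from/shell"}))
```

In a real script the second argument would typically be `os.environ`. It is passed explicitly here so the override is easy to see and test.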
📚 Documentation
- docs(BENCHMARKS): 4090 ik-two-stage cross-rig row + 4090 ctx-derate note (#184) (703fa3c)
- docs(README): direct docker-compose fallback when launch/switch error + default capture to --full (27e818f)
- docs(BENCHMARKS): add thinking-on vs no-think code-gen baseline (Qwen3.6-27B) (d8adf55)
- docs(README): Windows/WSL2 signpost at top of Quick start (f954692)
- docs(WSL): add Diagnostics section - suggest pciutils, set lspci/WSL2 expectation (d36eb63)
- docs(WSL,FAQ): clarify club-3090 needs WSL2 - native Windows = upstream engine only (6fa639b)
- docs(README,WSL): fold reasoning suite into Benchmarks; add native llama.cpp + overhead-reduction to WSL guide (80527f9)
- Document reasoning quality suite (605f1df)
- docs: add WSL2/Windows from-scratch setup guide (#187) (3c1a6e9)
- docs(README): add Benchmarks + Diagnostics sections (37574fe)
- docs(diagnostics): redact internal paths in structured-cot-bench.md (3b270b9)
- docs: promote iq4ks-two-stage 🧪⭐⭐ (code, 200K) + BENCHMARKS row (7e6eaf8)
- docs(SINGLE_CARD): add measured two-stage TPS (~59/~98, code +35% vs MTP-only) (b10096c)
- docs(SINGLE_CARD): mark the #167-blocked vLLM configs in "Pick a config" (cb19d68)
- docs(README): drop SGLang from the headline engine claims (blocked, not a route) (678fd00)
- docs(README): single-card "recommended" → llamacpp/default (was #167-blocked vllm/default) (44c08b6)
- docs(DUAL_CARD): distinguish decode-concurrent vs long-prefill-overlap (#208) (36767f1)
- docs(README): fix stale 'llama.cpp single = full 262K' → 200K in supported-models cell (2a8357f)
- docs+registry: surface ik-llama on the single-card front door + fix stale max_ctx (7f73361)
- docs(BENCHMARKS): refresh llamacpp/mtp row to the 200K thinking-off rebench (2eca18d)
- docs(#169 branch): fix stale 262K→200K cross-refs in single compose headers (62a0f4b)
- docs: correct single-card llama.cpp/ik_llama ctx 262K -> 200K (shipped default) (c7fc9ca)
🧹 Maintenance
- chore: retire the club-3090 pre-built vLLM image (workflow + docs) (7003e61)
🧹 Other
- Add profile-backed model weight fetch registry (5f37ae6)
- Generalize compose model preflight (99d264b)
- Expose benchlocal reasoning suite (caf6fc2)
[Pin: git checkout v0.8.5] · Full diff
About noonghunna/club-3090