
noonghunna/club-3090

v0.8.5 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

Published 10d Model Serving & MLOps

✓ No known CVEs patched in this version

Summary

AI summary

Broad release touching 📝 Documentation, README, 🐛 Bug fixes, and ✨ Features.

Changes in this release

Feature Medium

Adds opt‑in --kv-breakdown architecture cache planning layer to kv-calc.

Source: llm_adapter@2026-05-28

Confidence: high

Feature Medium

Adds Structured‑CoT bounded‑thinking compose and grammar‑dialect fix to llama.cpp.

Source: llm_adapter@2026-05-28

Confidence: high

Feature Medium

Increases the ik-llama/two-stage context default from 131072 to 200000 and promotes the code option from 🧪 to ⭐.

Source: llm_adapter@2026-05-28

Confidence: high
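
The context bump matters mostly because KV cache memory grows linearly with context length. The sketch below shows that arithmetic for a plain full-attention transformer. The layer count, KV head count, head dimension, and element size are hypothetical values chosen for illustration and are not taken from this release; models with hybrid or compressed attention will deviate from this formula.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_tokens: int, bytes_per_elem: float) -> float:
    """Approximate KV cache size for a plain full-attention transformer.

    One K vector and one V vector are stored per layer, per KV head, per token.
    """
    return 2 * n_layers * n_kv_heads * head_dim * ctx_tokens * bytes_per_elem


# Hypothetical model shape, for illustration only (not from the release notes).
shape = dict(n_layers=48, n_kv_heads=8, head_dim=128, bytes_per_elem=2.0)

old = kv_cache_bytes(ctx_tokens=131072, **shape)
new = kv_cache_bytes(ctx_tokens=200000, **shape)

# The ratio depends only on the two context lengths, not on the model shape.
growth = new / old
print(f"old: {old / 2**30:.1f} GiB, new: {new / 2**30:.1f} GiB, x{growth:.3f}")
```

Whatever the real model shape is, the ratio 200000 / 131072 (roughly 1.53x) is the part that carries over, so the new default needs about half again as much KV cache as the old one under this assumption.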

Feature Low

Adds profile‑backed model weight fetch registry.

Source: llm_adapter@2026-05-28

Confidence: high

Deprecation Low

Retires the pre‑built vLLM image and updates related workflows and documentation.

Source: llm_adapter@2026-05-28

Confidence: high

Bugfix Medium

Fixes rebench‑report to surface ceiling VRAM margin in verify‑stress section.

Source: llm_adapter@2026-05-28

Confidence: high

Bugfix Medium

Makes shell environment variables win over .env (e.g., MODEL_DIR) and adds CRLF tolerance in switch.

Source: llm_adapter@2026-05-28

Confidence: high
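
To make the precedence rule concrete, here is a minimal sketch of "shell wins over .env" with CRLF-tolerant parsing. It is not the project's switch script; the function names and the PORT variable are hypothetical, and only MODEL_DIR comes from the release notes.

```python
def parse_env_file(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, tolerating CRLF line endings."""
    values = {}
    for raw in text.split("\n"):
        line = raw.rstrip("\r").strip()  # drop the CR left by Windows editors
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve_env(file_text: str, shell_env: dict[str, str]) -> dict[str, str]:
    """Merge .env values with the shell; values set in the shell win."""
    merged = parse_env_file(file_text)
    merged.update(shell_env)  # shell entries overwrite .env entries
    return merged


# A .env file saved with Windows line endings (hypothetical contents).
dotenv_text = "MODEL_DIR=/from/dotenv\r\nPORT=8020\r\n"
resolved = resolve_env(dotenv_text, {"MODEL_DIR": "/from/shell"})
print(resolved)
```

Without the `rstrip("\r")` step, a CRLF file would yield values such as `/from/dotenv\r`, which is the kind of breakage a CRLF-tolerant parser avoids.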

Bugfix Medium

Reduces ik‑llama/two‑stage default ngram n_max from 64 to 4 for tuned code‑decode optimum.

Source: llm_adapter@2026-05-28

Confidence: high
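
For readers unfamiliar with n-gram drafting, the sketch below shows the general technique of proposing draft tokens by matching the tail of the context against earlier text. It assumes n_max caps the length of the n-gram that is matched; the release notes only state that the default changed from 64 to 4, so that reading, the function name, and the n_draft parameter are all assumptions rather than the engine's actual implementation.

```python
def draft_from_ngrams(tokens: list[int], n_max: int, n_draft: int) -> list[int]:
    """Propose draft tokens by matching the context's tail against earlier text.

    Tries the longest suffix first (up to n_max tokens) and returns the tokens
    that followed its most recent earlier occurrence.
    """
    for n in range(min(n_max, len(tokens) - 1), 0, -1):
        suffix = tokens[-n:]
        # Scan backwards so the most recent earlier match wins.
        for start in range(len(tokens) - n - 1, -1, -1):
            if tokens[start:start + n] == suffix:
                follow = tokens[start + n:start + n + n_draft]
                if follow:
                    return follow
    return []  # no match: the caller falls back to normal decoding


# Hypothetical token IDs: the tail [1, 2, 3] also appears at the start.
history = [1, 2, 3, 4, 1, 2, 3]
print(draft_from_ngrams(history, n_max=4, n_draft=2))
```

Under this reading, a smaller n_max means fewer candidate suffix lengths to try per step, which is one plausible reason a short cap could suit repetitive code output.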

Bugfix Medium

Scopes report.sh kv‑calc calibration to the running model.

Source: llm_adapter@2026-05-28

Confidence: high

Bugfix Medium

Sets distinct default container_name per llama.cpp and ik single variant.

Source: llm_adapter@2026-05-28

Confidence: high

Full changelog

v0.8.5 - 2026-05-24

✨ Features

  • feat(llama.cpp): Structured-CoT bounded-thinking compose + grammar-dialect fix (#214 by @noonghunna)
  • feat(kv-calc): opt-in --kv-breakdown architecture cache planning layer (#213 by @noonghunna)
  • feat(ik-llama/two-stage): ctx default 131072→200000 + promote 🧪→⭐ code option (#212 by @noonghunna)

🐛 Bug fixes

  • fix(rebench-report): surface ceiling VRAM margin in verify-stress section (#184) (1055c91)
  • fix(switch): shell env wins over .env (MODEL_DIR etc.) + CRLF-tolerant (#425) (9a27de8)
  • fix(ik-llama/two-stage): default ngram n_max 64→4 (tuned code-decode optimum) (#210 by @noonghunna)
  • fix(#168): scope report.sh kv-calc calibration to the running model (477873a)
  • fix(#169): distinct default container_name per llama.cpp/ik single variant (9e5f200)

📝 Documentation

  • docs(BENCHMARKS): 4090 ik-two-stage cross-rig row + 4090 ctx-derate note (#184) (703fa3c)
  • docs(README): direct docker-compose fallback when launch/switch error + default capture to --full (27e818f)
  • docs(BENCHMARKS): add thinking-on vs no-think code-gen baseline (Qwen3.6-27B) (d8adf55)
  • docs(README): Windows/WSL2 signpost at top of Quick start (f954692)
  • docs(WSL): add Diagnostics section - suggest pciutils, set lspci/WSL2 expectation (d36eb63)
  • docs(WSL,FAQ): clarify club-3090 needs WSL2 β€” native Windows = upstream engine only (6fa639b)
  • docs(README,WSL): fold reasoning suite into Benchmarks; add native llama.cpp + overhead-reduction to WSL guide (80527f9)
  • Document reasoning quality suite (605f1df)
  • docs: add WSL2/Windows from-scratch setup guide (#187) (3c1a6e9)
  • docs(README): add Benchmarks + Diagnostics sections (37574fe)
  • docs(diagnostics): redact internal paths in structured-cot-bench.md (3b270b9)
  • docs: promote iq4ks-two-stage 🧪→⭐ (code, 200K) + BENCHMARKS row (7e6eaf8)
  • docs(SINGLE_CARD): add measured two-stage TPS (~59/~98, code +35% vs MTP-only) (b10096c)
  • docs(SINGLE_CARD): mark the #167-blocked vLLM configs in "Pick a config" (cb19d68)
  • docs(README): drop SGLang from the headline engine claims (blocked, not a route) (678fd00)
  • docs(README): single-card "recommended" → llamacpp/default (was #167-blocked vllm/default) (44c08b6)
  • docs(DUAL_CARD): distinguish decode-concurrent vs long-prefill-overlap (#208) (36767f1)
  • docs(README): fix stale 'llama.cpp single = full 262K' → 200K in supported-models cell (2a8357f)
  • docs+registry: surface ik-llama on the single-card front door + fix stale max_ctx (7f73361)
  • docs(BENCHMARKS): refresh llamacpp/mtp row to the 200K thinking-off rebench (2eca18d)
  • docs(#169 branch): fix stale 262K→200K cross-refs in single compose headers (62a0f4b)
  • docs: correct single-card llama.cpp/ik_llama ctx 262K -> 200K (shipped default) (c7fc9ca)

🧹 Maintenance

  • chore: retire the club-3090 pre-built vLLM image (workflow + docs) (7003e61)

🧹 Other

  • Add profile-backed model weight fetch registry (5f37ae6)
  • Generalize compose model preflight (99d264b)
  • Expose benchlocal reasoning suite (caf6fc2)

Pin: git checkout v0.8.5


Related context

Breaking changes in later releases

  • v0.8.7 Genesis vLLM composes deprecated; default to `vllm/minimal`.
  • v0.8.6 Compose paths moved to `models/<model>/<engine>/compose/<topology>/<quant>/<serving>.yml`.
