This release adds 3 notable features for engineering teams evaluating rollout.
✓ No known CVEs patched in this version
Summary
AI summaryBroad release touches π Documentation, π Bug fixes, π§Ή Other, and β¨ Features.
Changes in this release
| Type | Severity | Summary | CVE |
|---|---|---|---|
| Feature | Medium |
Adds ik iq4ks-mtp and iq4ks-mtp-vision to launch.sh and switch.sh. Adds ik iq4ks-mtp and iq4ks-mtp-vision to launch.sh and switch.sh. Source: llm_adapter@2026-05-28 Confidence: high |
β |
| Feature | Medium |
Adds ik_llama Qwen3.6-27B IQ4_KS composes for text (262K) and vision (160K). Adds ik_llama Qwen3.6-27B IQ4_KS composes for text (262K) and vision (160K). Source: llm_adapter@2026-05-28 Confidence: high |
β |
| Feature | Medium |
Exposes requestβlevel thinking toggles in eval. Exposes requestβlevel thinking toggles in eval. Source: llm_adapter@2026-05-28 Confidence: high |
β |
| Feature | Medium |
Exposes sampling defaults via environment variables in compose. Exposes sampling defaults via environment variables in compose. Source: llm_adapter@2026-05-28 Confidence: high |
β |
| Feature | Medium |
Sets WEIGHTS=gguf to fetch llama.cpp GGUF weights in setup. Sets WEIGHTS=gguf to fetch llama.cpp GGUF weights in setup. Source: llm_adapter@2026-05-28 Confidence: high |
β |
| Feature | Medium |
Passes --sampling-from-server through quality-test.sh and rebench-full.sh. Passes --sampling-from-server through quality-test.sh and rebench-full.sh. Source: llm_adapter@2026-05-28 Confidence: high |
β |
| Feature | Low |
Capture prefill throughput during NIAH rungs in verify-stress. Capture prefill throughput during NIAH rungs in verify-stress. Source: granite4.1:30b@2026-05-28-audit Confidence: high |
β |
| Bugfix | Medium |
Fixes three liveβcaught bugs in verifyβstress ceiling ladder. Fixes three liveβcaught bugs in verifyβstress ceiling ladder. Source: llm_adapter@2026-05-28 Confidence: high |
β |
| Bugfix | Medium |
Adds CTX_SIZEβscaled ceiling ladder to verifyβstress. Adds CTX_SIZEβscaled ceiling ladder to verifyβstress. Source: llm_adapter@2026-05-28 Confidence: high |
β |
| Bugfix | Medium |
Recognizes llamaβcpp and ikβllama containers in soak and preflight autodetect. Recognizes llamaβcpp and ikβllama containers in soak and preflight autodetect. Source: llm_adapter@2026-05-28 Confidence: high |
β |
| Bugfix | Medium |
Lowers singleβcard MTP CTX_SIZE default from 262144 to 200000 for llama.cpp and ik_llama. Lowers singleβcard MTP CTX_SIZE default from 262144 to 200000 for llama.cpp and ik_llama. Source: llm_adapter@2026-05-28 Confidence: high |
β |
| Bugfix | Low |
Fix basename model ID handling in rebench aider/litellm step. Fix basename model ID handling in rebench aider/litellm step. Source: granite4.1:30b@2026-05-28-audit Confidence: high |
β |
| Bugfix | Low |
Record measured ceilingβladder result for ik iq4ks-mtp header in compose. Record measured ceilingβladder result for ik iq4ks-mtp header in compose. Source: granite4.1:30b@2026-05-28-audit Confidence: high |
β |
| Bugfix | Low |
Polish report handling of PyYAML, idle VRAM, P2P redaction, and kv-calc. Polish report handling of PyYAML, idle VRAM, P2P redaction, and kv-calc. Source: granite4.1:30b@2026-05-28-audit Confidence: high |
β |
| Bugfix | Low |
Guide users to MODEL_DIR/.env when weights are not found in launch script. Guide users to MODEL_DIR/.env when weights are not found in launch script. Source: granite4.1:30b@2026-05-28-audit Confidence: high |
β |
| Bugfix | Low |
Pin llamaβcpp Docker image to serverβcudaβb9246 tag. Pin llamaβcpp Docker image to serverβcudaβb9246 tag. Source: granite4.1:30b@2026-05-28-audit Confidence: high |
β |
| Bugfix | Low |
Change singleβcard default suggestion in launch script to llamacpp/default. Change singleβcard default suggestion in launch script to llamacpp/default. Source: granite4.1:30b@2026-05-28-audit Confidence: high |
β |
| Bugfix | Low |
Always capture sandboxed-pack logs to perβtag results directory in rebench. Always capture sandboxed-pack logs to perβtag results directory in rebench. Source: granite4.1:30b@2026-05-28-audit Confidence: high |
β |
Full changelog
v0.8.4 β 2026-05-23
β¨ Features
- feat(verify-stress): capture prefill throughput during NIAH rungs (#199) (07d478c)
- feat(eval): expose request-level thinking toggles (#196) (#196 by @noonghunna)
- feat(scripts): pass --sampling-from-server through quality-test.sh + rebench-full.sh (dd1f070)
- feat(compose): expose sampling defaults via env (#194) (#194 by @noonghunna)
- feat(setup): WEIGHTS=gguf to fetch the llama.cpp GGUF (not just the vLLM model) (#191) (#191 by @noonghunna)
- feat(ik-llama): wire iq4ks-mtp + iq4ks-mtp-vision into launch.sh + switch.sh (#189) (#189 by @noonghunna)
- feat(models): add ik_llama Qwen3.6-27B IQ4_KS composes β text 262K + vision 160K (#180) (#180 by @noonghunna)
π Bug fixes
- fix(rebench): basename model id for the aider/litellm step (ik_llama full-path id β 0/30) (3b20ce3)
- fix(soak,preflight): recognize llama-cpp / ik-llama containers in autodetect (#403) (d9fdab2)
- fix(compose): ik iq4ks-mtp header β record measured ceiling-ladder result (200K confirmed) (1d93343)
- fix(compose): lower single-card MTP CTX_SIZE default 262144 β 200000 (llama.cpp + ik_llama) (2e45928)
- fix(verify-stress): three live-caught bugs in ceiling ladder (#199) (b84249c)
- fix(verify-stress): add CTX_SIZE-scaled ceiling ladder (#199) (5a825a4)
- fix(report): PyYAML/idle-VRAM/P2P/redaction/kv-calc polish + review fixes (#178/#137) (#192 by @noonghunna)
- fix(launch): point users at MODEL_DIR/.env when weights aren't found (#190) (#190 by @noonghunna)
- fix(llamacpp): pin image to server-cuda-b9246 (rolling tag broke at b9282) (#188) (#188 by @noonghunna)
- fix(launch): single-card default suggestion β llamacpp/default (#185) (#185 by @noonghunna)
- fix(rebench): always capture sandboxed-pack logs to the per-tag results dir (#179) (#179 by @noonghunna)
π Documentation
- docs: correct ik_llama verdict β ~18-20% FASTER than mainline, not a "tie" (#184) (b7353da)
- docs: add @mgabor3141 X399/TR-1950X dual.yml row + pre-Zen2 CPU-IPC note (#178) (6e49960)
- docs(CLIFFS): document llama.cpp "boots β fills" false ceiling; 200K = max-safe single-card CTX_SIZE (9be237d)
- docs: QUALITY_TEST.md β fix stale pack-status (sandboxed packs now implemented) (f6bdc06)
- docs: document sampling/temperature eval options (#193/#194 + benchlocal #19/#21) (9fd634a)
- docs(single-card): strike Genesis-pinned vLLM rows (blocked by purged pin #167) (a30bdfd)
- docs(upstream): correct the #40875 row (open tool-call-corruption bug, not "closed coexistence") (25f130a)
- docs: correct ik_llama claims to the matched-power tie (#184) (c470d9a)
- docs: surface WEIGHTS=gguf + switch.sh ik-llama paths (match #189/#191) (412315d)
- docs(HARDWARE/FAQ): AMD-Vi IOMMU Xid 154 under TP=2 β iommu=pt fix (#178) (fe86b72)
- docs: add ik_llama engine page + QUANTIZATION primer; surface IQK quants (554b85b)
- docs(BENCHMARKS): @duart dual NVLink Proxmox VFIO-passthrough, stock-upstream no-Genesis (disc #162) (bc6e20b)
- docs(BENCHMARKS): @mgabor3141 dual.yml β Z77/i7-3770K, PCIe 2.0 x4 slowest cross-card link (#178) (626fa68)
- docs(mtp-vision): surface the -ub 512 β 192K context recipe in the compose header (70bf7e7)
- docs: cross-link the -ub vs ctx trade-off into SINGLE_CARD + CLIFFS + FAQ (035261b)
- charts: compose names on x-axis + description legend block below (07c7cd0)
- charts: tighten single-card label format (line 1 = variant + ctx, line 2 = modifier) (9aa8fa7)
π οΈ Scripts + tooling
- scripts: endpoint-first --url/--model/--engine for non-Docker engines (#174) (#174 by @noonghunna)
- report.sh: capture image digest + OCI labels (build tag, upstream commit) (78556f8)
π§Ή Maintenance
- chore(compose): drop accidentally-committed qwopus3.6-27b-v2 llama.cpp compose (b8aeb93)
- refactor(llamacpp): collapse single-card composes 3β2 (default = mtp alias) (#181) (#181 by @noonghunna)
π§Ή Other
- Fix verify-full to accept reasoning_content (3a04ae5)
- quality-test: respect explicit MODEL/--model, don't clobber from /v1/models (#177) (#177 by @noonghunna)
- sglang: park EAGLE-3 path for Qwen3-Next (MTP wins everywhere) (#176) (#176 by @noonghunna)
- quality-test: expose --timeout-per-case + bump aider-polyglot-30 to 3600s (#175) (#175 by @noonghunna)
- sglang: experimental EAGLE-3 + Qwen3-Next dual-3090 path (Codex-led patch) (941fa06)
- SINGLE_CARD: refresh Luce DFlash + PFlash watch-list (2026-05-20) (f9f9640)
- AGENTS: pin engine images only when we vendor patches (6810768)
- llama-cpp: document speed-vs-context trade-off + fix stale ub default (1b2a76c)
- llama-cpp: switch to rolling :server-cuda tag (no patches β no pin needed) (4a53eda)
- llama-cpp: replace orphan llama-cpp:local with upstream pinned image (#170) (c3e7c7e)
- gpu-mode status: probe :8020 + detect engine on :8030 (db9c5e1)
[Pin: git checkout v0.8.4] Β· Full diff
Weekly OSS security release digest.
The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.
No spam, unsubscribe anytime.
Share this release
About noonghunna/club-3090
All releases βBeta — feedback welcome: [email protected]