Skip to content

noonghunna/club-3090

v0.8.4 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

Published 11d Model Serving & MLOps
βœ“ No known CVEs patched
Read the diff β†’ Tool health β†’ What is this tool? β†’

✓ No known CVEs patched in this version

Summary

AI summary

Broad release touches πŸ“ Documentation, πŸ› Bug fixes, 🧹 Other, and ✨ Features.

Changes in this release

Feature Medium

Adds ik iq4ks-mtp and iq4ks-mtp-vision to launch.sh and switch.sh.

Adds ik iq4ks-mtp and iq4ks-mtp-vision to launch.sh and switch.sh.

Source: llm_adapter@2026-05-28

Confidence: high

β€”
Feature Medium

Adds ik_llama Qwen3.6-27B IQ4_KS composes for text (262K) and vision (160K).

Adds ik_llama Qwen3.6-27B IQ4_KS composes for text (262K) and vision (160K).

Source: llm_adapter@2026-05-28

Confidence: high

β€”
Feature Medium

Exposes request‑level thinking toggles in eval.

Exposes request‑level thinking toggles in eval.

Source: llm_adapter@2026-05-28

Confidence: high

β€”
Feature Medium

Exposes sampling defaults via environment variables in compose.

Exposes sampling defaults via environment variables in compose.

Source: llm_adapter@2026-05-28

Confidence: high

β€”
Feature Medium

Sets WEIGHTS=gguf to fetch llama.cpp GGUF weights in setup.

Sets WEIGHTS=gguf to fetch llama.cpp GGUF weights in setup.

Source: llm_adapter@2026-05-28

Confidence: high

β€”
Feature Medium

Passes --sampling-from-server through quality-test.sh and rebench-full.sh.

Passes --sampling-from-server through quality-test.sh and rebench-full.sh.

Source: llm_adapter@2026-05-28

Confidence: high

β€”
Feature Low

Capture prefill throughput during NIAH rungs in verify-stress.

Capture prefill throughput during NIAH rungs in verify-stress.

Source: granite4.1:30b@2026-05-28-audit

Confidence: high

β€”
Bugfix Medium

Fixes three live‑caught bugs in verify‑stress ceiling ladder.

Fixes three live‑caught bugs in verify‑stress ceiling ladder.

Source: llm_adapter@2026-05-28

Confidence: high

β€”
Bugfix Medium

Adds CTX_SIZE‑scaled ceiling ladder to verify‑stress.

Adds CTX_SIZE‑scaled ceiling ladder to verify‑stress.

Source: llm_adapter@2026-05-28

Confidence: high

β€”
Bugfix Medium

Recognizes llama‑cpp and ik‑llama containers in soak and preflight autodetect.

Recognizes llama‑cpp and ik‑llama containers in soak and preflight autodetect.

Source: llm_adapter@2026-05-28

Confidence: high

β€”
Bugfix Medium

Lowers single‑card MTP CTX_SIZE default from 262144 to 200000 for llama.cpp and ik_llama.

Lowers single‑card MTP CTX_SIZE default from 262144 to 200000 for llama.cpp and ik_llama.

Source: llm_adapter@2026-05-28

Confidence: high

β€”
Bugfix Low

Fix basename model ID handling in rebench aider/litellm step.

Fix basename model ID handling in rebench aider/litellm step.

Source: granite4.1:30b@2026-05-28-audit

Confidence: high

β€”
Bugfix Low

Record measured ceiling‑ladder result for ik iq4ks-mtp header in compose.

Record measured ceiling‑ladder result for ik iq4ks-mtp header in compose.

Source: granite4.1:30b@2026-05-28-audit

Confidence: high

β€”
Bugfix Low

Polish report handling of PyYAML, idle VRAM, P2P redaction, and kv-calc.

Polish report handling of PyYAML, idle VRAM, P2P redaction, and kv-calc.

Source: granite4.1:30b@2026-05-28-audit

Confidence: high

β€”
Bugfix Low

Guide users to MODEL_DIR/.env when weights are not found in launch script.

Guide users to MODEL_DIR/.env when weights are not found in launch script.

Source: granite4.1:30b@2026-05-28-audit

Confidence: high

β€”
Bugfix Low

Pin llama‑cpp Docker image to server‑cuda‑b9246 tag.

Pin llama‑cpp Docker image to server‑cuda‑b9246 tag.

Source: granite4.1:30b@2026-05-28-audit

Confidence: high

β€”
Bugfix Low

Change single‑card default suggestion in launch script to llamacpp/default.

Change single‑card default suggestion in launch script to llamacpp/default.

Source: granite4.1:30b@2026-05-28-audit

Confidence: high

β€”
Bugfix Low

Always capture sandboxed-pack logs to per‑tag results directory in rebench.

Always capture sandboxed-pack logs to per‑tag results directory in rebench.

Source: granite4.1:30b@2026-05-28-audit

Confidence: high

β€”
Full changelog

v0.8.4 β€” 2026-05-23

✨ Features

  • feat(verify-stress): capture prefill throughput during NIAH rungs (#199) (07d478c)
  • feat(eval): expose request-level thinking toggles (#196) (#196 by @noonghunna)
  • feat(scripts): pass --sampling-from-server through quality-test.sh + rebench-full.sh (dd1f070)
  • feat(compose): expose sampling defaults via env (#194) (#194 by @noonghunna)
  • feat(setup): WEIGHTS=gguf to fetch the llama.cpp GGUF (not just the vLLM model) (#191) (#191 by @noonghunna)
  • feat(ik-llama): wire iq4ks-mtp + iq4ks-mtp-vision into launch.sh + switch.sh (#189) (#189 by @noonghunna)
  • feat(models): add ik_llama Qwen3.6-27B IQ4_KS composes β€” text 262K + vision 160K (#180) (#180 by @noonghunna)

πŸ› Bug fixes

  • fix(rebench): basename model id for the aider/litellm step (ik_llama full-path id β†’ 0/30) (3b20ce3)
  • fix(soak,preflight): recognize llama-cpp / ik-llama containers in autodetect (#403) (d9fdab2)
  • fix(compose): ik iq4ks-mtp header β€” record measured ceiling-ladder result (200K confirmed) (1d93343)
  • fix(compose): lower single-card MTP CTX_SIZE default 262144 β†’ 200000 (llama.cpp + ik_llama) (2e45928)
  • fix(verify-stress): three live-caught bugs in ceiling ladder (#199) (b84249c)
  • fix(verify-stress): add CTX_SIZE-scaled ceiling ladder (#199) (5a825a4)
  • fix(report): PyYAML/idle-VRAM/P2P/redaction/kv-calc polish + review fixes (#178/#137) (#192 by @noonghunna)
  • fix(launch): point users at MODEL_DIR/.env when weights aren't found (#190) (#190 by @noonghunna)
  • fix(llamacpp): pin image to server-cuda-b9246 (rolling tag broke at b9282) (#188) (#188 by @noonghunna)
  • fix(launch): single-card default suggestion β†’ llamacpp/default (#185) (#185 by @noonghunna)
  • fix(rebench): always capture sandboxed-pack logs to the per-tag results dir (#179) (#179 by @noonghunna)

πŸ“ Documentation

  • docs: correct ik_llama verdict β€” ~18-20% FASTER than mainline, not a "tie" (#184) (b7353da)
  • docs: add @mgabor3141 X399/TR-1950X dual.yml row + pre-Zen2 CPU-IPC note (#178) (6e49960)
  • docs(CLIFFS): document llama.cpp "boots β‰  fills" false ceiling; 200K = max-safe single-card CTX_SIZE (9be237d)
  • docs: QUALITY_TEST.md β€” fix stale pack-status (sandboxed packs now implemented) (f6bdc06)
  • docs: document sampling/temperature eval options (#193/#194 + benchlocal #19/#21) (9fd634a)
  • docs(single-card): strike Genesis-pinned vLLM rows (blocked by purged pin #167) (a30bdfd)
  • docs(upstream): correct the #40875 row (open tool-call-corruption bug, not "closed coexistence") (25f130a)
  • docs: correct ik_llama claims to the matched-power tie (#184) (c470d9a)
  • docs: surface WEIGHTS=gguf + switch.sh ik-llama paths (match #189/#191) (412315d)
  • docs(HARDWARE/FAQ): AMD-Vi IOMMU Xid 154 under TP=2 β†’ iommu=pt fix (#178) (fe86b72)
  • docs: add ik_llama engine page + QUANTIZATION primer; surface IQK quants (554b85b)
  • docs(BENCHMARKS): @duart dual NVLink Proxmox VFIO-passthrough, stock-upstream no-Genesis (disc #162) (bc6e20b)
  • docs(BENCHMARKS): @mgabor3141 dual.yml β€” Z77/i7-3770K, PCIe 2.0 x4 slowest cross-card link (#178) (626fa68)
  • docs(mtp-vision): surface the -ub 512 β†’ 192K context recipe in the compose header (70bf7e7)
  • docs: cross-link the -ub vs ctx trade-off into SINGLE_CARD + CLIFFS + FAQ (035261b)
  • charts: compose names on x-axis + description legend block below (07c7cd0)
  • charts: tighten single-card label format (line 1 = variant + ctx, line 2 = modifier) (9aa8fa7)

πŸ› οΈ Scripts + tooling

  • scripts: endpoint-first --url/--model/--engine for non-Docker engines (#174) (#174 by @noonghunna)
  • report.sh: capture image digest + OCI labels (build tag, upstream commit) (78556f8)

🧹 Maintenance

  • chore(compose): drop accidentally-committed qwopus3.6-27b-v2 llama.cpp compose (b8aeb93)
  • refactor(llamacpp): collapse single-card composes 3β†’2 (default = mtp alias) (#181) (#181 by @noonghunna)

🧹 Other

  • Fix verify-full to accept reasoning_content (3a04ae5)
  • quality-test: respect explicit MODEL/--model, don't clobber from /v1/models (#177) (#177 by @noonghunna)
  • sglang: park EAGLE-3 path for Qwen3-Next (MTP wins everywhere) (#176) (#176 by @noonghunna)
  • quality-test: expose --timeout-per-case + bump aider-polyglot-30 to 3600s (#175) (#175 by @noonghunna)
  • sglang: experimental EAGLE-3 + Qwen3-Next dual-3090 path (Codex-led patch) (941fa06)
  • SINGLE_CARD: refresh Luce DFlash + PFlash watch-list (2026-05-20) (f9f9640)
  • AGENTS: pin engine images only when we vendor patches (6810768)
  • llama-cpp: document speed-vs-context trade-off + fix stale ub default (1b2a76c)
  • llama-cpp: switch to rolling :server-cuda tag (no patches β†’ no pin needed) (4a53eda)
  • llama-cpp: replace orphan llama-cpp:local with upstream pinned image (#170) (c3e7c7e)
  • gpu-mode status: probe :8020 + detect engine on :8030 (db9c5e1)

[Pin: git checkout v0.8.4] Β· Full diff

Weekly OSS security release digest.

The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.

No spam, unsubscribe anytime.

Share this release

Track noonghunna/club-3090

Get notified when new releases ship.

Sign up free

About noonghunna/club-3090

All releases β†’

Related context

Earlier breaking changes

  • v0.8.7 Genesis vLLM composes deprecated; default to `vllm/minimal`.
  • v0.8.6 Compose paths moved to `models/<model>/<engine>/compose/<topology>/<quant>/<serving>.yml`.

Beta — feedback welcome: [email protected]