This release adds 5 notable features for engineering teams evaluating rollout.
Published 19d
Model Serving & MLOps
✓ No known CVEs patched in this version
Summary
AI summary: Broad release touching Documentation, Features, kv-math, and Bug fixes.
Full changelog
v0.7.3 - 2026-05-15
✨ Features
- feat(report): surface kv-calc calibration verdict (#143 by @noonghunna)
- feat(kv-calc): model v0.7.3 MoE architectures (39e1873)
- feat(gemma-4-26b-a4b): AWQ + MTP n=4: +12% narr / +49% code over no-MTP baseline (6dc9a0d)
- feat(qwen-35b-a3b): preview-MTP compose + bench row - MTP measured SLOWER on MoE (e1d44bd)
- feat(vllm-pr41800): vendor truncate_prompt_tokens overlay across all pre-fix engines (closes #139) (1d7aad1)
- feat(gemma-4-26b-a4b): AWQ path via vLLM PR #40886 overlay (0053444)
- feat(estate): add parallel boot mode (99328b4)
- feat(moe): add dual-card composes for Gemma 26B-A4B + Qwen 35B-A3B preview (2d1b1dc)
- feat(moe): wire Gemma 4 26B-A4B + Qwen 3.6 35B-A3B composes through fits() (f7f6f44)
- feat(profiles): split engine-pin policy by Genesis dependency (15eda8a)
- feat(profiles): add Gemma 4 26B-A4B ModelProfile + num_global_kv_heads field (abf0e32)
- feat(profiles): add Qwen 3.6 35B-A3B ModelProfile (MoE schema extensions) (9378714)
Bug fixes
- fix(gpu-mode): mode_off tears down estate-managed instances (9cd854d)
- fix(qwen3.6-27b): route non-TQ3 composes to vllm-nightly-clean (3b2d940)
- fix(gemma-4-31b): route default/bf16 composes to vllm-nightly-clean (cf0451a)
- fix(engines): vllm-nightly-mtp anchors to 01d4d1ad (Sander v7.72.2 PROD pin) (87f0a0c)
Documentation
- docs(soak-test): clarify PASS verdict semantics - closes #140 (9a039d8)
- docs(UPSTREAM): add PR #41800 truncate_prompt_tokens row (273c017)
- docs(README): add v0.7.3 MoE models to Supported Models table (e49c939)
- docs(BENCHMARKS): Gemma 4 26B-A4B AWQ first row + AutoRound row demoted (92b69bd)
- docs(BENCHMARKS): add v0.7.3 MoE preview section (bdfb939)
- docs(HARDWARE): add note on PCIe Gen 3 + older CPU TP=2 headwind (8cf38b0)
- docs(KERNEL_MATRIX): add KV Cache Impact subsection (1a233cd)
- docs: add KERNEL_MATRIX.md (attention backend + engine support matrix) (97195fe)
- docs(kv-math): extend k_v_tensors=N notation to sliding-KV formulas (b1c68b4)
- docs(kv-math): tighten k_v_tensors notation across all 4 formulas (54d8bf0)
- docs(kv-math): third-pass Grok polish (67de3ec)
- docs(kv-math): second-pass Grok polish (78f94ca)
- docs(kv-math): address Grok review feedback (3114399)
- docs(kv-math): config-verify Qwen 35B-A3B + Gemma 26B-A4B MoE sections (6ec6a67)
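
Several entries above refer to KV-cache formulas and the k_v_tensors=N notation, including the sliding-KV variants. As a rough illustration only, the sketch below uses the generic textbook estimate for per-sequence KV-cache size; it is not the project's kv-calc code. The function name, parameters, and example model dimensions are all hypothetical. It also assumes the same KV head count on every layer, which may not hold for the MoE models listed in this release.

```python
from typing import Optional


def kv_cache_bytes(
    seq_len: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    dtype_bytes: int = 2,          # 2 bytes per element for fp16/bf16
    k_v_tensors: int = 2,          # one K tensor plus one V tensor per layer
    sliding_layers: int = 0,       # layers that use sliding-window attention
    sliding_window: Optional[int] = None,
) -> int:
    """Generic per-sequence KV-cache estimate (hypothetical helper)."""
    if not 0 <= sliding_layers <= num_layers:
        raise ValueError("sliding_layers must be between 0 and num_layers")
    if sliding_layers and sliding_window is None:
        raise ValueError("sliding_window is required when sliding_layers > 0")

    # Bytes stored per token, per layer.
    per_token = k_v_tensors * num_kv_heads * head_dim * dtype_bytes

    # Full-attention layers cache every token in the sequence.
    total = (num_layers - sliding_layers) * seq_len * per_token

    # Sliding-window layers cache at most `sliding_window` tokens.
    if sliding_layers:
        total += sliding_layers * min(seq_len, sliding_window) * per_token
    return total


if __name__ == "__main__":
    # Hypothetical dense config: 32 layers, 8 KV heads, head_dim 128, fp16.
    full = kv_cache_bytes(4096, 32, 8, 128)
    mixed = kv_cache_bytes(4096, 32, 8, 128, sliding_layers=24, sliding_window=1024)
    print(full / 2**20, "MiB with full attention on every layer")
    print(mixed / 2**20, "MiB with 24 sliding-window layers")
```

With these made-up dimensions, the all-global case comes to 512 MiB at 4096 tokens, and switching 24 of the 32 layers to a 1024-token window brings it down to 224 MiB. The k_v_tensors factor is kept explicit so that the count of cached tensors per layer is visible in the formula rather than folded into a bare constant.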
🧹 Maintenance
- test(launch): align engine pin expectations (127f4f6)
Pin: git checkout v0.7.3
Repository: noonghunna/club-3090