This release adds 4 notable features for engineering teams evaluating rollout.
Published 16d
Model Serving & MLOps
✓ No known CVEs patched in this version
Summary
AI summary: Broad release touching Documentation, Features, pull, and Bug fixes.
Full changelog
v0.8.2 - Universal pull: honest failure on-ramp + a wider model catalogue
What this gives you
- An honest "will it fit?" answer for any safetensors model: `scripts/pull.sh <model> --profile-like <variant> --recommend` returns a plain verdict (FITS / FITS-but-needs-acceptance / DOES-NOT-FIT) that tracks the real gate, carries the boot-fit ≠ runtime caveat, and never silently passes.
- Failed pulls become fixable signal, not dead-ends: every hard-block leaves a redacted, path-scrubbed diagnostic; one consented command (`scripts/pull.sh --submit-last`, works with or without `gh`) sends it back so the gap can be closed. No telemetry, no auto-send.
- ~12 more model architectures recognised out-of-the-box: Phi, Gemma/Gemma3, Starcoder2, Cohere/Command-R, Mixtral, Qwen2/3-MoE and more no longer need `--experimental-arch`; genuine native models reach a clean serve verdict instead of a blanket refusal (per-repo remote-code still fail-closed, zero false-pass).
- Optional non-NVIDIA hardware detection for the eval path (AMD ROCm / Apple), graceful-degrading; no change for NVIDIA users.
- Bundled: N-GPU NVLink auto-detection (multi-4 + Gemma-4-26B dual composes) and a documentation restructure (quick-start README, `GETTING_STARTED.md`, FAQ).
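To make "path-scrubbed" concrete, here is a minimal sketch of the general idea: replace a user-specific path prefix with a neutral placeholder before any text leaves the machine. This is not the project's actual scrubber; the function name `scrub_prefix`, the `<HOME>` placeholder, and the sample path are all hypothetical.

```shell
# Hypothetical helper, not part of club-3090: hide a path prefix in text read
# from stdin by replacing every occurrence with the placeholder "<HOME>".
# Assumption: the prefix ($1) contains no "|" and no regex metacharacters.
scrub_prefix() {
  sed "s|$1|<HOME>|g"
}

# Made-up sample line standing in for one line of a diagnostic.
sample='capture saved to /home/alice/captures/last.json'
printf '%s\n' "$sample" | scrub_prefix "/home/alice"
# prints: capture saved to <HOME>/captures/last.json
```

A real scrubber would also need to handle usernames, hostnames, and tokens; the sketch only covers the path prefix.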
Know before you use it
- Generated composes are a minimal, known-safe starting point: capacity (context/KV/mem-util) is the reference profile's, not auto-tuned to your GPU. Run `--recommend` (or `tools/kv-calc.py --solve-max-ctx`) for the real fit and tune `${MAX_MODEL_LEN}`. An opt-in capacity optimiser is planned for a later release.
- Advanced features (MTP / TurboQuant / DFlash) stay in the pre-baked Genesis composes; generated composes are non-Genesis by design.
- GGUF / cross-engine generation is deferred to a separate §9 design-unlock and is not in this release.
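To see why context length has to be tuned per GPU, here is a back-of-the-envelope sketch of KV-cache sizing using the standard per-token estimate (keys and values × layers × KV heads × head dimension × bytes per element). It does not reproduce `tools/kv-calc.py`; the function names and the model shape and memory budget below are hypothetical, and real engines add overheads (block rounding, activations, mem-util headroom) that this ignores.

```shell
# Hypothetical helpers, for illustration only.

# kv_bytes_per_token LAYERS KV_HEADS HEAD_DIM BYTES_PER_ELEM
# The factor 2 accounts for storing both keys and values.
kv_bytes_per_token() {
  echo $(( 2 * $1 * $2 * $3 * $4 ))
}

# max_ctx_tokens BUDGET_BYTES PER_TOKEN_BYTES (integer division, rounds down)
max_ctx_tokens() {
  echo $(( $1 / $2 ))
}

# Made-up model shape: 48 layers, 8 KV heads, head_dim 128, fp16 (2 bytes).
per_token=$(kv_bytes_per_token 48 8 128 2)
# Made-up budget: 6 GiB left for KV cache after weights are loaded.
budget=$(( 6 * 1024 * 1024 * 1024 ))

echo "per-token KV bytes: $per_token"
echo "max context tokens: $(max_ctx_tokens "$budget" "$per_token")"
# prints: per-token KV bytes: 196608
# prints: max context tokens: 32768
```

Halving the budget halves the maximum context, which is why a compose that inherits the reference profile's capacity may not fit a smaller card.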
v0.8.2 - 2026-05-19
Features
- feat(pull): v0.8.2 STEP V5 - recommend UX + report-a-failed-pull doc + §9-reconciliation (c5b5e9b)
- feat(nvlink): auto-detect NVLink on N-GPU topologies; add detection to multi4 + gemma-4-26b dual (8f8ec1c)
- feat(pull): v0.8.2 STEP V4 - optional whichllm hw-detect subprocess (CONTRACT-3, hw-detect-only) (3917728)
- feat(switch): v0.8.2 STEP V3 - switch.sh - compose_registry parity (CONTRACT-2b-ii) (e6503bc)
- feat(pull): v0.8.2 STEP V3 - arch-registry expansion + chat-template attribution/drift_guard (999c93f)
- feat(pull): v0.8.2 STEP V2 - surface pointer + --submit-last/--submit (gh + gh-less, consented, F5 reuse) (e1cdcb5)
- feat(pull): v0.8.2 STEP V1 - capture-on-hard-block pt1-gate emitter + BaseCaptureBundle protocol lift (20f1557)
- feat(report): lspci PCIe/P2P diagnostics subsection (LnkSta/ACS/topology) (#148) (af2e45a)
Bug fixes
- fix(pull): v0.8.2 STEP V5 - recommend must not label a fits-clean model "DOES NOT FIT" (26949d7)
- fix(pull): v0.8.2 STEP V3 - deliver CONTRACT-2's engine-supported broadening (TRC two-class) (d78b9a9)
- fix(pull): v0.8.2 STEP V2 - gh-less issue body must not carry the absolute capture path (52451ca)
- fix(launch): force LC_NUMERIC=C so the VRAM-budget printf survives comma-decimal locales (#159) (186dc93)
- fix(deriver): correct stale "GGUF not supported until v0.8.1" message, now misleading post-v0.8.1-ship (344ab87)
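The `LC_NUMERIC=C` fix addresses a general shell pitfall: under a comma-decimal locale (for example `de_DE.UTF-8`), `printf '%.1f'` can reject a dot-decimal argument such as `23.5`. The sketch below shows the general workaround, not the project's launch script; the function name `format_gib` and the `GiB` unit are hypothetical.

```shell
# Hypothetical helper, for illustration only: format a dot-decimal number
# with one fractional digit regardless of the caller's numeric locale.
# The function body is a subshell, so the export does not leak to the caller.
format_gib() (
  export LC_NUMERIC=C
  printf '%.1f GiB\n' "$1"
)

format_gib 23.5
# prints: 23.5 GiB
```

If `LC_ALL` is set it takes precedence over `LC_NUMERIC`, so a script that must be robust in every environment may need to handle `LC_ALL` as well.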
Documentation
- docs(architecture): bring current-state docs up to v0.8.2 (recommend / submit on-ramp / arch-registry / hwdetect) (c5c8f46)
- docs(generator): state plainly that generated-compose capacity is the reference profile's, NOT fit-adapted (247b1dc)
- docs(pull): v0.8.2 STEP V6 - correct §9/headline to the true bundled release scope (b791271)
- docs: fix duplicate MULTI_CARD.md entry in docs index (966a8d1)
- docs: reorder docs index (GSD first), add FAQ TOC + promote troubleshooting ladder, add tool-calling example (a891b39)
- docs: add GETTING_STARTED.md, Gemma 4 model READMEs, restructure main README with quick start first (6368bae)
- docs: fix stale NVLINK_MODE comment, INTERNALS.md cliff status, and dead companion repo link (28bd0e8)
- docs(container-runtimes): Proxmox passthrough - NVLink is the fragile path, not Proxmox (#161) (3f066a0)
- docs(benchmarks): add @hlo-world dual-3090 PCIe x4 dual-dflash-noviz row (#158) (135f2c4)
- docs(upstream): froggeric v19 re-eval PASSED - ADOPTED (#150) (ec1fd65)
Maintenance
- chore(chat-template): re-vendor latest froggeric Qwen3.6 template for re-eval (#150) (8a9ea6c)
[Pin: git checkout v0.8.2] · Full diff
About noonghunna/club-3090