noonghunna/club-3090

v0.8.0 Breaking

This release includes breaking changes; platform teams should review them before planning an upgrade.

Published 17 days ago · Model Serving & MLOps


✓ No known CVEs patched



Affected surfaces

deps

Summary

AI summary

Broad release spanning the 🧹 Other, 📝 Documentation, Honest limits, and ⚠️ Cliffs, gotchas, regressions sections.

Full changelog

v0.8.0 — Universal `pull`: evaluate & serve any safetensors HF model

The headline: the stack no longer only serves a fixed curated list. Point it at any safetensors Hugging Face repo and it tells you — honestly, before you download anything — whether it'll run on your GPUs, and if so boots a working server for it.

What you can do now

```bash
# Will this model fit on my GPUs? (no download, no boot, just the verdict)
scripts/pull.sh <org/Model> --profile-like vllm/minimal --dry-run

# If it passes: download + generate a minimal compose + boot it
scripts/pull.sh <org/Model> --profile-like vllm/minimal --yes
```
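At its core, a fit check like `--dry-run` compares model weights plus KV cache against usable VRAM. The sketch below is a minimal generic-dense approximation, not the project's kv-calc (which, per the commit log, prices curated Tier-1 models with curated-exact specs instead). It assumes a standard Hugging Face `config.json` already parsed into a dict, a known weight size in bytes, VRAM pooled across cards, and a vLLM-style `gpu_memory_utilization` fraction. It ignores activations, CUDA graphs, and per-GPU tensor-parallel splits. The helper names `kv_bytes_per_token` and `fits` are hypothetical.

```python
from typing import Tuple

GiB = 1 << 30

# Bytes per cached element for a few common KV-cache dtypes.
DTYPE_BYTES = {"float16": 2, "bfloat16": 2, "fp8": 1}


def kv_bytes_per_token(config: dict, kv_dtype: str = "bfloat16") -> int:
    """K+V cache bytes per token for a plain dense decoder (GQA-aware)."""
    layers = config["num_hidden_layers"]
    heads = config["num_attention_heads"]
    # Configs without GQA omit num_key_value_heads: one KV head per query head.
    kv_heads = config.get("num_key_value_heads") or heads
    # Some configs set head_dim explicitly; otherwise derive it.
    head_dim = config.get("head_dim") or config["hidden_size"] // heads
    # Factor 2: one K and one V vector per layer per KV head.
    return 2 * layers * kv_heads * head_dim * DTYPE_BYTES[kv_dtype]


def fits(
    config: dict,
    weight_bytes: int,
    max_tokens: int,
    vram_bytes: int,
    gpu_memory_utilization: float = 0.90,
    kv_dtype: str = "bfloat16",
) -> Tuple[bool, int, int]:
    """Return (fits, bytes_needed, bytes_available) for a boot-time estimate."""
    available = int(vram_bytes * gpu_memory_utilization)
    needed = weight_bytes + kv_bytes_per_token(config, kv_dtype) * max_tokens
    return needed <= available, needed, available


# Hypothetical GQA config: 64 layers, 40 query heads, 8 KV heads, head_dim 128.
cfg = {"num_hidden_layers": 64, "num_attention_heads": 40,
       "num_key_value_heads": 8, "hidden_size": 5120}
print(kv_bytes_per_token(cfg))                        # 262144 (256 KiB per token)
print(fits(cfg, 18 * GiB, 32_768, 2 * 24 * GiB)[0])  # True: two 24 GiB cards
print(fits(cfg, 18 * GiB, 32_768, 24 * GiB)[0])      # False: one 24 GiB card
```

A real estimate also has to reserve memory the sketch ignores, which is one reason a "fits" verdict is only a boot-time check.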

Any safetensors model, not just the catalog — if it passes the gates you get a working compose + a booted server, no manual compose authoring.
Answered before you download — the fit verdict comes from precise KV math on the repo's own config, so you don't pull 20–50 GB to find out it won't fit.
Honest, never silently wrong — every answer carries a confidence tier and a boot-fit ≠ runtime-stability caveat (a "fits" verdict is a boot-time check; validate with soak-continuous before trusting it for long agent workloads). Non-fits stop with a precise reason, not a cryptic crash.
It improves over time — failures are classified and deduped into a calibration backbone, so the math sharpens without waiting for a per-model release.
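One way to picture the dedup step in the last point is to hash a normalized tuple of failure attributes into a stable key, so repeat reports of the same failure collapse into one counted entry. This is a minimal sketch under assumptions: the tuple fields (`model`, `engine_version`, `gpu`, `failure_class`), the strip-and-lowercase normalization, and the helper names are all hypothetical, not the project's actual CONTRACT-4 scheme.

```python
import hashlib
import json
from typing import Dict, List

# Hypothetical attributes that identify "the same" failure.
KEY_FIELDS = ("model", "engine_version", "gpu", "failure_class")


def canonical_key(failure: dict) -> str:
    """Normalize the identifying fields, then hash them into a short stable key."""
    canonical = [str(failure.get(k, "")).strip().lower() for k in KEY_FIELDS]
    # json.dumps gives an unambiguous serialization of the ordered fields.
    return hashlib.sha256(json.dumps(canonical).encode("utf-8")).hexdigest()[:16]


def dedup(failures: List[dict]) -> Dict[str, dict]:
    """Group reports by canonical key, keeping the first example and a count."""
    buckets: Dict[str, dict] = {}
    for report in failures:
        entry = buckets.setdefault(canonical_key(report), {"example": report, "count": 0})
        entry["count"] += 1
    return buckets


reports = [
    {"model": "Org/Model", "engine_version": "0.21.0", "gpu": "RTX 3090", "failure_class": "kv_oom"},
    {"model": "org/model ", "engine_version": "0.21.0", "gpu": "rtx 3090", "failure_class": "KV_OOM"},
    {"model": "org/model", "engine_version": "0.21.0", "gpu": "rtx 3090", "failure_class": "smoke_fail"},
]
for key, entry in dedup(reports).items():
    print(key, entry["count"])  # two buckets: counts 2 and 1
```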

Full user guide → docs/PULL.md (Quickstart at the top).

Honest limits (read before relying on it)

safetensors + vLLM only. GGUF / .bin repos hard-block with a clear message (not a crash) — GGUF→llama.cpp is not served via pull (deferred).
Trust is maintainer-promoted in this phase; consensus automation comes later.
The curated path is unchanged — existing SINGLE_CARD / DUAL_CARD / catalog composes work exactly as before; pull is purely additive.
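A format gate like the one in the first point needs only the repo's file listing. The sketch below is hypothetical (`format_block_reason` is not the project's function). It assumes you already have the repo's filenames, and it returns either `None` (servable) or a readable reason, so a non-safetensors repo stops with a message instead of crashing later in the pipeline.

```python
from typing import Iterable, Optional


def format_block_reason(filenames: Iterable[str]) -> Optional[str]:
    """None if the repo ships safetensors weights, else a human-readable block reason."""
    names = [name.lower() for name in filenames]
    # Repos often ship both formats; safetensors present is enough to proceed.
    if any(n.endswith(".safetensors") for n in names):
        return None
    if any(n.endswith(".gguf") for n in names):
        return "GGUF repo: not served via pull (GGUF -> llama.cpp is deferred)"
    if any(n.endswith(".bin") for n in names):
        return "PyTorch .bin repo: only safetensors weights are supported"
    return "no *.safetensors weight files found"


print(format_block_reason(["config.json", "model-00001-of-00002.safetensors"]))  # None
print(format_block_reason(["model-Q4_K_M.gguf"]))
```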

Commit-level detail below (auto-generated):

v0.8.0 — 2026-05-17

⚠️ Cliffs, gotchas, regressions

v0.8.0 Pull-Gate P4-fix: price Tier-1 curated via curated-exact kv-calc spec, not generic-dense (+ non-mocked regression test) (087a8ea)

🐛 Bug fixes

fix(verify-full): warm engine before scored checks (closes #352) (c595496)

📝 Documentation

docs(tq3-mtp): add missing 04-gemma-vs-qwen.png chart (da9ef5e)
docs: UPSTREAM Gemma4 TurboQuant row — exact config.py:101 mechanism + fix-PR set (fd8695f)
docs+hygiene: track Gemma4 native-TurboQuant upstream blocker; gitignore new MoE cache dirs (f812715)
docs(KERNEL_MATRIX): add Kernel Selection Philosophy section (287c766)

🧹 Other

Merge PR #147: v0.8.0 — Universal pull (evaluate & serve any safetensors HF model) (#147 by @noonghunna)
v0.8.0 [docs] PULL.md Quickstart (command-first, top-of-doc) + ARCHITECTURE one-liner: stage names are internal, users run one command (bef766d)
v0.8.0 [review] pre-tag fixes: scrub internal-path leaks from shipped source + make .pull-captures-corpus tests CI-safe (skip-when-absent) (49d9bb4)
v0.8.0 [docs] ARCHITECTURE.md: add the universal pull→gate→emit→loop pipeline to the mental model + scripts tree (current-state, was stale for v0.8.0) (1cdda19)
v0.8.0 [UX] §7 two doc tracks: docs/PULL.md (user front-door) + docs/README.md (track spine) + README migration nudge (a0b3b5c)
v0.8.0 [F] F8-fix: widen §6.1 Tier-1 OOM signature + pt3.actual regexes to real vLLM v0.21.0+ KV-cache-too-large phrasing — on-rig F8 caught classic-torch-only regexes miss the common KV-prediction failure (f92624d)
v0.8.0 [F] F7: docs/LOOP.md contributor doc (Loop phase, grounded in shipped F1–F6) + CONTRACT-5(i) risk note (a8b30d6)
v0.8.0 [F] F6: CONTRACT-5 mandatory content-hash kv_calc_version (G2) + G1 topo-verify + L2 fixture sync (1ac0481)
v0.8.0 [F] F5: §6.3 canonical-tuple-hash dedup + bounded label scheme + collision-safe submit path (CONTRACT-4) (5de7224)
v0.8.0 [F] F4: §6.2 inbound-trust pipeline raw→candidate→validated→Tier-1 + CONTRACT-3a derived-deferral (CONTRACT-3) (d758f08)
v0.8.0 [F] F3: G6-A 3-part additive [E] touch (pt1.predicted_b_breakdown, pt3.failure_log_excerpt+actual, container-log capture) + §6.1 Tier-1 (CONTRACT-2) (b100979)
v0.8.0 [F] F2: §6.1 Tier-2 semantic-fingerprint classifier + Appendix A seed DB (CONTRACT-2 Tier-2) (9f80d29)
v0.8.0 [F] F1: FInput capture-bundle reader + schema-1 validation + key-normalization (CONTRACT-1) (1491cbc)
v0.8.0 [E] E-outcome-fix: honest 3-state manifest outcome (partial-success != failed) — §6.2 partial is a capability-scoped success (71148d6)
v0.8.0 [E] E3/E4-fix: boot lifecycle as context manager (server stays up for smoke+capture, teardown on ctx-exit) — on-rig E5 caught teardown-in-finally-before-smoke (f7c405a)
v0.8.0 [E] E3-fix: smoke probes the real served-model-name (not literal "derived") + capture failure detail — on-rig E5 caught red-smoke-on-healthy-boot (16a1e4d)
v0.8.0 [E] E2-fix-2: verify *.safetensors against HF API lfs.sha256 (not Xet-redirect-fragile HEAD x-linked-etag) — on-rig E5 caught false no-etag (3ae74bf)
v0.8.0 [E] E2-fix: download via hf CLI subprocess (not huggingface_hub lib-import) — on-rig E5 caught ModuleNotFoundError (806a298)
v0.8.0 [E] E5(docs): docs/PULL_EMIT_DERIVED.md (+ private ledger/recon-checklist updates) (d134d5a)
v0.8.0 [E] E4: post-[C1] derived-[E] orchestration + trigger semantics + override force-capture (pt5) (2ed18aa)
v0.8.0 [E] E3: derived boot (HF_HOME mount) + 4 §6 capture emitters + manifest + derived smoke floor (f327887)
v0.8.0 [E] E2: HF download stage (download_set allowlist + x-linked-etag SHA, no-etag fail-closed, atomic staging) (7a2ec86)
v0.8.0 [E] E1: generate_from_profile + derived-vllm template + EInput + CONTRACT-5 gate (411c84f)
v0.8.0 Pull-Gate P5: docs/PULL_GATE.md (two-path model, 6-stratum taxonomy, §4.1 [C1], hardware-SM) (2582438)
v0.8.0 Pull-Gate P4: stratum-5 + [C1] §4.1 total fn + stratum-6 [D] dry-run + pull orchestrator + exhaustive test-pull.sh (adf7a3b)
v0.8.0 Pull-Gate P3: stratum-2 precondition + [C0] engine-support/runtime/hardware gate + [C2a] disk (4a1d385)
v0.8.0 Pull-Gate P2: transformers deriver + ModelProfile/confidence + variant-scoped hf_repos schema (818b79c)
v0.8.0 Pull-Gate P1: kv-calc generic-dense family + eligibility predicate + raw_verdict adapter (1bafcfe)
v0.8.0: doc generated composes are not relocatable (run with --project-directory) (a2fc05e)
v0.8.0 STEP 5: COMPOSE_GENERATOR.md + PATCH_POLICY.md (#141 contributor contract) (9546f99)
v0.8.0 STEP 3+4: compose generator + 5-triple golden-parity test (#141) (6d7a043)
v0.8.0 STEP 2: extract patch_attribution.py (sound body-only reaches(), test imports it) (60f3983)
v0.8.0 Phase A-prime: enrich patch/profile data for #141 generator (compose_service_template, genesis_equipped, delivery metadata, drift_guards, drafter/model_slug/trc fold-ins) (9f23736)
Add v0.8 Phase A patch attribution data (91a9622)
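The E2 commits verify each downloaded `*.safetensors` file against the SHA-256 the Hub records for it (`lfs.sha256`) and fail closed when no digest is available. Below is a minimal streaming version of that check using Python's `hashlib`. How the expected digest is fetched is left out, and `sha256_file` / `verify_safetensors` are hypothetical names, not the project's code.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB shards never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_safetensors(path: Path, expected_sha256: Optional[str]) -> bool:
    """True only if the file's SHA-256 matches the expected hex digest."""
    if not expected_sha256:
        # Fail closed: no published digest means the file is not trusted.
        return False
    return sha256_file(path) == expected_sha256.strip().lower()


with tempfile.TemporaryDirectory() as tmp:
    shard = Path(tmp) / "model-00001-of-00002.safetensors"
    shard.write_bytes(b"hello")
    print(verify_safetensors(shard, hashlib.sha256(b"hello").hexdigest()))  # True
    print(verify_safetensors(shard, None))                                  # False
```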

Pin: `git checkout v0.8.0`




Later breaking changes

v0.8.7 Genesis vLLM composes deprecated; default to `vllm/minimal`.
v0.8.6 Compose paths moved to `models/<model>/<engine>/compose/<topology>/<quant>/<serving>.yml`.