This release includes breaking changes for platform teams planning a safe upgrade.
✓ No known CVEs patched in this version
ReleasePort's take
Light signal
Release v0.5.2 fixes the generation inversion for `--profile vision`, corrects the Apple Silicon partial-offload speed factor to 0.85×, and resolves CI lint failures.
Why it matters: The corrected Apple Silicon partial-offload factor (0.85×) makes speed estimates on unified-memory machines more accurate, and the lint fixes return the CI pipeline to a reliable green status.
Summary
AI summary: Fixed generation inversion for `--profile vision`, corrected the Apple Silicon partial-offload speed estimate, and resolved CI lint failures.
Changes in this release
| Type | Severity | Summary | CVE |
|---|---|---|---|
| Feature | Medium | Round 3 regression suite added with 20 tests; each test fails when its fix is reverted. Source: llm_adapter@2026-05-21. Confidence: low. | None |
| Feature | Medium | Benchmark snapshot date now displayed under every ranking. Source: llm_adapter@2026-05-21. Confidence: low. | None |
| Performance | Medium | Apple Silicon partial-offload speed corrected to 0.85x for unified memory. Source: llm_adapter@2026-05-21. Confidence: high. | None |
| Bugfix | Medium | Two correctness bugs found by stress-testing previously unexercised axes are fixed. Source: llm_adapter@2026-05-21. Confidence: high. | None |
| Bugfix | Medium | Duplicate key in LiveBench fallback fixed; unformatted files reformatted. Source: llm_adapter@2026-05-21. Confidence: high. | None |
| Refactor | Medium | CI lint pipeline status restored to green after fixes. Source: llm_adapter@2026-05-21. Confidence: low. | None |
| Refactor | Low | GitHub Actions runners updated to Node 24 and setup-python@v6; deprecated Node 20 actions removed. Source: granite4.1:30b@2026-05-22-audit. Confidence: low. | None |
Full changelog
Hardening release: every Round 3 fix now has a regression test verified
to fail when reverted, the CI lint pipeline is green again (it was red
for the entire 0.5.1 release), and two correctness bugs found by
stress-testing previously unexercised axes are fixed.
Fixed
--profile vision generation inversion
Text leaderboards don't score VLMs, so the only model with a direct
benchmark hit was a two-generations-old Qwen2-VL-7B, which outranked
the current Qwen3-VL-32B even on an 80 GB H100. A curated
multimodal capability source (MMMU-Pro / MMBench, 2026-05) now scores
the Qwen3-VL / Qwen2.5-VL / Qwen2-VL / Llama-Vision / Phi-vision /
Gemma-3 / Pixtral / InternVL3 lines. Qwen3-VL-32B now leads vision at
73-76; the legacy 7B correctly drops to the low 30s.
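The effect of the curated source on ranking can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, table names, the text-leaderboard score, and the fallback score are all hypothetical, and the curated values are only placeholders picked from the ranges quoted above.

```python
# Hypothetical tables; every number here is a placeholder for illustration.
TEXT_LEADERBOARD = {"Qwen2-VL-7B": 50.0}   # the only direct text-benchmark hit
CURATED_MULTIMODAL = {
    "Qwen3-VL-32B": 74.0,                  # within the quoted 73-76 range
    "Qwen2-VL-7B": 32.0,                   # "low 30s"
}
FALLBACK_SCORE = 40.0                      # assumed default when no hit exists


def capability_score(model: str, profile: str) -> float:
    """Prefer the curated multimodal score when ranking for the vision profile."""
    if profile == "vision" and model in CURATED_MULTIMODAL:
        return CURATED_MULTIMODAL[model]
    return TEXT_LEADERBOARD.get(model, FALLBACK_SCORE)


def rank(models: list[str], profile: str) -> list[str]:
    """Return models ordered from highest to lowest capability score."""
    return sorted(models, key=lambda m: capability_score(m, profile), reverse=True)


if __name__ == "__main__":
    candidates = ["Qwen2-VL-7B", "Qwen3-VL-32B"]
    print(rank(candidates, "vision"))
    print(rank(candidates, "general"))
```

With text scores alone, the older model wins simply because it is the only one with a direct hit; once a vision-specific source is consulted first, the ordering follows multimodal capability instead.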
Apple Silicon partial-offload speed (~3x under-estimate)
The flat 0.45x partial-offload penalty modelled a discrete GPU
spilling to CPU RAM across PCIe. Apple Silicon shares one unified-memory
pool, so spilled weights stay at full bandwidth. DeepSeek-R1-class
models on M2/M3 Ultra reported ~1.7 t/s when real-world is 4-15; now
0.85x for unified memory, 0.45x kept for discrete GPUs.
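As a rough sketch of the penalty selection described above (the function names and the `full_speed_tps` input are assumptions made for illustration; only the 0.85 and 0.45 factors come from the changelog):

```python
UNIFIED_MEMORY_FACTOR = 0.85   # Apple Silicon: spilled weights stay in the shared pool
DISCRETE_GPU_FACTOR = 0.45     # discrete GPU spilling to CPU RAM across PCIe


def partial_offload_factor(unified_memory: bool) -> float:
    """Pick the speed multiplier applied when a model does not fully fit on the GPU."""
    return UNIFIED_MEMORY_FACTOR if unified_memory else DISCRETE_GPU_FACTOR


def estimate_tokens_per_second(
    full_speed_tps: float, fits_on_gpu: bool, unified_memory: bool
) -> float:
    """Hypothetical estimator: apply the penalty only when part of the model is offloaded."""
    if fits_on_gpu:
        return full_speed_tps
    return full_speed_tps * partial_offload_factor(unified_memory)


if __name__ == "__main__":
    print(estimate_tokens_per_second(10.0, fits_on_gpu=False, unified_memory=True))
    print(estimate_tokens_per_second(10.0, fits_on_gpu=False, unified_memory=False))
```

Keeping the two factors as named constants makes the hardware distinction explicit, which is the part the earlier flat penalty was missing.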
CI lint was red for all of 0.5.1
Qwen/Qwen3-Coder-30B-A3B-Instruct was a duplicate key in the
LiveBench fallback (silently scored 62 instead of 58) and 12 files were
unformatted — both broke the Lint job. Fixed; Lint + Tests are now
green on this release commit in actual GitHub CI.
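To illustrate why a duplicate key goes unnoticed at runtime, here is a small sketch that assumes the fallback table is a Python dict literal (the changelog does not say so explicitly). In a dict literal the last occurrence of a key silently wins, so only a static check catches it. The helper name and the `other/model` entry are hypothetical.

```python
import ast


def duplicate_dict_keys(source: str) -> list[object]:
    """Return constant keys that appear more than once in any dict literal in `source`."""
    duplicates: list[object] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        seen = set()
        for key in node.keys:
            # `key` is None for `**mapping` unpacking; skip non-constant keys too.
            if isinstance(key, ast.Constant):
                if key.value in seen:
                    duplicates.append(key.value)
                seen.add(key.value)
    return duplicates


# Hypothetical fallback table reproducing the 58-versus-62 symptom.
SAMPLE = '''
FALLBACK = {
    "Qwen/Qwen3-Coder-30B-A3B-Instruct": 58,
    "other/model": 40,
    "Qwen/Qwen3-Coder-30B-A3B-Instruct": 62,
}
'''

if __name__ == "__main__":
    print(duplicate_dict_keys(SAMPLE))
```

Linters typically flag repeated dict keys for the same reason, which is consistent with the duplicate breaking the Lint job rather than any test.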
Added
- Round 3 regression suite (tests/test_r3_regressions.py, 20 tests). Every test was verified to go red when its fix is reverted: they pin real bugs, not the current implementation.
- Benchmark snapshot date shown under every ranking, so a stale recommendation is self-evident instead of silently trusted.
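The "goes red when reverted" property can be sketched as below. This is not taken from tests/test_r3_regressions.py; the two implementations and the helper names are hypothetical stand-ins, using the partial-offload fix as the example.

```python
from typing import Callable


def factor_fixed(unified_memory: bool) -> float:
    """Behaviour after the fix: unified memory gets its own factor."""
    return 0.85 if unified_memory else 0.45


def factor_reverted(unified_memory: bool) -> float:
    """Behaviour before the fix: one flat penalty for all hardware."""
    return 0.45


def check_partial_offload_factor(factor_fn: Callable[[bool], float]) -> None:
    """The regression check: it pins the bug, not incidental implementation details."""
    assert factor_fn(True) == 0.85
    assert factor_fn(False) == 0.45


def goes_red(check: Callable, impl: Callable) -> bool:
    """Return True if the check fails against the given implementation."""
    try:
        check(impl)
    except AssertionError:
        return True
    return False


if __name__ == "__main__":
    print("fixed fails:", goes_red(check_partial_offload_factor, factor_fixed))
    print("reverted fails:", goes_red(check_partial_offload_factor, factor_reverted))
```

A test that stays green against the reverted code would only be describing the current implementation, which is the distinction the changelog entry draws.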
CI
- GitHub Actions runners updated to Node 24 (checkout@v5, setup-python@v6); Node 20 actions are deprecated from 2026-06.
Full changelog: CHANGELOG.md