This release includes breaking changes for platform teams planning a safe upgrade.

Published 19d LLM Frameworks
✓ No known CVEs patched

Topics

ai apple-silicon benchmarks cli gguf gpu
huggingface inference llm local-llm ollama python vram

ReleasePort's take

Light signal
editorial:auto 9d

Release v0.5.2 fixes the generation inversion under `--profile vision`, raises the Apple Silicon partial‑offload speed multiplier from 0.45× to 0.85×, and resolves the CI lint failures.

Why it matters: The corrected Apple Silicon multiplier (0.85×) brings speed estimates closer to real-world throughput, and the lint fixes restore a green CI pipeline.

Summary

AI summary

Fixed generation inversion for --profile vision, corrected Apple Silicon partial-offload speed estimate, and resolved CI lint failures.

Changes in this release

Feature Medium

Round 3 regression suite added with 20 tests; each is verified to fail when its fix is reverted.

Source: llm_adapter@2026-05-21

Confidence: low

Feature Medium

Benchmark snapshot date now displayed under every ranking.

Source: llm_adapter@2026-05-21

Confidence: low

Performance Medium

Apple Silicon partial-offload speed corrected to 0.85x for unified memory.

Source: llm_adapter@2026-05-21

Confidence: high

Bugfix Medium

Two correctness bugs found by stress-testing previously unexercised axes are fixed.

Source: llm_adapter@2026-05-21

Confidence: high

Bugfix Medium

Duplicate key in LiveBench fallback fixed; unformatted files reformatted.

Source: llm_adapter@2026-05-21

Confidence: high

Refactor Medium

CI lint pipeline status restored to green after fixes.

Source: llm_adapter@2026-05-21

Confidence: low

Refactor Low

GitHub Actions workflows updated to Node 24-based actions (checkout@v5, setup-python@v6); Node 20 actions are deprecated from 2026-06.

Source: granite4.1:30b@2026-05-22-audit

Confidence: low

Full changelog

Hardening release: every Round 3 fix now has a regression test verified
to fail when reverted, the CI lint pipeline is green again (it was red
for the entire 0.5.1 release), and two correctness bugs found by
stress-testing previously unexercised axes are fixed.

Fixed

--profile vision generation inversion

Text leaderboards don't score VLMs, so the only model with a direct
benchmark hit was a two-generations-old Qwen2-VL-7B, which outranked
the current Qwen3-VL-32B even on an 80 GB H100. A curated
multimodal capability source (MMMU-Pro / MMBench, 2026-05) now scores
the Qwen3-VL / Qwen2.5-VL / Qwen2-VL / Llama-Vision / Phi-vision /
Gemma-3 / Pixtral / InternVL3 lines. Qwen3-VL-32B now leads vision at
73-76; the legacy 7B correctly drops to the low 30s.
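
The source-selection idea can be sketched roughly as follows. This is an illustration only, not the project's code: the table names, the "general" profile name, the function names, and every number are placeholders (the vision scores are merely picked inside the ranges quoted above, and the text-leaderboard score is invented).

```python
from typing import Optional

# Placeholder tables, NOT the project's data.
TEXT_LEADERBOARD = {"Qwen2-VL-7B": 40.0}  # only the legacy model has a text hit
MULTIMODAL_SCORES = {"Qwen3-VL-32B": 74.0, "Qwen2-VL-7B": 32.0}


def capability_score(model: str, profile: str) -> Optional[float]:
    """Prefer the curated multimodal score when ranking for the vision profile."""
    if profile == "vision" and model in MULTIMODAL_SCORES:
        return MULTIMODAL_SCORES[model]
    # Otherwise fall back to the text leaderboard (None when there is no hit).
    return TEXT_LEADERBOARD.get(model)


def rank(models: list[str], profile: str) -> list[str]:
    """Sort best-first; models without any score are treated as 0 (an assumption)."""
    return sorted(models, key=lambda m: capability_score(m, profile) or 0.0, reverse=True)


models = ["Qwen2-VL-7B", "Qwen3-VL-32B"]
print(rank(models, "general"))  # text scores only: the legacy 7B wins
print(rank(models, "vision"))   # curated source: the 32B leads
```

With only text scores, the model that happens to have a benchmark hit wins by default; once a vision-specific source is consulted, the ordering follows actual multimodal capability.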

Apple Silicon partial-offload speed (~3x under-estimate)

The flat 0.45x partial-offload penalty modelled a discrete GPU
spilling to CPU RAM across PCIe. Apple Silicon shares one unified-memory
pool, so spilled weights stay at full bandwidth. As a result,
DeepSeek-R1-class models on M2/M3 Ultra were estimated at ~1.7 t/s when
real-world throughput is 4-15 t/s. The multiplier is now 0.85x for
unified memory; 0.45x is kept for discrete GPUs.
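
A minimal sketch of how such a platform-dependent penalty might be applied. Only the two multipliers come from the notes above; the function names, parameters, and the idea of scaling a single "full speed" figure are assumptions made for illustration.

```python
# Multipliers quoted in the release notes; everything else is hypothetical.
UNIFIED_MEMORY_PARTIAL_OFFLOAD = 0.85  # Apple Silicon: spilled weights stay in the shared pool
DISCRETE_GPU_PARTIAL_OFFLOAD = 0.45    # discrete GPU: spilled weights cross PCIe to CPU RAM


def partial_offload_multiplier(unified_memory: bool) -> float:
    """Pick the speed penalty for a model that does not fully fit on the GPU."""
    if unified_memory:
        return UNIFIED_MEMORY_PARTIAL_OFFLOAD
    return DISCRETE_GPU_PARTIAL_OFFLOAD


def estimate_tokens_per_second(full_speed_tps: float, fits_on_gpu: bool,
                               unified_memory: bool) -> float:
    """Scale a full-speed estimate only when the model is partially offloaded."""
    if fits_on_gpu:
        return full_speed_tps
    return full_speed_tps * partial_offload_multiplier(unified_memory)


# Same hypothetical 10 t/s baseline, partially offloaded on each platform type.
print(estimate_tokens_per_second(10.0, fits_on_gpu=False, unified_memory=True))
print(estimate_tokens_per_second(10.0, fits_on_gpu=False, unified_memory=False))
```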

CI lint was red for all of 0.5.1

Qwen/Qwen3-Coder-30B-A3B-Instruct was a duplicate key in the
LiveBench fallback (silently scored 62 instead of 58) and 12 files were
unformatted — both broke the Lint job. Fixed; Lint + Tests are now
green on this release commit in actual GitHub CI.
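
For context, a Python dict literal with a repeated key raises no error; the last value silently wins. The sketch below shows one way to detect this with the standard-library ast module. The model name and the two scores come from the notes above, while the FALLBACK variable name, the ordering of the two entries, and the helper function are assumptions for illustration.

```python
import ast


def find_duplicate_dict_keys(source: str) -> list:
    """Return constant keys that appear more than once within a dict literal."""
    duplicates = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        seen = set()
        for key in node.keys:
            # `key` is None for `**mapping` unpacking; only constants are compared.
            if isinstance(key, ast.Constant):
                if key.value in seen:
                    duplicates.append(key.value)
                seen.add(key.value)
    return duplicates


# Hypothetical fallback table (layout and entry order are assumed).
SOURCE = '''
FALLBACK = {
    "Qwen/Qwen3-Coder-30B-A3B-Instruct": 58,
    "Qwen/Qwen3-Coder-30B-A3B-Instruct": 62,
}
'''

namespace = {}
exec(SOURCE, namespace)
# The later entry overwrites the earlier one without any warning at runtime.
print(namespace["FALLBACK"]["Qwen/Qwen3-Coder-30B-A3B-Instruct"])
print(find_duplicate_dict_keys(SOURCE))
```

A static check like this is what lets a lint job catch the problem even though the code runs without complaint.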

Added

  • Round 3 regression suite (tests/test_r3_regressions.py, 20 tests).
    Every test was verified to go red when its fix is reverted — they
    pin real bugs, not the current implementation.
  • Benchmark snapshot date shown under every ranking, so a stale
    recommendation is self-evident instead of silently trusted.
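
The "goes red when the fix is reverted" property can be illustrated with a small sketch. This is not taken from tests/test_r3_regressions.py; the function names are hypothetical, and the partial-offload multipliers from this release are reused only as a convenient example.

```python
from typing import Callable

Multiplier = Callable[[bool], float]


def fixed_multiplier(unified_memory: bool) -> float:
    # Behaviour described for this release.
    return 0.85 if unified_memory else 0.45


def reverted_multiplier(unified_memory: bool) -> float:
    # Behaviour before the fix: one flat penalty for every platform.
    return 0.45


def regression_check(multiplier: Multiplier) -> bool:
    """Pass only if unified memory is penalised less than a discrete GPU."""
    return multiplier(True) > multiplier(False)


# A regression test earns its keep only if it passes with the fix
# and fails once the fix is reverted.
assert regression_check(fixed_multiplier)
assert not regression_check(reverted_multiplier)
```

Checking the reverted case explicitly guards against tests that would pass regardless of whether the bug is present.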

CI

  • GitHub Actions runners updated to Node 24 (checkout@v5,
    setup-python@v6); Node 20 actions are deprecated from 2026-06.

Full changelog: CHANGELOG.md
