Find the best local LLM for your hardware, ranked by benchmarks

v0.5.2 Breaking

This release includes breaking changes for platform teams planning a safe upgrade.

Published 2mo LLM Frameworks

✓ No known CVEs patched

Topics

ai apple-silicon benchmarks cli gguf gpu huggingface inference llm local-llm ollama python vram

ReleasePort's take

Light signal

editorial:auto 2mo

Release v0.5.2 fixes the generation inversion under `--profile vision`, raises the Apple Silicon partial-offload speed multiplier from 0.45× to 0.85×, and resolves the CI lint failures.

Why it matters: The corrected Apple Silicon multiplier (0.85×) makes speed estimates on unified-memory machines more accurate, and the lint fixes restore a reliable CI pipeline status.

Summary

AI summary

Fixed generation inversion for --profile vision, corrected Apple Silicon partial-offload speed estimate, and resolved CI lint failures.

Changes in this release

Type	Severity	Summary	Source	Confidence	CVE
Feature	Medium	Round 3 regression suite added with 20 tests; each is verified to fail when its fix is reverted.	llm_adapter@2026-05-21	low	none
Feature	Medium	Benchmark snapshot date now displayed under every ranking.	llm_adapter@2026-05-21	low	none
Performance	Medium	Apple Silicon partial-offload speed multiplier corrected from 0.45x to 0.85x for unified memory.	llm_adapter@2026-05-21	high	none
Bugfix	Medium	Two correctness bugs found by stress-testing previously unexercised axes are fixed.	llm_adapter@2026-05-21	high	none
Bugfix	Medium	Duplicate key in LiveBench fallback fixed; unformatted files reformatted.	llm_adapter@2026-05-21	high	none
Refactor	Medium	CI lint pipeline status restored to green after fixes.	llm_adapter@2026-05-21	low	none
Refactor	Low	GitHub Actions updated to Node 24 (checkout@v5, setup-python@v6); deprecated Node 20 actions removed.	granite4.1:30b@2026-05-22-audit	low	none

Full changelog

Hardening release: every Round 3 fix now has a regression test verified
to fail when reverted, the CI lint pipeline is green again (it was red
for the entire 0.5.1 release), and two correctness bugs found by
stress-testing previously unexercised axes are fixed.

Fixed

`--profile vision` generation inversion

Text leaderboards don't score VLMs, so the only model with a direct
benchmark hit was a two-generations-old Qwen2-VL-7B, which outranked
the current Qwen3-VL-32B even on an 80 GB H100. A curated
multimodal capability source (MMMU-Pro / MMBench, 2026-05) now scores
the Qwen3-VL / Qwen2.5-VL / Qwen2-VL / Llama-Vision / Phi-vision /
Gemma-3 / Pixtral / InternVL3 lines. Qwen3-VL-32B now leads vision at
73-76; the legacy 7B correctly drops to the low 30s.
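
The inversion can be reproduced in a few lines. This is a minimal sketch, not the tool's actual code: the dictionary names, the `vision_score` and `rank` helpers, and the text-leaderboard value of 40 are all hypothetical, and the curated values are placeholders chosen inside the ranges quoted above.

```python
# Illustrative scores only. The curated values sit inside the ranges quoted
# in the changelog; the text-leaderboard value is a made-up placeholder.
TEXT_LEADERBOARD = {"Qwen2-VL-7B": 40.0}  # hypothetical: the only direct hit
CURATED_MULTIMODAL = {"Qwen3-VL-32B": 74.0, "Qwen2-VL-7B": 32.0}


def vision_score(model: str, use_curated: bool) -> float:
    """Prefer the curated multimodal score; fall back to the text leaderboard."""
    if use_curated and model in CURATED_MULTIMODAL:
        return CURATED_MULTIMODAL[model]
    # Models with no benchmark hit at all score 0 and sink to the bottom.
    return TEXT_LEADERBOARD.get(model, 0.0)


def rank(models: list[str], use_curated: bool) -> list[str]:
    """Return models ordered from best to worst vision score."""
    return sorted(models, key=lambda m: vision_score(m, use_curated), reverse=True)


MODELS = ["Qwen2-VL-7B", "Qwen3-VL-32B"]
before = rank(MODELS, use_curated=False)  # older 7B leads: it has the only hit
after = rank(MODELS, use_curated=True)    # newer 32B leads once both are scored
```

Without the curated source the older model wins only because it is the single entry with any score; once both models are scored on the same multimodal scale, the ordering follows capability.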

Apple Silicon partial-offload speed (~3x under-estimate)

The flat 0.45x partial-offload penalty modelled a discrete GPU
spilling to CPU RAM across PCIe. Apple Silicon shares one unified-memory
pool, so spilled weights stay at full bandwidth. DeepSeek-R1-class
models on M2/M3 Ultra reported ~1.7 t/s when real-world is 4-15; now
0.85x for unified memory, 0.45x kept for discrete GPUs.
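
As a rough illustration of the corrected rule, the sketch below selects the multiplier by memory architecture. Only the two factors (0.85 and 0.45) come from the changelog; the function names, the `unified_memory` flag, and the idea of scaling a full-GPU throughput figure are assumptions made for the example.

```python
# Multipliers taken from the changelog; everything else is hypothetical.
DISCRETE_GPU_FACTOR = 0.45    # weights spill to CPU RAM across PCIe
UNIFIED_MEMORY_FACTOR = 0.85  # Apple Silicon: spilled weights stay in one pool


def partial_offload_factor(unified_memory: bool) -> float:
    """Return the speed multiplier applied when a model only partly fits."""
    return UNIFIED_MEMORY_FACTOR if unified_memory else DISCRETE_GPU_FACTOR


def estimate_tokens_per_second(full_gpu_tps: float, unified_memory: bool) -> float:
    """Scale a full-GPU throughput estimate by the partial-offload factor."""
    return full_gpu_tps * partial_offload_factor(unified_memory)
```

For a hypothetical full-GPU estimate of 10 t/s, this gives 8.5 t/s on unified memory and 4.5 t/s on a discrete GPU.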

CI lint was red for all of 0.5.1

Qwen/Qwen3-Coder-30B-A3B-Instruct was a duplicate key in the
LiveBench fallback (silently scored 62 instead of 58) and 12 files were
unformatted — both broke the Lint job. Fixed; Lint + Tests are now
green on this release commit in actual GitHub CI.
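
A duplicate key is easy to miss because a Python dict literal accepts it and silently keeps the last value. The sketch below assumes the fallback table is a Python dict literal with the 62 entry appearing after the 58 entry, which would match the behaviour described; the `FALLBACK` name, the second model entry, and the `find_duplicate_dict_keys` helper are hypothetical. It uses the standard `ast` module to flag repeated constant keys, similar in spirit to what a linter reports.

```python
import ast


def find_duplicate_dict_keys(source: str) -> list[object]:
    """Return constant keys that appear more than once in any dict literal."""
    duplicates: list[object] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        seen = set()
        for key in node.keys:
            # key is None for `**other` unpacking; only constant keys are checked.
            if isinstance(key, ast.Constant):
                if key.value in seen:
                    duplicates.append(key.value)
                seen.add(key.value)
    return duplicates


# Hypothetical fallback table; the ordering of the two entries is an assumption.
SAMPLE = '''
FALLBACK = {
    "Qwen/Qwen3-Coder-30B-A3B-Instruct": 58,
    "some/other-model": 50,
    "Qwen/Qwen3-Coder-30B-A3B-Instruct": 62,
}
'''

namespace: dict = {}
exec(SAMPLE, namespace)  # runs without any error or warning
effective = namespace["FALLBACK"]["Qwen/Qwen3-Coder-30B-A3B-Instruct"]  # last entry wins
```

Running the check on the sample reports the repeated model name, while executing the same source shows why the bug stayed silent: the later value simply replaces the earlier one.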

Added

Round 3 regression suite (tests/test_r3_regressions.py, 20 tests).
Every test was verified to go red when its fix is reverted, so the tests
pin real bugs, not the current implementation.

Benchmark snapshot date shown under every ranking, so a stale
recommendation is self-evident instead of silently trusted.
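
The "goes red when reverted" property can be checked mechanically. The following is a minimal sketch of the idea and not the contents of tests/test_r3_regressions.py: every function name is hypothetical, and the partial-offload multiplier stands in for whichever fix a given test pins.

```python
def factor_fixed(unified_memory: bool) -> float:
    """Behaviour after the fix: unified memory gets its own multiplier."""
    return 0.85 if unified_memory else 0.45


def factor_reverted(unified_memory: bool) -> float:
    """Behaviour with the fix reverted: one flat penalty for every platform."""
    return 0.45


def check_unified_memory_factor(factor_fn) -> None:
    """Regression check that pins the bug, not incidental implementation detail."""
    assert factor_fn(True) == 0.85
    assert factor_fn(False) == 0.45


def goes_red(factor_fn) -> bool:
    """Return True when the regression check fails for the given implementation."""
    try:
        check_unified_memory_factor(factor_fn)
    except AssertionError:
        return True
    return False
```

A test is worth keeping when `goes_red(factor_fixed)` is false and `goes_red(factor_reverted)` is true; a test that passes against both versions is not pinning the bug.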

CI

GitHub Actions workflows updated to Node 24-based actions (checkout@v5,
setup-python@v6); Node 20 actions are deprecated from 2026-06.

Full changelog: CHANGELOG.md
