This release includes breaking changes for platform teams planning a safe upgrade.

Published 21d · LLM Frameworks
✓ No known CVEs patched

Topics

ai apple-silicon benchmarks cli gguf gpu
huggingface inference llm local-llm ollama python vram

ReleasePort's take

Light signal
editorial:auto 13d

Version v0.5.1 adds Apple Silicon GPU simulation support and refreshes the frontier‑model list with new entries.

Why it matters: Trial v0.5.1 in development to validate the simulated M1-M4 GPU rankings and check the estimates for the newly added models before a production rollout.

Summary

AI summary

Added Apple Silicon GPU simulation, a `whichllm upgrade` command for comparing GPUs, a refreshed frontier‑model list, and improved VRAM/speed estimation.

Changes in this release

Feature Medium

Added 30+ frontier models, including Llama-4, DeepSeek-V4, and reasoning models

Source: llm_adapter@2026-05-21

Confidence: low

Feature Medium

`whichllm upgrade` compares GPU upgrades, showing delta scores and verdicts

Source: llm_adapter@2026-05-21

Confidence: low

Feature Medium

Apple Silicon M1-M4 chips are now supported in `--gpu` for simulated stress-testing

Source: llm_adapter@2026-05-21

Confidence: low

Feature Medium

Added 2026‑Q2 frontier models: Kimi‑K2, MiMo, DeepSeek‑V4, GLM‑5, Qwen3.6/Next, gpt‑oss, Llama‑4, Mistral Small/Large, Devstral, Codestral, MiniMax, Granite 3.3/4.0, Olmo‑3, Nemotron‑3, plus reasoning models (QwQ‑32B, Qwen3‑4B‑Thinking, DeepSeek‑R1 family, R1‑Distill).

Source: granite4.1:30b@2026-05-22-audit

Confidence: low

Performance Medium

Added per‑backend speed multipliers (CUDA, Apple, AMD, Intel) and quantization efficiency factors for more accurate performance estimates on different hardware.

Source: granite4.1:30b@2026-05-22-audit

Confidence: low
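To make the idea concrete, here is a minimal sketch of how a backend multiplier and a quant efficiency factor could scale a raw tokens/s figure, plus a simple partial-offload blend. The dictionary names, all multiplier values, and the harmonic blending rule are hypothetical placeholders, not whichllm's actual numbers.

```python
# Hypothetical sketch: scale an idealised decode speed by a backend multiplier
# and a quantization-efficiency factor. Every number here is a placeholder.
BACKEND_MULTIPLIER = {"cuda": 1.0, "apple": 0.75, "amd": 0.8, "intel": 0.6}
QUANT_EFFICIENCY = {"F16": 1.0, "Q8_0": 0.95, "Q4_K_M": 0.85, "Q2_K": 0.7}


def adjusted_tps(raw_tps: float, backend: str, quant: str) -> float:
    """Apply the (assumed) backend and quant factors to a raw tokens/s figure."""
    return raw_tps * BACKEND_MULTIPLIER[backend] * QUANT_EFFICIENCY[quant]


def partial_offload_tps(gpu_tps: float, cpu_tps: float, gpu_fraction: float) -> float:
    """Blend GPU and CPU speeds when only part of the model fits in VRAM.

    Per-token time is the sum of the time spent on GPU-resident layers and
    on CPU-resident layers, so the two speeds combine harmonically.
    """
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be between 0 and 1")
    seconds_per_token = gpu_fraction / gpu_tps + (1.0 - gpu_fraction) / cpu_tps
    return 1.0 / seconds_per_token


print(adjusted_tps(100.0, "apple", "Q4_K_M"))
print(partial_offload_tps(gpu_tps=100.0, cpu_tps=10.0, gpu_fraction=0.5))
```

With half the layers on the CPU, the blend gives about 18 tokens/s rather than the linear average of 55. A naive linear mix is one way partial-offload numbers can overshoot.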

Bugfix Medium

MoE models split correctly by active and total parameters

Source: llm_adapter@2026-05-21

Confidence: high
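The full changelog states the rule: total parameters drive VRAM, and active parameters drive speed. The sketch below expresses that rule with a common bandwidth-bound approximation, which assumes each generated token streams the active weights from memory once. The function names and the example model shape (30B total, 3B active, 4-bit) are hypothetical, and the KV cache and runtime overhead are ignored.

```python
# Hypothetical sketch of the MoE rule: total parameters size the weights held
# in memory; active parameters set the bytes read per generated token.


def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint: 1e9 params * (bits / 8) bytes = GB."""
    return params_billions * bits_per_weight / 8


def decode_tps(params_read_billions: float, bits_per_weight: float,
               bandwidth_gb_s: float) -> float:
    """Bandwidth-bound decode estimate: each token streams these weights once."""
    return bandwidth_gb_s / weights_gb(params_read_billions, bits_per_weight)


def moe_estimate(total_b: float, active_b: float, bits: float,
                 bandwidth_gb_s: float) -> dict:
    return {
        "vram_gb": weights_gb(total_b, bits),                        # total params -> VRAM
        "tokens_per_s": decode_tps(active_b, bits, bandwidth_gb_s),  # active params -> speed
    }


est = moe_estimate(total_b=30, active_b=3, bits=4, bandwidth_gb_s=400)
print(est)                    # roughly 15 GB of weights and ~267 tokens/s
print(decode_tps(30, 4, 400)) # a dense 30B model at the same bandwidth: ~27 tokens/s
```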

Bugfix Medium

Lineage-aware demotion prevents leaderboard bias against newer models

Source: llm_adapter@2026-05-21

Confidence: high
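The release notes say only that 2024-era leaderboards (OLLB v2, Arena ELO) should stop over-rewarding older generations. The sketch below shows one plausible way to do that: demote a legacy-board score only when a newer sibling in the same lineage is present. The source identifiers, the 0.85 factor, and the `Candidate` fields are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical sketch of lineage-aware demotion; identifiers and factor are assumptions.
LEGACY_SOURCES = {"ollb_v2", "arena_elo"}  # 2024-era leaderboards
DEMOTION_FACTOR = 0.85


@dataclass(frozen=True)
class Candidate:
    model_id: str
    lineage: str       # e.g. "llama", "qwen"
    generation: int    # higher means newer within the lineage
    score: float
    score_source: str  # leaderboard the score came from


def rank_with_lineage_demotion(candidates: list[Candidate]) -> list[tuple[str, float]]:
    # Newest generation seen for each lineage.
    newest: dict[str, int] = {}
    for c in candidates:
        newest[c.lineage] = max(newest.get(c.lineage, c.generation), c.generation)

    ranked = []
    for c in candidates:
        score = c.score
        # Demote only legacy-board scores, and only when a newer sibling exists.
        if c.score_source in LEGACY_SOURCES and c.generation < newest[c.lineage]:
            score *= DEMOTION_FACTOR
        ranked.append((c.model_id, score))
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

Demoting rather than dropping keeps the older model rankable when it is the best fit for the hardware.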

Bugfix Medium

Per-backend speed multipliers and quant efficiency factors improve estimates

Source: llm_adapter@2026-05-21

Confidence: low

Bugfix Medium

Fixed family inheritance treating a 6.6B fork as its 158B base

Source: llm_adapter@2026-05-21

Confidence: low

Bugfix Medium

Family grouping now prefers the upstream model as the base

Source: llm_adapter@2026-05-21

Confidence: low

Bugfix Medium

httpx follow_redirects fixes case-mismatch HuggingFace URLs that were dropping IDs

Source: llm_adapter@2026-05-21

Confidence: low

Bugfix Medium

Quality and speed floors eliminate junk Q1_0 candidates

Source: llm_adapter@2026-05-21

Confidence: low

Bugfix Medium

KV cache scaling tuned for real 128K-context runs

Source: llm_adapter@2026-05-21

Confidence: low
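For reference, the textbook KV-cache estimate grows linearly with context length. It stores two tensors (K and V) per layer, each of size n_kv_heads × head_dim × context × bytes per element. whichllm's tuned scaling may apply corrections beyond this. The sketch shows only the standard formula, with an example layer configuration chosen for illustration.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: float = 2.0) -> float:
    """Textbook KV-cache size: K and V tensors for every layer and token.

    n_kv_heads is the number of key/value heads, which is smaller than the
    number of attention heads under grouped-query attention. bytes_per_elem
    is 2 for an FP16 cache and roughly 1 for an 8-bit quantized cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


# Example configuration (assumed): 32 layers, 8 KV heads, head_dim 128, FP16.
size = kv_cache_bytes(32, 8, 128, 128 * 1024)
print(f"{size / 1024**3:.1f} GiB at 128K context")  # 16.0 GiB at 128K context
```

At 128K tokens, the cache alone can rival the weights of a small quantized model. That is why the context-length scaling matters for VRAM recommendations.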

Bugfix Medium

Removed 11 non-existent HuggingFace IDs from benchmark fallbacks

Source: llm_adapter@2026-05-21

Confidence: low

Bugfix Medium

`httpx` configured with `follow_redirects=True` so case‑mismatch HuggingFace URLs (307 redirects) no longer silently drop frontier IDs.

Source: granite4.1:30b@2026-05-22-audit

Confidence: low

Bugfix Low

Family inheritance no longer treats a 6.6B "imatrix‑aligned" / MTP‑head fork as the same model as its 158B base.

Source: granite4.1:30b@2026-05-22-audit

Confidence: low
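One plausible guard against the 6.6B-vs-158B mix-up is to let a fork inherit its base's benchmark scores only when the two parameter counts are close. The sketch assumes that approach. The function name and the 1.25 tolerance are hypothetical; the release notes do not say how whichllm actually detects the mismatch.

```python
# Hypothetical guard: inherit family scores only when sizes are comparable.
MAX_SIZE_RATIO = 1.25  # assumed tolerance


def may_inherit_family_scores(fork_params_b: float, base_params_b: float) -> bool:
    """Return True if the fork is plausibly the same model as its base."""
    if fork_params_b <= 0 or base_params_b <= 0:
        return False  # unknown sizes: do not inherit
    ratio = max(fork_params_b, base_params_b) / min(fork_params_b, base_params_b)
    return ratio <= MAX_SIZE_RATIO


print(may_inherit_family_scores(6.6, 158))  # False: ratio is about 24
print(may_inherit_family_scores(7.2, 7.0))  # True
```

Quantized re-uploads keep the base's parameter count, so under this rule they would still inherit scores.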

Bugfix Low

Family grouping now prefers the upstream (original) model as the base rather than the most‑downloaded fork.

Source: granite4.1:30b@2026-05-22-audit

Confidence: low
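A minimal sketch of "prefer upstream over most-downloaded" follows. It assumes each repo records its parent in something like the `base_model` field that Hugging Face model cards can declare. The `Repo` type and `pick_family_base` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Repo:
    repo_id: str
    downloads: int
    base_model: Optional[str] = None  # declared upstream repo, if any


def pick_family_base(family: list[Repo]) -> Repo:
    """Prefer repos with no declared parent; use downloads only as a tie-breaker."""
    upstream = [r for r in family if r.base_model is None]
    pool = upstream or family  # fall back to all repos if none is marked upstream
    return max(pool, key=lambda r: r.downloads)
```

Downloads still break ties, but only among upstream candidates. A popular GGUF fork therefore cannot displace the original.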

Bugfix Low

Quality floor (≥ 20) and speed floor (≥ 1.5 tokens/s) filter out low‑quality Q1_0 / Bonsai‑class candidates from recommendations.

Source: granite4.1:30b@2026-05-22-audit

Confidence: low
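The two floors amount to a filter applied before ranking. The thresholds below come from the release notes. The `Candidate` record shape and the function name are hypothetical.

```python
from typing import TypedDict

QUALITY_FLOOR = 20.0   # quality floor stated in the release notes
SPEED_FLOOR_TPS = 1.5  # speed floor (tokens/s) stated in the release notes


class Candidate(TypedDict):  # hypothetical record shape
    model_id: str
    quality: float
    tokens_per_s: float


def apply_floors(candidates: list[Candidate]) -> list[Candidate]:
    """Drop candidates that fall below either floor before ranking."""
    return [
        c for c in candidates
        if c["quality"] >= QUALITY_FLOOR and c["tokens_per_s"] >= SPEED_FLOOR_TPS
    ]
```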

Full changelog

What's New

whichllm upgrade — Compare GPU upgrades side-by-side

whichllm upgrade --target "RTX 4090"

Shows the current machine and a target GPU together with delta scores
and a verdict (worth it / meaningful / marginal / flat / downgrade).
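The release notes name the five verdicts but not the cut-offs behind them. The sketch below shows one way a score delta could be bucketed into those labels. The function name and every threshold are hypothetical assumptions, not whichllm's actual logic.

```python
# Hypothetical thresholds (score points); whichllm's real cut-offs are not documented here.
FLAT_BAND = 2.0
MARGINAL_MAX = 8.0
MEANINGFUL_MAX = 20.0


def upgrade_verdict(current_score: float, target_score: float) -> str:
    """Map the score delta from the current machine to a target GPU onto a label."""
    delta = target_score - current_score
    if delta <= -FLAT_BAND:
        return "downgrade"
    if delta < FLAT_BAND:
        return "flat"
    if delta < MARGINAL_MAX:
        return "marginal"
    if delta < MEANINGFUL_MAX:
        return "meaningful"
    return "worth it"


print(upgrade_verdict(41.0, 58.5))  # meaningful
```

A symmetric band around zero keeps small negative deltas from being reported as a downgrade.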

Apple Silicon support in --gpu

whichllm --gpu "M3 Max" --vram 64
whichllm --gpu "M2 Ultra" --vram 192

Simulator now understands every M1-M4 chip (base / Pro / Max / Ultra),
so Mac users can stress-test rankings without owning the hardware. No
more spurious "ROCm requires Linux" warnings on simulated Apple boxes.
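As a rough illustration of what recognising a chip name involves, this sketch parses `--gpu` strings such as "M3 Max" into a simulated Apple GPU. It raises the ROCm warning only for ROCm targets. The regex, the `SimulatedGPU` type, `parse_simulated_gpu`, and `platform_warnings` are all hypothetical names invented for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for M1-M4 chips in base / Pro / Max / Ultra variants.
_APPLE_RE = re.compile(r"^(?:Apple\s+)?M([1-4])(?:\s+(Pro|Max|Ultra))?$", re.IGNORECASE)


@dataclass(frozen=True)
class SimulatedGPU:
    name: str
    backend: str          # e.g. "apple", "cuda", "rocm"
    unified_memory: bool  # on Apple Silicon, --vram describes the unified pool


def parse_simulated_gpu(name: str) -> SimulatedGPU:
    m = _APPLE_RE.match(name.strip())
    if m:
        tier = (m.group(2) or "").capitalize()
        return SimulatedGPU(f"M{m.group(1)} {tier}".strip(), "apple", True)
    # Anything else is left to discrete-GPU detection (out of scope for this sketch).
    return SimulatedGPU(name, "unknown", False)


def platform_warnings(gpu: SimulatedGPU, host_os: str) -> list[str]:
    # Warn about ROCm only when the simulated GPU actually uses ROCm.
    if gpu.backend == "rocm" and host_os != "linux":
        return ["ROCm requires Linux"]
    return []
```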

Frontier-model coverage refresh

2026-Q2 releases that did not previously surface are now included:
Kimi-K2, MiMo, DeepSeek-V4, GLM-5, Qwen3.6 / Qwen3-Next, gpt-oss,
Llama-4, Mistral Small/Large, Devstral, Codestral, MiniMax,
Granite 3.3/4.0, Olmo-3, Nemotron-3, plus the reasoning lines
QwQ-32B, Qwen3-4B-Thinking, DeepSeek-R1 and the R1-Distill family.

Smarter VRAM / speed estimates

  • KV cache scaling tuned to match real 128K-context runs.
  • MoE models split correctly: total params drive VRAM and knowledge,
    active params drive speed.
  • Per-backend speed multipliers (CUDA / Apple / AMD / Intel) and
    per-quant efficiency factors so Apple Silicon and partial-offload
    numbers stop overshooting.
  • Lineage-aware demotion stops 2024-era leaderboards (OLLB v2, Arena
    ELO) from over-rewarding older generations against their newer
    siblings.

Bug fixes

  • Family inheritance no longer treats a 6.6B "imatrix-aligned" /
    MTP-head fork as the same model as its 158B base.
  • Family grouping prefers the upstream model as the base, not whichever
    fork has the most downloads.
  • httpx now uses follow_redirects=True, so case-mismatch HuggingFace URLs
    (307 redirects) no longer silently drop frontier IDs.
  • Quality floor (≥ 20) and speed floor (≥ 1.5 t/s) drop junk Q1_0 /
    Bonsai-class candidates that previously slipped into low-VRAM
    recommendations.
  • Removed 11 non-existent HF IDs from curated benchmark fallbacks.

Full changelog: CHANGELOG.md

About: Find the best local LLM for your hardware, ranked by benchmarks.
