Find the best local LLM for your hardware, ranked by benchmarks

v0.5.1 Breaking

This release is flagged as containing breaking changes, so platform teams should plan the upgrade carefully.

Published 2 months ago in LLM Frameworks

✓ No known CVEs patched

Topics

ai apple-silicon benchmarks cli gguf gpu huggingface inference llm local-llm ollama python vram

ReleasePort's take

Light signal

editorial:auto 2mo

v0.5.1 adds Apple Silicon GPU simulation to `--gpu`, introduces the `whichllm upgrade` command, and refreshes the frontier-model list.

Why it matters: test v0.5.1 in development to validate GPU simulation on M1-M4 chips and to check the performance estimates for newly added models before a production rollout.

Summary

AI summary

Added Apple Silicon GPU simulation, refreshed frontier‑model list, and improved VRAM/speed estimation.

Changes in this release

Type	Severity	Summary	CVE
Feature	Medium	Added 30+ frontier models, including Llama-4, DeepSeek-V4, and reasoning models	None
Feature	Medium	`whichllm upgrade` compares GPU upgrades, showing delta scores and verdicts	None
Feature	Medium	Apple Silicon M1-M4 chips supported in `--gpu` for stress-testing	None
Feature	Medium	Added 2026-Q2 frontier models: Kimi-K2, MiMo, DeepSeek-V4, GLM-5, Qwen3.6/Next, gpt-oss, Llama-4, Mistral Small/Large, Devstral, Codestral, MiniMax, Granite 3.3/4.0, Olmo-3, Nemotron-3, plus reasoning models (QwQ-32B, Qwen3-4B-Thinking, DeepSeek-R1 family, R1-Distill)	None
Performance	Medium	Added per-backend speed multipliers (CUDA, Apple, AMD, Intel) and quantization efficiency factors for more accurate performance estimates across hardware	None
Bugfix	Medium	MoE models are split correctly into active and total parameters	None
Bugfix	Medium	Lineage-aware demotion prevents leaderboard bias against newer models	None
Bugfix	Medium	Per-backend speed multipliers and quant efficiency factors improve estimates	None
Bugfix	Medium	Fixed family inheritance treating a 6.6B fork as its 158B base	None
Bugfix	Medium	Family grouping prefers the upstream model as the base	None
Bugfix	Medium	httpx follow_redirects fixes HuggingFace case-mismatch URLs dropping IDs	None
Bugfix	Medium	Quality and speed floors eliminate junk Q1_0 candidates	None
Bugfix	Medium	KV cache scaling tuned for real 128K-context runs	None
Bugfix	Medium	Removed 11 non-existent HuggingFace IDs from benchmark fallbacks	None
Bugfix	Medium	`httpx` configured with `follow_redirects=True` so case-mismatch HuggingFace URLs (307 redirects) no longer silently drop frontier IDs	None
Bugfix	Low	Family inheritance no longer treats a 6.6B "imatrix-aligned" / MTP-head fork as the same model as its 158B base	None
Bugfix	Low	Family grouping now prefers the upstream (original) model as the base rather than the most-downloaded fork	None
Bugfix	Low	Quality floor (≥ 20) and speed floor (≥ 1.5 tokens/s) filter out low-quality Q1_0 / Bonsai-class candidates from recommendations	None

Full changelog

What's New

`whichllm upgrade` — Compare GPU upgrades side-by-side

whichllm upgrade --target "RTX 4090"

Shows the current machine and a target GPU together with delta scores
and a verdict (worth it / meaningful / marginal / flat / downgrade).
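The verdict labels suggest a simple mapping from score delta to category. The sketch below shows one way such a mapping could work; the percentage cut-offs, the `GpuComparison` class, and the `verdict` function are hypothetical and are not taken from whichllm, which does not publish its thresholds in these notes.

```python
from dataclasses import dataclass

# Hypothetical cut-offs, as percent change in overall score.
# whichllm's real thresholds are not given in these release notes.
VERDICT_CUTOFFS = [
    (50.0, "worth it"),
    (20.0, "meaningful"),
    (5.0, "marginal"),
    (-5.0, "flat"),
]


@dataclass
class GpuComparison:
    current_score: float  # score of the current machine (assumed > 0)
    target_score: float   # score of the target GPU

    @property
    def delta(self) -> float:
        """Absolute score change from current to target."""
        return self.target_score - self.current_score

    @property
    def delta_pct(self) -> float:
        """Relative score change in percent."""
        return 100.0 * self.delta / self.current_score


def verdict(cmp: GpuComparison) -> str:
    """Return the label of the first cut-off the delta meets; below all of them is a downgrade."""
    for cutoff, label in VERDICT_CUTOFFS:
        if cmp.delta_pct >= cutoff:
            return label
    return "downgrade"


if __name__ == "__main__":
    for target in (80.0, 60.0, 53.0, 52.0, 40.0):
        cmp = GpuComparison(current_score=50.0, target_score=target)
        print(f"{target:5.1f}  delta={cmp.delta:+.1f} ({cmp.delta_pct:+.0f}%)  {verdict(cmp)}")
```

Using a relative delta keeps the verdict comparable across low-end and high-end machines; an absolute delta would make every upgrade from a weak GPU look small.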

Apple Silicon support in `--gpu`

whichllm --gpu "M3 Max" --vram 64
whichllm --gpu "M2 Ultra" --vram 192

Simulator now understands every M1-M4 chip (base / Pro / Max / Ultra),
so Mac users can stress-test rankings without owning the hardware. No
more spurious "ROCm requires Linux" warnings on simulated Apple boxes.
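The fix for the spurious warning amounts to recognising Apple chip names before any backend-specific checks run. Here is a minimal sketch of that idea, assuming a name pattern of `M1` to `M4` with an optional `Pro`, `Max`, or `Ultra` suffix; the `simulated_backend` and `backend_warnings` helpers and the vendor prefixes are hypothetical, not whichllm's code.

```python
import platform
import re
from typing import List, Optional

# Hypothetical name pattern: M1-M4, optionally followed by Pro, Max or Ultra.
APPLE_CHIP = re.compile(r"^M([1-4])(?:\s+(Pro|Max|Ultra))?$", re.IGNORECASE)


def simulated_backend(gpu_name: str) -> str:
    """Guess which compute backend a simulated GPU name implies (illustrative only)."""
    name = gpu_name.strip()
    upper = name.upper()
    if APPLE_CHIP.match(name):
        return "apple"
    if upper.startswith(("RTX", "GTX")):
        return "cuda"
    if upper.startswith(("RX ", "RADEON")):
        return "rocm"
    return "unknown"


def backend_warnings(backend: str, host_os: Optional[str] = None) -> List[str]:
    """Emit the Linux-only ROCm warning only when the simulated backend really is ROCm."""
    host_os = host_os or platform.system()
    if backend == "rocm" and host_os != "Linux":
        return ["ROCm requires Linux"]
    return []


if __name__ == "__main__":
    for gpu in ("M3 Max", "M2 Ultra", "RTX 4090", "RX 7900 XTX"):
        backend = simulated_backend(gpu)
        print(gpu, backend, backend_warnings(backend, "Darwin"))
```

Keying the warning on the resolved backend rather than on the host OS alone is what keeps a simulated Mac on a non-Linux host from tripping the ROCm check.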

Frontier-model coverage refresh

2026-Q2 releases that did not previously surface are now included:
Kimi-K2, MiMo, DeepSeek-V4, GLM-5, Qwen3.6 / Qwen3-Next, gpt-oss,
Llama-4, Mistral Small/Large, Devstral, Codestral, MiniMax,
Granite 3.3/4.0, Olmo-3, Nemotron-3, plus the reasoning lines
QwQ-32B, Qwen3-4B-Thinking, DeepSeek-R1 and the R1-Distill family.

Smarter VRAM / speed estimates

KV cache scaling tuned to match real 128K-context runs.
MoE models split correctly: total params drive VRAM and knowledge,
active params drive speed.
Per-backend speed multipliers (CUDA / Apple / AMD / Intel) and
per-quant efficiency factors so Apple Silicon and partial-offload
numbers stop overshooting.
Lineage-aware demotion stops 2024-era leaderboards (OLLB v2, Arena
ELO) from over-rewarding older generations against their newer
siblings.
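The MoE split can be illustrated with standard back-of-envelope formulas: weight memory scales with total parameters, KV cache grows linearly with context length, and memory-bound decode speed depends on how many bytes of active weights each token reads. This is a sketch under those assumptions, not whichllm's estimator; the model shape, bandwidth, and the values in `BACKEND_SPEED_MULTIPLIER` are hypothetical.

```python
from dataclasses import dataclass

GIB = 1024 ** 3

# Hypothetical values: the notes name these backends but not their multipliers.
BACKEND_SPEED_MULTIPLIER = {"cuda": 1.0, "apple": 0.8, "amd": 0.85, "intel": 0.7}


@dataclass
class ModelSpec:
    total_params: float   # every weight, including all MoE experts
    active_params: float  # weights touched per token (== total_params for dense models)
    n_layers: int
    n_kv_heads: int
    head_dim: int


def weights_gib(m: ModelSpec, bits_per_weight: float) -> float:
    """All experts must be resident, so weight memory scales with total params."""
    return m.total_params * bits_per_weight / 8 / GIB


def kv_cache_gib(m: ModelSpec, context_len: int, bytes_per_elem: int = 2) -> float:
    """K and V per layer: 2 * n_kv_heads * head_dim values per token (fp16 by default)."""
    return 2 * m.n_layers * m.n_kv_heads * m.head_dim * context_len * bytes_per_elem / GIB


def decode_tokens_per_s(
    m: ModelSpec,
    bits_per_weight: float,
    bandwidth_gb_s: float,
    backend: str,
    quant_efficiency: float = 1.0,
) -> float:
    """Memory-bound estimate: each generated token streams the active weights once."""
    bytes_per_token = m.active_params * bits_per_weight / 8
    raw = bandwidth_gb_s * 1e9 / bytes_per_token
    return raw * BACKEND_SPEED_MULTIPLIER[backend] * quant_efficiency


if __name__ == "__main__":
    # Hypothetical MoE (30B total, 3B active) versus a dense model of the same size.
    moe = ModelSpec(total_params=30e9, active_params=3e9, n_layers=48, n_kv_heads=4, head_dim=128)
    dense = ModelSpec(total_params=30e9, active_params=30e9, n_layers=48, n_kv_heads=4, head_dim=128)
    ctx = 128 * 1024
    print(f"weights @4-bit: {weights_gib(moe, 4):.1f} GiB (same for both)")
    print(f"KV cache @128K: {kv_cache_gib(moe, ctx):.1f} GiB")
    print(f"MoE speed:   {decode_tokens_per_s(moe, 4, 1000, 'cuda'):.0f} tok/s")
    print(f"dense speed: {decode_tokens_per_s(dense, 4, 1000, 'cuda'):.0f} tok/s")
```

With these made-up numbers the two models need the same memory, but the MoE decodes about ten times faster because it reads one tenth of the weights per token. Treating an MoE as dense in the speed formula, or as its active size in the VRAM formula, produces exactly the kind of misestimate the release describes fixing.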

Bug fixes

Family inheritance no longer treats a 6.6B "imatrix-aligned" /
MTP-head fork as the same model as its 158B base.
Family grouping prefers the upstream model as the base, not whichever
fork has the most downloads.
httpx now runs with follow_redirects=True, so case-mismatched HuggingFace
URLs (which return a 307 redirect) no longer silently drop frontier IDs.
Quality floor (≥ 20) and speed floor (≥ 1.5 t/s) drop junk Q1_0 /
Bonsai-class candidates that previously slipped into low-VRAM
recommendations.
Removed 11 non-existent HF IDs from curated benchmark fallbacks.
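The quality and speed floors act as a hard filter applied before ranking. The sketch below uses the two thresholds stated above (quality ≥ 20, speed ≥ 1.5 tokens/s); the `Candidate` class, `apply_floors`, the candidate names, and the quality scale are hypothetical.

```python
from dataclasses import dataclass
from typing import List

QUALITY_FLOOR = 20.0  # from the release notes: quality >= 20
SPEED_FLOOR = 1.5     # from the release notes: >= 1.5 tokens/s


@dataclass
class Candidate:
    name: str            # hypothetical model/quant label
    quality: float       # quality score (scale assumed; the notes give only the floor)
    tokens_per_s: float  # estimated decode speed on the target hardware


def apply_floors(candidates: List[Candidate]) -> List[Candidate]:
    """Drop anything below either floor, then rank survivors by quality (best first)."""
    kept = [
        c for c in candidates
        if c.quality >= QUALITY_FLOOR and c.tokens_per_s >= SPEED_FLOOR
    ]
    return sorted(kept, key=lambda c: c.quality, reverse=True)


if __name__ == "__main__":
    pool = [
        Candidate("model-a Q4_K_M", quality=55, tokens_per_s=12.0),
        Candidate("model-b Q1_0", quality=14, tokens_per_s=40.0),  # fails the quality floor
        Candidate("model-c Q8_0", quality=60, tokens_per_s=0.9),   # fails the speed floor
        Candidate("model-d Q5_K_M", quality=48, tokens_per_s=8.0),
    ]
    print([c.name for c in apply_floors(pool)])
```

Filtering before ranking matters on low-VRAM machines: a tiny Q1_0 model can be very fast, and without a quality floor its speed alone could carry it into the recommendations.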

Full changelog: CHANGELOG.md
