This release includes breaking changes for platform teams planning a safe upgrade.
✓ No known CVEs patched in this version
ReleasePort's take
Light signal: Version v0.5.1 adds Apple Silicon GPU simulation support and refreshes the frontier-model list with new entries.
Why it matters: Test v0.5.1 in development to validate GPU simulation on M1‑M4 chips and assess performance of added models before production rollout.
Summary
AI summary: Added Apple Silicon GPU simulation, refreshed frontier-model list, and improved VRAM/speed estimation.
Changes in this release
| Type | Severity | Summary | CVE |
|---|---|---|---|
| Feature | Medium | Added 30+ frontier models including Llama-4, DeepSeek-V4, reasoning models. (Source: llm_adapter@2026-05-21; confidence: low) | — |
| Feature | Medium | `whichllm upgrade` compares GPU upgrades showing delta scores and verdicts. (Source: llm_adapter@2026-05-21; confidence: low) | — |
| Feature | Medium | Apple Silicon M1-M4 chips supported in `--gpu` for stress-testing. (Source: llm_adapter@2026-05-21; confidence: low) | — |
| Feature | Medium | Added 2026-Q2 frontier models: Kimi-K2, MiMo, DeepSeek-V4, GLM-5, Qwen3.6/Next, gpt-oss, Llama-4, Mistral Small/Large, Devstral, Codestral, MiniMax, Granite 3.3/4.0, Olmo-3, Nemotron-3, plus reasoning models (QwQ-32B, Qwen3-4B-Thinking, DeepSeek-R1 family, R1-Distill). (Source: granite4.1:30b@2026-05-22-audit; confidence: low) | — |
| Performance | Medium | Added per-backend speed multipliers (CUDA, Apple, AMD, Intel) and quantization efficiency factors for accurate performance estimates on different hardware. (Source: granite4.1:30b@2026-05-22-audit; confidence: low) | — |
| Bugfix | Medium | MoE models split correctly by active and total parameters. (Source: llm_adapter@2026-05-21; confidence: high) | — |
| Bugfix | Medium | Lineage-aware demotion prevents leaderboard bias against newer models. (Source: llm_adapter@2026-05-21; confidence: high) | — |
| Bugfix | Medium | Per-backend speed multipliers and quant efficiency factors improve estimates. (Source: llm_adapter@2026-05-21; confidence: low) | — |
| Bugfix | Medium | Fixed family inheritance treating 6.6B fork as 158B base. (Source: llm_adapter@2026-05-21; confidence: low) | — |
| Bugfix | Medium | Family grouping prefers upstream model as base selection. (Source: llm_adapter@2026-05-21; confidence: low) | — |
| Bugfix | Medium | `httpx` `follow_redirects` fixes HuggingFace case-mismatch URLs dropping IDs. (Source: llm_adapter@2026-05-21; confidence: low) | — |
| Bugfix | Medium | Quality and speed floors eliminate junk Q1_0 candidates. (Source: llm_adapter@2026-05-21; confidence: low) | — |
| Bugfix | Medium | KV cache scaling tuned for real 128K-context runs. (Source: llm_adapter@2026-05-21; confidence: low) | — |
| Bugfix | Medium | Removed 11 non-existent HuggingFace IDs from benchmark fallbacks. (Source: llm_adapter@2026-05-21; confidence: low) | — |
| Bugfix | Medium | `httpx` configured with `follow_redirects=True` so case-mismatch HuggingFace URLs (307 redirects) no longer silently drop frontier IDs. (Source: granite4.1:30b@2026-05-22-audit; confidence: low) | — |
| Bugfix | Low | Family inheritance no longer treats a 6.6 B "imatrix-aligned" / MTP-head fork as the same model as its 158 B base. (Source: granite4.1:30b@2026-05-22-audit; confidence: low) | — |
| Bugfix | Low | Family grouping now prefers the upstream (original) model as the base rather than the most-downloaded fork. (Source: granite4.1:30b@2026-05-22-audit; confidence: low) | — |
| Bugfix | Low | Quality floor (≥ 20) and speed floor (≥ 1.5 tokens/s) filter out low-quality Q1_0 / Bonsai-class candidates from recommendations. (Source: granite4.1:30b@2026-05-22-audit; confidence: low) | — |
Full changelog
What's New
whichllm upgrade — Compare GPU upgrades side-by-side
whichllm upgrade --target "RTX 4090"
Shows the current machine and a target GPU together with delta scores
and a verdict (worth it / meaningful / marginal / flat / downgrade).
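As a rough illustration of how a delta score could map onto the five verdict labels above, here is a minimal sketch. The function name `verdict`, the thresholds, and the example scores are all hypothetical; the release notes do not state how whichllm actually picks a verdict.

```python
# Hypothetical thresholds: whichllm's real cut-offs are not given in these notes.
def verdict(delta: float) -> str:
    """Map a score delta (target minus current) to one of the five labels."""
    if delta < 0:
        return "downgrade"
    if delta < 1:
        return "flat"
    if delta < 5:
        return "marginal"
    if delta < 15:
        return "meaningful"
    return "worth it"


# Made-up scores for the current machine and the target GPU.
current_score, target_score = 62.0, 80.5
delta = target_score - current_score
print(f"delta={delta:+.1f} verdict={verdict(delta)}")
```

An ordered chain of thresholds keeps the labels mutually exclusive, so any delta lands in exactly one verdict.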
Apple Silicon support in --gpu
whichllm --gpu "M3 Max" --vram 64
whichllm --gpu "M2 Ultra" --vram 192
Simulator now understands every M1-M4 chip (base / Pro / Max / Ultra),
so Mac users can stress-test rankings without owning the hardware. No
more spurious "ROCm requires Linux" warnings on simulated Apple boxes.
Frontier-model coverage refresh
2026-Q2 releases that did not previously surface are now included:
Kimi-K2, MiMo, DeepSeek-V4, GLM-5, Qwen3.6 / Qwen3-Next, gpt-oss,
Llama-4, Mistral Small/Large, Devstral, Codestral, MiniMax,
Granite 3.3/4.0, Olmo-3, Nemotron-3, plus the reasoning lines
QwQ-32B, Qwen3-4B-Thinking, DeepSeek-R1 and the R1-Distill family.
Smarter VRAM / speed estimates
- KV cache scaling tuned to match real 128K-context runs.
- MoE models split correctly: total params drive VRAM and knowledge,
  active params drive speed.
- Per-backend speed multipliers (CUDA / Apple / AMD / Intel) and
  per-quant efficiency factors so Apple Silicon and partial-offload
  numbers stop overshooting.
- Lineage-aware demotion stops 2024-era leaderboards (OLLB v2, Arena
  ELO) from over-rewarding older generations against their newer
  siblings.
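The split between total and active parameters can be made concrete with a back-of-the-envelope estimator. This is a sketch under stated assumptions, not whichllm's implementation: the `ModelSpec` fields, the multiplier values, the memory-bound decode model, and the example model shapes are all illustrative. Only the standard KV cache size formula (two tensors per layer, per KV head, per token) is a general fact.

```python
from dataclasses import dataclass

GIB = 1024 ** 3

# Hypothetical multipliers for illustration only; the release notes do not
# publish whichllm's actual per-backend values.
BACKEND_MULTIPLIER = {"cuda": 1.0, "apple": 0.6, "amd": 0.7, "intel": 0.5}


@dataclass
class ModelSpec:
    total_params_b: float   # all parameters in billions (every expert for MoE)
    active_params_b: float  # parameters used per token, in billions
    layers: int
    kv_heads: int
    head_dim: int


def weight_bytes(params_b: float, bits_per_weight: float) -> float:
    return params_b * 1e9 * bits_per_weight / 8


def kv_cache_bytes(m: ModelSpec, context_tokens: int, bytes_per_elem: int = 2) -> int:
    # One key and one value vector per layer, per KV head, per token.
    return 2 * m.layers * m.kv_heads * m.head_dim * context_tokens * bytes_per_elem


def vram_gib(m: ModelSpec, bits_per_weight: float, context_tokens: int) -> float:
    # Total parameters drive memory: every expert must be resident.
    total = weight_bytes(m.total_params_b, bits_per_weight)
    return (total + kv_cache_bytes(m, context_tokens)) / GIB


def decode_tokens_per_second(
    m: ModelSpec, bits_per_weight: float, bandwidth_gb_s: float, backend: str
) -> float:
    # Active parameters drive speed: decoding is modelled as memory-bound,
    # reading the active weights once per generated token.
    per_token = weight_bytes(m.active_params_b, bits_per_weight)
    return bandwidth_gb_s * 1e9 / per_token * BACKEND_MULTIPLIER[backend]


# Made-up shapes: a 100B MoE with 10B active versus a 100B dense model.
moe = ModelSpec(100, 10, layers=48, kv_heads=8, head_dim=128)
dense = ModelSpec(100, 100, layers=48, kv_heads=8, head_dim=128)
for name, spec in (("moe", moe), ("dense", dense)):
    vram = vram_gib(spec, 4.5, 131072)
    tps = decode_tokens_per_second(spec, 4.5, 400, "apple")
    print(f"{name}: {vram:.1f} GiB, {tps:.1f} tok/s")
```

In this toy model both specs need the same VRAM, but the MoE decodes about ten times faster because only its active parameters are read per token.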
Bug fixes
- Family inheritance no longer treats a 6.6B "imatrix-aligned" /
  MTP-head fork as the same model as its 158B base.
- Family grouping prefers the upstream model as the base, not whichever
  fork has the most downloads.
- httpx `follow_redirects=True` so case-mismatch HuggingFace URLs (307)
  no longer drop frontier IDs silently.
- Quality floor (≥ 20) and speed floor (≥ 1.5 t/s) drop junk Q1_0 /
  Bonsai-class candidates that previously slipped into low-VRAM
  recommendations.
- Removed 11 non-existent HF IDs from curated benchmark fallbacks.
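The quality and speed floors from the list above amount to a simple filter over candidates. The sketch below takes the two thresholds from the notes (quality ≥ 20, speed ≥ 1.5 t/s); the `Candidate` record, its field names, and the sample entries are hypothetical and only stand in for whatever whichllm uses internally.

```python
from dataclasses import dataclass

QUALITY_FLOOR = 20.0    # from the release notes: quality >= 20
SPEED_FLOOR_TPS = 1.5   # from the release notes: speed >= 1.5 t/s


@dataclass
class Candidate:
    # Hypothetical record; field names are assumptions for this sketch.
    name: str
    quant: str
    quality: float
    tokens_per_second: float


def passes_floors(c: Candidate) -> bool:
    """Keep a candidate only if it clears both floors."""
    return c.quality >= QUALITY_FLOOR and c.tokens_per_second >= SPEED_FLOOR_TPS


# Made-up candidates for a low-VRAM machine.
candidates = [
    Candidate("model-a", "Q4_K_M", quality=48.0, tokens_per_second=12.0),
    Candidate("model-b", "Q1_0", quality=11.0, tokens_per_second=30.0),   # too low quality
    Candidate("model-c", "Q8_0", quality=55.0, tokens_per_second=0.8),    # too slow
]

kept = [c for c in candidates if passes_floors(c)]
print([c.name for c in kept])
```

Requiring both floors together means a very fast but low-quality quant, or a high-quality but unusably slow one, is dropped before ranking.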
Full changelog: CHANGELOG.md