Find the best local LLM for your hardware, ranked by benchmarks

v0.5.6 Feature

This release adds four features and one bug fix relevant to engineering teams evaluating a rollout.

Published 2mo LLM Frameworks

✓ No known CVEs patched

Topics

ai apple-silicon benchmarks cli gguf gpu huggingface inference llm local-llm ollama python vram

ReleasePort's take

Light signal

editorial:auto 2mo

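v0.5.6 stops dedicated GPU VRAM and shared-memory iGPU system RAM from being summed into a single full-GPU target, and adds a Windows detection fallback for AMD and Intel GPUs, including Ryzen AI / Radeon 890M-class iGPUs. Speed estimates now include confidence metadata and estimated tok/s ranges.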

Why it matters: Windows users with Ryzen AI or Radeon 890M iGPUs should test in a development environment to confirm that GPU memory is detected accurately. The confidence metadata and tok/s ranges let teams plan inference workloads against a range rather than a single point estimate.
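
To illustrate how a tok/s range with confidence metadata might be used in planning, here is a minimal sketch. The `SpeedEstimate` class, its field names, and the spread percentages are all hypothetical; the release notes do not describe the tool's actual schema or how it derives its ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeedEstimate:
    """Hypothetical container; not the tool's real output schema."""
    tok_s: float       # point estimate, tokens per second
    tok_s_low: float   # lower end of the estimated range
    tok_s_high: float  # upper end of the estimated range
    confidence: str    # "high", "medium", or "low"


# Assumed relative spread per confidence level (illustrative values only).
SPREAD = {"high": 0.10, "medium": 0.25, "low": 0.50}


def with_range(tok_s: float, confidence: str) -> SpeedEstimate:
    """Widen a point estimate into a range based on its confidence label."""
    spread = SPREAD[confidence]
    return SpeedEstimate(
        tok_s=tok_s,
        tok_s_low=tok_s * (1.0 - spread),
        tok_s_high=tok_s * (1.0 + spread),
        confidence=confidence,
    )


def meets_target(estimate: SpeedEstimate, target_tok_s: float) -> bool:
    """Plan conservatively: require the low end of the range to hit the target."""
    return estimate.tok_s_low >= target_tok_s
```

Checking a target against the low end of the range, rather than the point estimate, is one conservative way to use the metadata when sizing hardware.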

Summary

AI summary

Added speed estimate confidence metadata, improved MoE estimates, and enhanced Windows GPU detection.
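
The changelog describes the MoE improvement as using active parameters and bandwidth-scaled read floors. The sketch below shows one possible reading of that idea, not the tool's formula: decode speed is treated as memory-bandwidth bound, only the active parameters are counted as read per token, and a minimum per-token read that grows with bandwidth keeps very fast devices from getting unrealistically high estimates. The function name, the `floor_ms` and `efficiency` parameters, and their defaults are assumptions made for illustration.

```python
def moe_tok_s(
    active_params_b: float,    # active parameters per token, in billions
    bytes_per_param: float,    # e.g. roughly 0.5 for 4-bit quantization
    bandwidth_gb_s: float,     # memory bandwidth in GB/s
    floor_ms: float = 2.0,     # assumed minimum per-token read time (hypothetical)
    efficiency: float = 0.6,   # assumed fraction of peak bandwidth achieved
) -> float:
    """Rough bandwidth-bound tok/s estimate for a mixture-of-experts model."""
    # Only the experts routed to for this token are read, so use active
    # parameters rather than the total parameter count.
    active_gb = active_params_b * bytes_per_param
    # Read floor expressed in GB: it scales with bandwidth, so it behaves
    # like a fixed minimum time per token.
    floor_gb = bandwidth_gb_s * floor_ms / 1000.0
    read_gb = max(active_gb, floor_gb)
    return efficiency * bandwidth_gb_s / read_gb
```

With these assumptions, a model with 3B active parameters at 0.5 bytes per parameter on a 100 GB/s device is limited by its active weights, while a very small active set on a very fast device is limited by the floor instead.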

Changes in this release

Type	Severity	Summary	CVE
Feature	Medium	Add speed estimate confidence metadata and estimated tok/s ranges.	None
Feature	Medium	Improve MoE speed estimates using active parameters and bandwidth-scaled read floors.	None
Feature	Medium	Add Windows AMD/Intel GPU detection fallback through `Win32_VideoController` and registry memory reads.	None
Feature	Medium	Treat Ryzen AI / Radeon 890M-class Windows iGPUs as shared-memory AMD GPUs.	None
Bugfix	Medium	Avoid summing dedicated GPU VRAM with shared-memory iGPU system RAM as one full-GPU target.	None

Source for all entries: granite4.1:8b-q6_K@2026-05-21. Confidence: high.
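
The bug fix in the last row can be pictured with a small sketch. The `Gpu` class and `full_gpu_target_gb` function are hypothetical names, and the policy shown (sum dedicated cards, otherwise fall back to the largest shared-memory pool) is an assumption about what a sensible target looks like, not the tool's confirmed logic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    """Hypothetical description of a detected GPU."""
    name: str
    memory_gb: float
    shared_memory: bool  # True for iGPUs that borrow system RAM


def full_gpu_target_gb(gpus: list[Gpu]) -> float:
    """Memory budget for a full-GPU offload, keeping the two pools separate."""
    dedicated = [g.memory_gb for g in gpus if not g.shared_memory]
    if dedicated:
        # Assumption: dedicated cards can be pooled, but iGPU system RAM
        # is never added on top of them.
        return sum(dedicated)
    shared = [g.memory_gb for g in gpus if g.shared_memory]
    # Shared-memory iGPUs draw from the same system RAM, so take the
    # largest reported pool instead of summing.
    return max(shared, default=0.0)
```

On a laptop with an 8 GB discrete card and an iGPU reporting 16 GB of shared memory, this returns 8 GB instead of an inflated 24 GB.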

Full changelog

What's Changed

Add speed estimate confidence metadata and estimated tok/s ranges.
Improve MoE speed estimates using active parameters and bandwidth-scaled read floors.
Add Windows AMD/Intel GPU detection fallback through Win32_VideoController and registry memory reads.
Treat Ryzen AI / Radeon 890M-class Windows iGPUs as shared-memory AMD GPUs.
Avoid summing dedicated GPU VRAM with shared-memory iGPU system RAM as one full-GPU target.
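
As a rough picture of the `Win32_VideoController` fallback, the sketch below queries the WMI class through PowerShell and keeps only AMD and Intel adapters. It is an assumption about how such a fallback could be written, not the project's code; the function names and the name-matching rules are hypothetical. The registry memory read is left out of the sketch. Note that `AdapterRAM` is a 32-bit field, so it cannot report more than about 4 GiB, which is a likely reason a second memory source is useful.

```python
from __future__ import annotations

import json
import subprocess
import sys

# PowerShell query against the Win32_VideoController WMI class.
PS_QUERY = (
    "Get-CimInstance Win32_VideoController | "
    "Select-Object Name, AdapterRAM | ConvertTo-Json"
)


def classify_vendor(name: str) -> str | None:
    """Map an adapter name to 'amd' or 'intel' (hypothetical matching rules)."""
    lowered = name.lower()
    if "amd" in lowered or "radeon" in lowered:
        return "amd"
    if "intel" in lowered:
        return "intel"
    return None


def parse_video_controllers(raw_json: str) -> list[dict]:
    """Parse ConvertTo-Json output into a list of AMD/Intel adapters."""
    data = json.loads(raw_json) if raw_json.strip() else []
    if isinstance(data, dict):
        # ConvertTo-Json emits a bare object when there is a single adapter.
        data = [data]
    gpus = []
    for item in data:
        name = item.get("Name") or ""
        vendor = classify_vendor(name)
        if vendor is None:
            continue
        gpus.append({
            "name": name,
            "vendor": vendor,
            "adapter_ram_bytes": int(item.get("AdapterRAM") or 0),
        })
    return gpus


def detect_windows_gpus() -> list[dict]:
    """Run the query on Windows; return an empty list elsewhere."""
    if sys.platform != "win32":
        return []
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", PS_QUERY],
        capture_output=True, text=True, check=False,
    )
    return parse_video_controllers(result.stdout)
```

Keeping the parsing separate from the subprocess call makes the logic testable on non-Windows machines with captured JSON.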

Validation

ruff format --check .
ruff check .
pytest -q -s
python -m build
