This release adds four features and one bugfix for engineering teams evaluating rollout.

Published 17d LLM Frameworks
✓ No known CVEs patched

Topics

ai apple-silicon benchmarks cli gguf gpu huggingface inference llm local-llm ollama python vram

ReleasePort's take

Light signal
editorial:auto 13d

v0.5.6 fixes memory calculation for Windows iGPUs and adds GPU detection for Ryzen AI and Radeon 890M systems. Speed estimates now include confidence metadata with tok/s ranges.

Why it matters: Windows users with Ryzen AI or Radeon 890M iGPUs should test in dev to confirm that GPU memory is detected accurately. With confidence metadata attached, teams can plan inference workloads against a tok/s range rather than a single point estimate.

Summary

AI summary

Added speed estimate confidence metadata, improved MoE estimates, and enhanced Windows GPU detection.

Changes in this release

Feature Medium

Add speed estimate confidence metadata and estimated tok/s ranges.

Source: granite4.1:8b-q6_K@2026-05-21

Confidence: high
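
As an illustration of what a speed estimate carrying confidence metadata and a tok/s range could look like, here is a minimal sketch. The `SpeedEstimate` type, its field names, and the spread percentages are hypothetical and are not taken from the tool's source.

```python
from dataclasses import dataclass

# Hypothetical relative spread per confidence level (assumption, not the tool's values).
SPREAD = {"high": 0.10, "medium": 0.25, "low": 0.50}


@dataclass(frozen=True)
class SpeedEstimate:
    tok_s: float      # point estimate in tokens per second
    confidence: str   # "high", "medium", or "low"

    @property
    def tok_s_range(self) -> tuple[float, float]:
        """Return (low, high) bounds derived from the confidence level."""
        spread = SPREAD[self.confidence]
        return (self.tok_s * (1 - spread), self.tok_s * (1 + spread))


def describe(est: SpeedEstimate) -> str:
    """Render the estimate as a short human-readable string."""
    low, high = est.tok_s_range
    return f"~{est.tok_s:.0f} tok/s ({low:.0f}-{high:.0f}, {est.confidence} confidence)"
```

A caller could then budget against the low end of the range when confidence is not high, instead of trusting the point estimate.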

Feature Medium

Improve MoE speed estimates using active parameters and bandwidth-scaled read floors.

Source: granite4.1:8b-q6_K@2026-05-21

Confidence: high
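
The reasoning behind an active-parameter MoE estimate can be sketched as follows. Token generation is typically memory-bandwidth bound, so speed is roughly bandwidth divided by the bytes read per token, and an MoE model reads only its active experts. The release notes do not define the read floor, so this sketch assumes it is a minimum fraction of the total weights that is always read. The function name, parameters, and `floor_fraction` are all hypothetical.

```python
def moe_tok_s(
    total_params_b: float,    # total parameters, in billions
    active_params_b: float,   # parameters active per token, in billions
    bytes_per_weight: float,  # e.g. 0.5 for 4-bit quantization
    bandwidth_gb_s: float,    # memory bandwidth in GB/s
    floor_fraction: float,    # assumed minimum share of total weights read per token
    efficiency: float = 1.0,  # assumed fraction of peak bandwidth actually achieved
) -> float:
    """Rough bandwidth-bound decode speed for an MoE model, in tokens per second."""
    # Billions of parameters times bytes per weight gives gigabytes.
    active_gb = active_params_b * bytes_per_weight
    floor_gb = total_params_b * bytes_per_weight * floor_fraction
    # Never assume fewer bytes are read than the floor allows.
    read_gb = max(active_gb, floor_gb)
    if read_gb <= 0:
        raise ValueError("model size must be positive")
    return efficiency * bandwidth_gb_s / read_gb
```

Under these assumptions the floor keeps a very sparse model from producing an unrealistically high estimate, while a model whose active set exceeds the floor is governed by its active parameters alone.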

Feature Medium

Add Windows AMD/Intel GPU detection fallback through `Win32_VideoController` and registry memory reads.

Source: granite4.1:8b-q6_K@2026-05-21

Confidence: high
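
A fallback of this kind might look like the sketch below. It queries `Win32_VideoController` through PowerShell, then prefers a 64-bit memory size from the registry, since the WMI `AdapterRAM` property is a 32-bit field and cannot represent more than about 4 GiB. The helper names, the display-class registry path, and the `HardwareInformation.qwMemorySize` value are assumptions about how such a fallback could work; the release notes do not say which keys the tool reads.

```python
import json
import subprocess

# Display adapter device class key (assumed location of per-adapter driver info).
DISPLAY_CLASS_KEY = (
    r"SYSTEM\CurrentControlSet\Control\Class"
    r"\{4d36e968-e325-11ce-bfc1-08002be10318}"
)


def parse_video_controllers(raw_json: str) -> list[dict]:
    """Normalize ConvertTo-Json output (one object or a list) into a list of dicts."""
    data = json.loads(raw_json) if raw_json.strip() else []
    if isinstance(data, dict):
        data = [data]
    return [
        {"name": d.get("Name") or "", "adapter_ram": int(d.get("AdapterRAM") or 0)}
        for d in data
    ]


def query_video_controllers() -> list[dict]:
    """Windows only: list adapters reported by Win32_VideoController."""
    cmd = [
        "powershell", "-NoProfile", "-Command",
        "Get-CimInstance Win32_VideoController | "
        "Select-Object Name, AdapterRAM | ConvertTo-Json",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_video_controllers(out)


def registry_memory_bytes() -> dict[str, int]:
    """Windows only: map adapter name to the 64-bit memory size in the registry."""
    import winreg  # imported lazily so the module still loads on other platforms

    sizes: dict[str, int] = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, DISPLAY_CLASS_KEY) as cls:
        index = 0
        while True:
            try:
                sub = winreg.EnumKey(cls, index)
            except OSError:
                break  # no more subkeys
            index += 1
            try:
                with winreg.OpenKey(cls, sub) as key:
                    name, _ = winreg.QueryValueEx(key, "DriverDesc")
                    size, _ = winreg.QueryValueEx(
                        key, "HardwareInformation.qwMemorySize"
                    )
            except OSError:
                continue  # unreadable subkey or value not present
            if isinstance(size, bytes):
                size = int.from_bytes(size, "little")
            sizes[name] = int(size)
    return sizes


def merge_memory(controllers: list[dict], registry_sizes: dict[str, int]) -> list[dict]:
    """Prefer the registry size when present; otherwise fall back to AdapterRAM."""
    return [
        {**c, "memory_bytes": registry_sizes.get(c["name"]) or c["adapter_ram"]}
        for c in controllers
    ]
```

On Windows, `merge_memory(query_video_controllers(), registry_memory_bytes())` would combine the two sources. The parsing and merging steps are kept as pure functions so they can be tested on any platform.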

Feature Medium

Treat Ryzen AI / Radeon 890M-class Windows iGPUs as shared-memory AMD GPUs.

Source: granite4.1:8b-q6_K@2026-05-21

Confidence: high

Bugfix Medium

Avoid summing dedicated GPU VRAM with shared-memory iGPU system RAM as one full-GPU target.

Source: granite4.1:8b-q6_K@2026-05-21

Confidence: high
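
The shared-memory classification and the summing fix can be illustrated together. This is a minimal sketch under stated assumptions: the `Gpu` type, the name hints, and the rule for combining multiple dedicated GPUs are hypothetical, and the tool's actual matching logic may differ.

```python
from dataclasses import dataclass

# Hypothetical name fragments that mark an adapter as a shared-memory iGPU.
SHARED_MEMORY_HINTS = ("ryzen ai", "890m")


@dataclass(frozen=True)
class Gpu:
    name: str
    memory_gb: float
    shared: bool = False  # True when the memory is carved out of system RAM


def classify(name: str, memory_gb: float) -> Gpu:
    """Flag adapters whose name matches a known shared-memory iGPU hint."""
    lowered = name.lower()
    shared = any(hint in lowered for hint in SHARED_MEMORY_HINTS)
    return Gpu(name, memory_gb, shared)


def full_gpu_target_gb(gpus: list[Gpu]) -> float:
    """Memory budget for a full-GPU offload, never mixing the two pools."""
    dedicated = [g.memory_gb for g in gpus if not g.shared]
    if dedicated:
        # Assumption: dedicated cards may be pooled; shared iGPU RAM is excluded.
        return sum(dedicated)
    shared = [g.memory_gb for g in gpus if g.shared]
    return max(shared, default=0.0)
```

With one 12 GB dedicated card and an iGPU that reports 16 GB of shared memory, this sketch returns 12 GB as the full-GPU target rather than 28 GB, which is the behavior the bugfix describes.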

Full changelog

What's Changed

  • Add speed estimate confidence metadata and estimated tok/s ranges.
  • Improve MoE speed estimates using active parameters and bandwidth-scaled read floors.
  • Add Windows AMD/Intel GPU detection fallback through `Win32_VideoController` and registry memory reads.
  • Treat Ryzen AI / Radeon 890M-class Windows iGPUs as shared-memory AMD GPUs.
  • Avoid summing dedicated GPU VRAM with shared-memory iGPU system RAM as one full-GPU target.

Validation

  • ruff format --check .
  • ruff check .
  • pytest -q -s
  • python -m build
