Find the best local LLM for your hardware, ranked by benchmarks

v0.5.3 Breaking

This release includes breaking changes, so platform teams should plan the upgrade carefully.

Published 2mo ago · LLM Frameworks

✓ No known CVEs patched

Topics

ai apple-silicon benchmarks cli gguf gpu huggingface inference llm local-llm ollama python vram

ReleasePort's take

Light signal

editorial:auto 2mo

Release v0.5.3 fixes a `KeyError: 'shape'` crash in the `whichllm run` transformers chat path and adds GPU detection and fallback features.

Why it matters: It fixes a bug that crashed `whichllm run` on the transformers chat path, and it improves GPU handling for Linux Intel integrated GPUs, NVIDIA systems where NVML is unavailable, and Apple Silicon names passed to --gpu.

Summary

AI summary

Fixed the transformers chat crash by passing tokenizer mappings to `model.generate`, which prevents `KeyError: 'shape'`.
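The fix hinges on how a tokenizer mapping reaches `generate`. The sketch below is a dependency-free stand-in, not whichllm's or transformers' code: `fake_generate` is hypothetical and only mirrors the keyword interface (`input_ids`, `attention_mask`) so you can see why unpacking with `**inputs` binds each entry to the right parameter, while passing the mapping positionally drops the whole dict into `input_ids`. It does not reproduce the exact transformers internals or the `KeyError` itself. In Hugging Face transformers such a mapping typically comes from the tokenizer, for example `apply_chat_template(..., return_dict=True, return_tensors="pt")`.

```python
from typing import Optional


def fake_generate(
    input_ids: Optional[list[list[int]]] = None,
    attention_mask: Optional[list[list[int]]] = None,
    max_new_tokens: int = 2,
) -> list[list[int]]:
    """Hypothetical stand-in for model.generate; it only mimics the keyword interface."""
    # A batch of token-id rows is expected here, not a whole tokenizer mapping.
    if not isinstance(input_ids, list):
        raise TypeError(f"input_ids must be a list of token rows, got {type(input_ids).__name__}")
    if attention_mask is not None and len(attention_mask) != len(input_ids):
        raise ValueError("attention_mask batch size must match input_ids")
    # Append placeholder token id 0 instead of real sampling.
    return [row + [0] * max_new_tokens for row in input_ids]


# Shape of what a tokenizer returns: a mapping of named model inputs.
inputs = {"input_ids": [[101, 2023, 102]], "attention_mask": [[1, 1, 1]]}

print(fake_generate(**inputs))  # → [[101, 2023, 102, 0, 0]]
try:
    fake_generate(inputs)  # the whole mapping is bound to input_ids
except TypeError as err:
    print("positional call fails:", err)
```

Unpacking keeps the call correct even if the tokenizer later adds more named inputs that `generate` accepts.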

Changes in this release

Type	Severity	Summary	Source	Confidence	CVE
Feature	Medium	Linux Intel integrated GPU detection via /sys/class/drm	llm_adapter@2026-05-21	high	None
Feature	Medium	NVIDIA nvidia-smi fallback detection when pynvml is missing or NVML fails	llm_adapter@2026-05-21	high	None
Feature	Medium	Support for Apple-prefixed Apple Silicon simulator aliases	llm_adapter@2026-05-21	low	None
Bugfix	Medium	Fixed the `whichllm run` transformers chat path to avoid KeyError: 'shape'	llm_adapter@2026-05-21	high	None
Bugfix	Medium	RTX 5060 Ti bandwidth now reports 448 GB/s	llm_adapter@2026-05-21	low	None
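The two detection features in the table can be pictured as a small fallback chain. This is a minimal sketch, not whichllm's implementation: it assumes Intel GPUs report PCI vendor ID `0x8086` in `/sys/class/drm/cardN/device/vendor`, and that the fallback shells out to `nvidia-smi --query-gpu=name --format=csv,noheader`. All function names are hypothetical; only the `pynvml` calls are real library functions.

```python
import shutil
import subprocess
from pathlib import Path

INTEL_VENDOR_ID = "0x8086"  # PCI vendor ID that Intel devices report in sysfs


def intel_igpus_from_sysfs(drm_root: Path = Path("/sys/class/drm")) -> list[str]:
    """Return DRM card names (e.g. "card0") whose PCI vendor is Intel."""
    if not drm_root.is_dir():
        return []  # not Linux, or sysfs unavailable
    cards = []
    for card in sorted(drm_root.glob("card*")):
        # Skip connector entries such as "card0-HDMI-A-1"; keep bare "cardN" nodes.
        if not card.name[4:].isdigit():
            continue
        try:
            vendor = (card / "device" / "vendor").read_text().strip().lower()
        except OSError:
            continue
        if vendor == INTEL_VENDOR_ID:
            cards.append(card.name)
    return cards


def parse_nvidia_smi_names(output: str) -> list[str]:
    """Parse `--query-gpu=name --format=csv,noheader` output: one GPU name per line."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def nvidia_smi_names() -> list[str]:
    """Fallback path: ask nvidia-smi directly if it is on PATH."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=10, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return []
    return parse_nvidia_smi_names(result.stdout)


def nvidia_gpu_names() -> list[str]:
    """Prefer NVML; fall back to nvidia-smi if pynvml is missing, fails, or sees no GPUs."""
    try:
        import pynvml  # optional dependency

        pynvml.nvmlInit()
        try:
            names = []
            for i in range(pynvml.nvmlDeviceGetCount()):
                name = pynvml.nvmlDeviceGetName(pynvml.nvmlDeviceGetHandleByIndex(i))
                names.append(name.decode() if isinstance(name, bytes) else name)
        finally:
            pynvml.nvmlShutdown()
        if names:
            return names
    except Exception:  # ImportError, or an NVML error during init or queries
        pass
    return nvidia_smi_names()
```

Ordering NVML first keeps the fast in-process path when it works, while the subprocess fallback covers machines where only the driver utilities are installed.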

Full changelog

What's Changed

Added

Linux Intel integrated GPU detection via /sys/class/drm, so Intel iGPU systems are no longer treated as CPU-only by default.
NVIDIA nvidia-smi fallback detection when pynvml is missing, NVML init fails, or NVML reports no devices.
Apple-prefixed Apple Silicon simulator aliases, so --gpu "Apple M3 Max" works like --gpu "M3 Max".
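The alias change amounts to normalizing the --gpu value before the hardware lookup. A minimal sketch, assuming case-insensitive matching and that only an "Apple " prefix in front of an M-series name is stripped; the chip list and both function names are hypothetical, not whichllm's.

```python
import re
from typing import Optional

# Match a leading "Apple " only when an M-series chip name follows (e.g. "Apple M3 Max").
_APPLE_PREFIX = re.compile(r"^\s*apple\s+(?=m\d)", re.IGNORECASE)

# Hypothetical subset of chip names a hardware table might contain.
KNOWN_APPLE_CHIPS = ("M1", "M2 Pro", "M3 Max", "M4")


def normalize_gpu_alias(name: str) -> str:
    """Drop an "Apple " prefix and collapse repeated whitespace."""
    return " ".join(_APPLE_PREFIX.sub("", name).split())


def resolve_apple_chip(name: str) -> Optional[str]:
    """Return the canonical chip name for a --gpu value, or None if it is unknown."""
    candidate = normalize_gpu_alias(name).lower()
    for chip in KNOWN_APPLE_CHIPS:
        if chip.lower() == candidate:
            return chip
    return None


print(resolve_apple_chip("Apple M3 Max"))  # → M3 Max
print(resolve_apple_chip("M3 Max"))  # → M3 Max
```

Restricting the strip to names followed by an M-series chip avoids mangling unrelated inputs that merely start with the word "Apple".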

Fixed

Fixed the whichllm run transformers chat path by passing tokenizer mappings into model.generate(**inputs), avoiding the KeyError: 'shape' crash.
RTX 5060 Ti bandwidth lookup now reports 448 GB/s instead of N/A.
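A missing bandwidth value matters because memory bandwidth is commonly used to bound single-stream decode speed: generating each token streams roughly all model weights from VRAM, so tokens per second is at most bandwidth divided by model size. This sketch assumes whichllm uses bandwidth in a similar way, which this release note does not confirm. The lookup table, the function names, and the 4.0 GB model size are hypothetical; only the 448 GB/s figure comes from this release.

```python
# Hypothetical lookup table; only the RTX 5060 Ti figure comes from this release note.
GPU_BANDWIDTH_GB_S = {"RTX 5060 Ti": 448.0}


def bandwidth_label(gpu: str) -> str:
    """Format memory bandwidth for display, or "N/A" when the GPU is not in the table."""
    bandwidth = GPU_BANDWIDTH_GB_S.get(gpu)
    return "N/A" if bandwidth is None else f"{bandwidth:g} GB/s"


def decode_tps_upper_bound(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rule of thumb: each decoded token reads every weight once, so tok/s <= bandwidth / size."""
    if bandwidth_gb_s <= 0 or model_size_gb <= 0:
        raise ValueError("bandwidth and model size must be positive")
    return bandwidth_gb_s / model_size_gb


print(bandwidth_label("RTX 5060 Ti"))  # → 448 GB/s
print(decode_tps_upper_bound(448.0, 4.0))  # → 112.0
```

Real throughput lands below this bound because of compute, KV-cache reads, and kernel overhead, but without a bandwidth entry the estimate cannot be made at all, which is what the N/A result reflected.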

Docs and maintenance

Updated the install guidance to recommend uvx / uv tool install.
Removed the old marketing note and added sponsor metadata.

Verification

uv run pytest — 138 passed
uv run --with ruff ruff check . — passed
uv run --with ruff ruff format --check . — passed
uv run whichllm --version — 0.5.3
uv run --with build python -m build — built wheel and sdist
