This release includes breaking changes for platform teams planning a safe upgrade.

Published 18d LLM Frameworks
✓ No known CVEs patched

Topics

ai apple-silicon benchmarks cli gguf gpu huggingface inference llm local-llm ollama python vram

ReleasePort's take

Light signal
editorial:auto 9d

Release v0.5.3 fixes a KeyError crash in the `whichllm run` transformers chat path and improves GPU detection: Linux Intel iGPU detection, an nvidia-smi fallback for NVIDIA, and Apple-prefixed GPU aliases.

Why it matters: The fix removes a crash when invoking the transformers chat path. The detection changes make hardware discovery more reliable on Linux systems with Intel integrated GPUs, on NVIDIA systems where NVML is unavailable, and when Apple Silicon names are passed with the "Apple" prefix.

Summary

AI summary

Fixed transformers chat crash by passing tokenizer mappings to model.generate, preventing KeyError: 'shape'.

Changes in this release

Feature Medium

Linux Intel integrated GPU detection via /sys/class/drm.

Source: llm_adapter@2026-05-21

Confidence: high
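
The sysfs-based detection described above can be sketched as follows. This is a minimal illustration, not whichllm's actual code: the function name `find_intel_gpus` is hypothetical, and it assumes the usual Linux layout in which each `/sys/class/drm/cardN/device/vendor` file holds the PCI vendor ID (`0x8086` for Intel).

```python
from pathlib import Path

INTEL_VENDOR_ID = "0x8086"  # PCI vendor ID assigned to Intel


def find_intel_gpus(drm_root="/sys/class/drm"):
    """Return names of DRM cards whose PCI vendor is Intel, e.g. ['card0'].

    Hypothetical helper: matches any Intel GPU by vendor ID and does not
    try to tell integrated parts from discrete ones.
    """
    root = Path(drm_root)
    if not root.is_dir():
        return []  # not Linux, or sysfs is not mounted
    found = []
    for card in sorted(root.glob("card[0-9]*")):
        if "-" in card.name:
            continue  # skip connector entries such as card0-eDP-1
        try:
            vendor = (card / "device" / "vendor").read_text().strip().lower()
        except OSError:
            continue  # unreadable or missing vendor file
        if vendor == INTEL_VENDOR_ID:
            found.append(card.name)
    return found
```

Taking the root directory as a parameter keeps the helper testable against a temporary directory instead of the live `/sys` tree.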

Feature Medium

NVIDIA nvidia-smi fallback detection when pynvml is missing or NVML fails.

Source: llm_adapter@2026-05-21

Confidence: high
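
A fallback chain of this kind might look like the sketch below. It is an assumption-laden illustration rather than the project's implementation: the function names are hypothetical, and the queried fields (`name`, `memory.total`) are chosen only for the example. The `nvidia-smi --query-gpu` and `--format=csv,noheader,nounits` options and the pynvml calls are real.

```python
import shutil
import subprocess


def parse_nvidia_smi_csv(text):
    """Parse 'name, memory.total' rows (noheader, nounits) into dicts."""
    gpus = []
    for line in text.splitlines():
        name, _, mem = line.rpartition(",")
        try:
            gpus.append({"name": name.strip(), "vram_mib": int(mem.strip())})
        except ValueError:
            continue  # blank line or non-numeric memory field
    return gpus


def detect_nvidia_via_smi(timeout=5.0):
    """Return GPUs reported by nvidia-smi, or [] if it is missing or fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, timeout=timeout, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return []
    return parse_nvidia_smi_csv(result.stdout)


def detect_nvidia():
    """Try NVML first; fall back to nvidia-smi on any failure or empty result."""
    gpus = []
    try:
        import pynvml  # may be absent

        pynvml.nvmlInit()
        try:
            for i in range(pynvml.nvmlDeviceGetCount()):
                handle = pynvml.nvmlDeviceGetHandleByIndex(i)
                name = pynvml.nvmlDeviceGetName(handle)
                if isinstance(name, bytes):  # older pynvml returns bytes
                    name = name.decode()
                total = pynvml.nvmlDeviceGetMemoryInfo(handle).total
                gpus.append({"name": name, "vram_mib": total // (1024 * 1024)})
        finally:
            pynvml.nvmlShutdown()
    except Exception:
        gpus = []  # import error or NVML failure: use the fallback
    return gpus or detect_nvidia_via_smi()
```

The final `gpus or detect_nvidia_via_smi()` covers all three triggers named in the changelog: a missing pynvml, an NVML init failure, and NVML reporting no devices.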

Feature Medium

Support for Apple-prefixed Apple Silicon simulator aliases.

Source: llm_adapter@2026-05-21

Confidence: low

Bugfix Medium

Fixed `whichllm run` transformers chat path to avoid KeyError: 'shape'.

Source: llm_adapter@2026-05-21

Confidence: high

Bugfix Medium

RTX 5060 Ti bandwidth now reports 448 GB/s.

Source: llm_adapter@2026-05-21

Confidence: low
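
For context, the 448 GB/s figure is consistent with the standard peak-bandwidth formula, bus width in bytes times per-pin data rate. The sketch below assumes the commonly published RTX 5060 Ti memory configuration (128-bit bus, 28 Gbps GDDR7). Those two inputs are not stated in the release notes, and the helper name is hypothetical.

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: (bus width in bytes) x (Gbit/s per pin)."""
    return bus_width_bits / 8 * data_rate_gbps


# Assumed inputs (not from the release notes): 128-bit bus, 28 Gbps GDDR7.
rtx_5060_ti = memory_bandwidth_gbs(128, 28)  # 16 bytes * 28 = 448.0
```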

Full changelog

What's Changed

Added

  • Linux Intel integrated GPU detection via /sys/class/drm, so Intel iGPU systems are no longer treated as CPU-only by default.
  • NVIDIA nvidia-smi fallback detection when pynvml is missing, NVML init fails, or NVML reports no devices.
  • Apple-prefixed Apple Silicon simulator aliases, so --gpu "Apple M3 Max" works like --gpu "M3 Max".
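
The alias behavior in the last item can be pictured as a small normalization step applied before lookup. This is a sketch under the assumption that the prefix is simply stripped; `normalize_gpu_alias` is a hypothetical name, and the real tool may resolve aliases differently.

```python
def normalize_gpu_alias(name):
    """Drop a leading 'Apple ' so 'Apple M3 Max' and 'M3 Max' compare equal."""
    cleaned = " ".join(name.split())  # collapse stray whitespace
    prefix = "apple "
    if cleaned.lower().startswith(prefix):
        cleaned = cleaned[len(prefix):]
    return cleaned
```

With this step in place, `normalize_gpu_alias("Apple M3 Max")` and `normalize_gpu_alias("M3 Max")` both yield `"M3 Max"`, while names without the prefix pass through unchanged.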

Fixed

  • Fixed the whichllm run transformers chat path by passing tokenizer mappings into model.generate(**inputs), avoiding the KeyError: 'shape' crash.
  • RTX 5060 Ti bandwidth lookup now reports 448 GB/s instead of N/A.
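
The first fix comes down to the difference between passing a mapping positionally and unpacking it into keyword arguments. The sketch below reproduces that pattern with stand-in classes. `Encoding`, `Tensor2D`, and `StubModel` are hypothetical and only mimic the shape of the problem; they are not the transformers implementation, and the exact failure mechanism inside the library is an assumption here.

```python
class Encoding(dict):
    """Stand-in for a tokenizer output: a mapping of named fields."""

    def __getattr__(self, item):
        # Attribute access falls through to the mapping, so a missing
        # field surfaces as KeyError instead of AttributeError.
        return self[item]


class Tensor2D:
    """Stand-in tensor: nested lists plus a .shape attribute."""

    def __init__(self, rows):
        self.rows = rows
        self.shape = (len(rows), len(rows[0]))


class StubModel:
    """Stand-in model whose generate() expects a tensor, not a mapping."""

    def generate(self, inputs=None, max_new_tokens=1, **kwargs):
        tensor = inputs if inputs is not None else kwargs["input_ids"]
        batch, seq_len = tensor.shape  # KeyError: 'shape' if given an Encoding
        return [row + [0] * max_new_tokens for row in tensor.rows]


enc = Encoding(
    input_ids=Tensor2D([[1, 2, 3]]),
    attention_mask=Tensor2D([[1, 1, 1]]),
)
model = StubModel()

# Buggy pattern: the whole mapping lands in the positional `inputs` slot.
# model.generate(enc)

# Fixed pattern: unpack so each field arrives as its own keyword argument.
output = model.generate(**enc, max_new_tokens=2)
```

Unpacking with `**` matches the changelog's `model.generate(**inputs)` form: the model receives `input_ids` and `attention_mask` as separate arguments rather than a single mapping object.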

Docs and maintenance

  • Updated install guidance toward uvx / uv tool install.
  • Removed the old marketing note and added sponsor metadata.

Verification

  • uv run pytest — 138 passed
  • uv run --with ruff ruff check . — passed
  • uv run --with ruff ruff format --check . — passed
  • uv run whichllm --version — 0.5.3
  • uv run --with build python -m build — built wheel and sdist
