LiteRT-LM releases - releaseport

No immediate action

v0.14.0 Maintenance 24d

Routine maintenance and dependency updates.

No immediate action

v0.13.1 Maintenance 1mo

Routine maintenance and dependency updates.

No immediate action

v0.13.0 New feature 1mo

Agent skill + CLI + macOS Swift

No immediate action

v0.12.0 New feature 2mo

Swift, Web JS, CLI NPU, Flutter

v0.11.0 New feature 2mo

Notable features

Windows Native Support: LiteRT-LM CLI runs natively on Windows with CPU and GPU backends

Full changelog

🔥 What's New: `v0.11.0`

Gemma 4 Multi-token Prediction (MTP) Support: Supercharge Gemma 4 on-device inference with Single Position Multi Token Prediction (MTP), delivering >2x faster decode speeds on mobile GPUs with zero quality degradation (blog, documentation).
Windows Native Support: The LiteRT-LM CLI now runs natively on Windows with both CPU and GPU backend support.

v0.10.2 Feature 3mo

Notable features

Improved UI smoothness

Changelog

Various bug fixes
Improved UI smoothness

v0.10.1 New feature 3mo

Notable features

Support for deploying and running Gemma 4 across Linux, macOS, Windows (WSL) and Raspberry Pi
Migrated CLI from `fire` to `click`, adding `--verbose`, `--version`, improved help formatting and styled terminal output
Added direct Hugging Face model import with auto‑conversion for missing models during `run`

Full changelog

🔥 Gemma 4 support

Deploy Gemma 4 across a broad range of hardware with stellar performance (blog).

👉 Try on Linux, macOS, Windows (WSL) or Raspberry Pi with the LiteRT-LM CLI:

litert-lm run \
   --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
   gemma-4-E2B-it.litertlm \
   --prompt="What is the capital of France?"
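
The same invocation can be scripted. The following is a minimal Python sketch, assuming the `litert-lm` executable is on `PATH`; the helper names `build_run_command` and `run_prompt` are hypothetical and not part of LiteRT-LM. It only assembles the arguments shown in the command above and passes them to the CLI. What the CLI writes to stdout is not described in these notes, so the sketch returns that text unparsed.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional


def build_run_command(repo: str, model_file: str, prompt: str) -> List[str]:
    """Assemble the argument list for the `litert-lm run` call shown above.

    Hypothetical helper: it uses only the flags that appear in the release notes.
    """
    return [
        "litert-lm",
        "run",
        f"--from-huggingface-repo={repo}",
        model_file,
        f"--prompt={prompt}",
    ]


def run_prompt(repo: str, model_file: str, prompt: str) -> Optional[str]:
    """Run the CLI and return its raw stdout, or None if the CLI is not installed."""
    if shutil.which("litert-lm") is None:
        return None
    cmd = build_run_command(repo, model_file, prompt)
    # Passing a list (no shell) avoids quoting problems in the prompt text.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    cmd = build_run_command(
        "litert-community/gemma-4-E2B-it-litert-lm",
        "gemma-4-E2B-it.litertlm",
        "What is the capital of France?",
    )
    # Print the shell-quoted command so it can be inspected before running it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps the prompt intact as one argument, whatever punctuation it contains.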

Release Notes

CLI Enhancements & Migration: Migrated the CLI from `fire` to `click`, adding features like `--verbose`, `--version`, improved help formatting, and enhanced terminal output styling (#1784, #1733, #1791, #1792).
Hugging Face Integration: Added support for importing models directly from Hugging Face and implemented auto-conversion for missing models during `run` commands (#1797, #1735).
Core Performance & Features: Introduced a LiteRT-based KV cache implementation, speculative decoding support, and improved context merging for conversation history (#1601, #1793, #1742).
Platform & Build Improvements: Refactored CMake for better Android/cross-compilation support, updated the Windows build with a CPU sampler workaround, and transitioned nightly releases to Ubuntu-22.04 (#1741, #1734, #1772).
API & Documentation: Expanded the Kotlin API for response channel configuration and launched new Python API resources, including a "Getting Started" guide and a Colab notebook (#1724, #1737, #1757).

v0.9.0 Bugfix 4mo

General stability enhancements improve the user experience.

Full changelog

Android & iOS Update

Performance Optimizations: Significant improvements to app initialization speed and memory management.
Bug Fixes: General stability enhancements for a smoother user experience.
