
Release history

LiteRT-LM releases


No immediate action
v0.13.1 Maintenance

Routine maintenance and dependency updates.

No immediate action
v0.13.0 New feature

Agent skill + CLI + macOS Swift

No immediate action
v0.12.0 New feature

Swift, Web JS, CLI NPU, Flutter

v0.11.0 New feature
Notable features
  • Windows Native Support: LiteRT-LM CLI runs natively on Windows with CPU and GPU backends
Full changelog

🔥 What's New: v0.11.0

  • Gemma 4 Multi-Token Prediction (MTP) Support: Gemma 4 on-device inference now supports Single-Position Multi-Token Prediction (MTP), delivering more than 2x faster decoding on mobile GPUs with no loss in output quality (blog, documentation).

  • Windows Native Support: The LiteRT-LM CLI now runs natively on Windows with both CPU and GPU backend support.

v0.10.2 Feature
Notable features
  • Improved UI smoothness
Changelog
  • Various bug fixes
  • Improved UI smoothness
v0.10.1 New feature
Notable features
  • Support for deploying and running Gemma 4 across Linux, macOS, Windows (WSL) and Raspberry Pi
  • Migrated CLI from `fire` to `click`, adding `--verbose`, `--version`, improved help formatting and styled terminal output
  • Added direct Hugging Face model import with auto‑conversion for missing models during `run`
Full changelog

🔥 Gemma 4 support

Deploy Gemma 4 across a broad range of hardware with stellar performance (blog).

👉 Try it on Linux, macOS, Windows (WSL), or Raspberry Pi with the LiteRT-LM CLI:

```shell
litert-lm run \
   --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
   gemma-4-E2B-it.litertlm \
   --prompt="What is the capital of France?"
```

Release Notes

  • CLI Enhancements & Migration: Migrated the CLI from `fire` to `click`, adding the `--verbose` and `--version` flags, improved help formatting, and styled terminal output (#1784, #1733, #1791, #1792).
  • Hugging Face Integration: Added support for importing models directly from Hugging Face and implemented auto-conversion for missing models during `run` commands (#1797, #1735).
  • Core Performance & Features: Introduced a LiteRT-based KV cache implementation, speculative decoding support, and improved context merging for conversation history (#1601, #1793, #1742).
  • Platform & Build Improvements: Refactored CMake for better Android and cross-compilation support, updated the Windows build with a CPU sampler workaround, and moved nightly releases to Ubuntu 22.04 (#1741, #1734, #1772).
  • API & Documentation: Expanded the Kotlin API for response channel configuration and launched new Python API resources, including a "Getting Started" guide and a Colab notebook (#1724, #1737, #1757).
v0.9.0 Bugfix

General stability enhancements improve the user experience.

Full changelog

Android & iOS Update

  • Performance Optimizations: Significant improvements to app initialization speed and memory management.

  • Bug Fixes: General stability enhancements for a smoother user experience.
