
LiteRT-LM

LLM Frameworks
C++ · Latest: v0.13.1 (5h ago)

Features

  • Cross‑platform deployment (Android, iOS, Web, Desktop, IoT such as Raspberry Pi)
  • Hardware acceleration via GPU and NPU for peak performance
  • Multi‑modality support (vision and audio inputs)
  • Tool‑use / function calling for agentic workflows
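As a rough illustration of what tool use involves on the application side, the sketch below dispatches a model-issued call to a registered Python function. It is a generic pattern and not LiteRT-LM's API: the JSON shape (`name`, `arguments`), the `TOOLS` registry, and the `tool` and `dispatch` helpers are all assumptions made for this example.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry mapping a tool name to a plain Python function.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register fn under its own name so a model-issued call can find it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    return a + b


def dispatch(model_output: str) -> str:
    """Parse an assumed JSON tool call and return a JSON result string."""
    try:
        call = json.loads(model_output)
        name, args = call["name"], call.get("arguments", {})
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return json.dumps({"error": f"malformed tool call: {exc}"})
    if name not in TOOLS or not isinstance(args, dict):
        return json.dumps({"error": f"unknown tool or bad arguments: {name}"})
    # The result would normally be fed back to the model as the tool response.
    return json.dumps({"result": TOOLS[name](**args)})


print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
```

Returning errors as JSON rather than raising keeps the loop running, so the model can see the failure and try a corrected call.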

Recent releases

View all 7 releases →
No immediate action
v0.13.0 New feature

Agent skill + CLI + macOS Swift

No immediate action
v0.12.0 New feature

Swift, Web JS, CLI NPU, Flutter

v0.11.0 New feature
Notable features
  • Windows Native Support: LiteRT-LM CLI runs natively on Windows with CPU and GPU backends
Full changelog

🔥 What's New: v0.11.0

  • Gemma 4 Multi-token Prediction (MTP) Support: Supercharge Gemma 4 on-device inference with Single Position Multi Token Prediction (MTP), delivering >2x faster decode speeds on mobile GPUs with zero quality degradation (blog, documentation).

  • Windows Native Support: The LiteRT-LM CLI now runs natively on Windows with both CPU and GPU backend support.

v0.10.2 Feature
Notable features
  • Improved UI smoothness
Changelog
  • Various bug fixes
  • Improved UI smoothness
v0.10.1 New feature
Notable features
  • Support for deploying and running Gemma 4 across Linux, macOS, Windows (WSL) and Raspberry Pi
  • Migrated CLI from `fire` to `click`, adding `--verbose`, `--version`, improved help formatting and styled terminal output
  • Added direct Hugging Face model import with auto‑conversion for missing models during `run`
Full changelog

🔥 Gemma 4 support

Deploy Gemma 4 across a broad range of hardware with stellar performance (blog).

👉 Try on Linux, macOS, Windows (WSL) or Raspberry Pi with the LiteRT-LM CLI:

litert-lm run \
    --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
    gemma-4-E2B-it.litertlm \
    --prompt="What is the capital of France?"
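
For scripting, the same invocation can be assembled as an argument list in Python. This is a minimal sketch: `build_run_command` is a hypothetical helper written for this example, and it uses only the subcommand and flags that appear in the command above.

```python
import shlex
from typing import List


def build_run_command(repo: str, model_file: str, prompt: str) -> List[str]:
    """Return argv for `litert-lm run`, using only the flags shown above."""
    return [
        "litert-lm",
        "run",
        f"--from-huggingface-repo={repo}",
        model_file,
        f"--prompt={prompt}",
    ]


cmd = build_run_command(
    "litert-community/gemma-4-E2B-it-litert-lm",
    "gemma-4-E2B-it.litertlm",
    "What is the capital of France?",
)

# shlex.join quotes the prompt so the line can be pasted into a POSIX shell.
print(shlex.join(cmd))

# To execute it (requires the LiteRT-LM CLI to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems when the prompt contains spaces or quotes.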

Release Notes

  • CLI Enhancements & Migration: Migrated the CLI from `fire` to `click`, adding features like `--verbose`, `--version`, improved help formatting, and enhanced terminal output styling (#1784, #1733, #1791, #1792).
  • Hugging Face Integration: Added support for importing models directly from Hugging Face and implemented auto-conversion for missing models during `run` commands (#1797, #1735).
  • Core Performance & Features: Introduced a LiteRT-based KV cache implementation, speculative decoding support, and improved context merging for conversation history (#1601, #1793, #1742).
  • Platform & Build Improvements: Refactored CMake for better Android/cross-compilation support, updated the Windows build with a CPU sampler workaround, and transitioned nightly releases to Ubuntu-22.04 (#1741, #1734, #1772).
  • API & Documentation: Expanded the Kotlin API for response channel configuration and launched new Python API resources, including a "Getting Started" guide and a Colab notebook (#1724, #1737, #1757).
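
The speculative decoding mentioned in the notes above can be pictured with a toy, greedy draft-and-verify loop. This is a generic sketch of the technique and not LiteRT-LM's implementation: the `target` and `draft` functions are made-up stand-ins for a large model and a cheap drafter, and tokens are plain integers.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # greedy next-token function


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy draft-and-verify decoding; output equals target-only greedy decoding."""
    out = list(prompt)
    limit = len(prompt) + max_new
    while len(out) < limit:
        # 1. The cheap draft model proposes up to k tokens.
        proposal: List[int] = []
        for _ in range(min(k, limit - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The target checks each proposed position. Engines typically score
        #    all positions in one batched pass; this loop is sequential for clarity.
        for tok in proposal:
            expected = target(out)
            out.append(expected)   # always keep the target's own token
            if expected != tok:    # first mismatch: discard the rest of the draft
                break
    return out


def greedy(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: plain token-by-token greedy decoding."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


# Toy models (assumptions): the target counts upward modulo 10;
# the draft agrees with it except right after token 4.
def target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


assert speculative_decode(target, draft, [0], 12) == greedy(target, [0], 12)
```

Because every emitted token is the target's own greedy choice, the draft affects only how many tokens are confirmed per verification step, never the text itself.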


About

Stars: 5,303
Forks: 538
Languages: C++, Python, CMake

Install & Platforms

Install via: binary
Platforms: Linux, macOS, Windows, arm64
Mobile: Android, iOS
