ypollak2/llm-router

v8.3.0 Feature

This release adds 2 notable features for engineering teams evaluating rollout.

Published 21d LLM Frameworks

✓ No known CVEs patched in this version

Topics

ai-routing anthropic claude claude-code cost-optimization gemini litellm llm llm-router mcp-server model-router ollama openai

ReleasePort's take

Light signal
editorial:auto 13d

LLM Router v8.3.0 automatically compresses conversation context before sending it to paid models, reducing context tokens by 20-50%.

Why it matters: Operators can cut context tokens on paid model calls by 20-50% with no extra LLM calls. Compression is on by default (LLM_ROUTER_CONTEXT_OPTIMIZER=auto); set the variable to off to disable it.

Summary

AI summary

Automatically compresses conversation context before sending it to paid models, saving 20-50% of context tokens.

Changes in this release

Feature Medium

Two-stage compression pipeline: structural and recency-based optimization.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

Free models automatically skip context compression.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

Context savings metrics display in routing footer.

Source: llm_adapter@2026-05-21

Confidence: low

Feature Medium

LLM_ROUTER_CONTEXT_OPTIMIZER environment variable controls compression.

Source: llm_adapter@2026-05-21

Confidence: low

Feature Low

Shows context token savings in the routing footer output.

Source: granite4.1:30b@2026-05-23-audit

Confidence: low

Feature Low

Adds LLM_ROUTER_CONTEXT_OPTIMIZER env var to enable/disable/adjust compression.

Source: granite4.1:30b@2026-05-23-audit

Confidence: low

Performance Medium

Automatically compresses conversation context before sending to paid models.

Source: llm_adapter@2026-05-21

Confidence: high

Full changelog

What's New

Automatically compresses conversation context before sending to paid models. Zero latency, 20-50% fewer context tokens.

How It Works

2-stage pipeline (pure Python, no LLM calls):

  1. Structural — collapses whitespace, removes code comments, deduplicates repeated blocks
  2. Recency — keeps last 2 exchanges verbatim, truncates older messages, drops old code blocks
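
The two stages above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the function names, the 200-character truncation limit, the "[code omitted]" placeholder, and the treatment of one exchange as a user message plus a reply are all hypothetical, and comment removal is left out for brevity.

```python
import re

FENCE = "`" * 3  # markdown code-fence marker, built indirectly to keep this listing tidy
FENCE_RE = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)


def structural_compress(text: str) -> str:
    """Stage 1 (sketch): trim trailing whitespace, collapse blank lines, dedupe blocks."""
    lines = [line.rstrip() for line in text.splitlines()]
    text = re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
    seen, kept = set(), []
    for block in text.split("\n\n"):
        if block not in seen:  # keep only the first copy of a repeated block
            seen.add(block)
            kept.append(block)
    return "\n\n".join(kept)


def recency_compress(messages, keep_exchanges=2, max_old_chars=200):
    """Stage 2 (sketch): keep the newest exchanges verbatim, shrink older messages."""
    keep = keep_exchanges * 2  # assumption: one exchange = user message + reply
    cutoff = max(len(messages) - keep, 0)
    out = []
    for i, msg in enumerate(messages):
        content = msg["content"]
        if i < cutoff:
            # Older message: drop fenced code blocks, then truncate.
            content = FENCE_RE.sub("[code omitted]", content)
            if len(content) > max_old_chars:
                content = content[:max_old_chars] + "..."
        out.append({**msg, "content": content})
    return out


def compress_context(messages):
    """Run Stage 1 on every message, then Stage 2 on the whole conversation."""
    stage1 = [{**m, "content": structural_compress(m["content"])} for m in messages]
    return recency_compress(stage1)
```

Both stages use only string operations, which is consistent with the "pure Python, no LLM calls" description. The real heuristics and thresholds may differ.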

Free models (Ollama, Codex, Gemini CLI) skip compression automatically.

What You See

Context savings now appear in the routing footer:

→ gemini-2.5-flash · simple · $0.0002 (43x cheaper) | ctx 1500→920tok (39% saved)
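
The percentage in that footer follows from the two token counts: (1500 - 920) / 1500 is about 38.7%, which rounds to 39%. A minimal sketch of the arithmetic, using a hypothetical helper name and assuming the percentage is rounded to the nearest integer:

```python
def context_savings(before_tokens: int, after_tokens: int) -> str:
    """Format the context part of the footer from token counts (illustrative only)."""
    if before_tokens <= 0:
        return "ctx 0tok"  # nothing to compress; avoids division by zero
    saved_pct = round(100 * (before_tokens - after_tokens) / before_tokens)
    return f"ctx {before_tokens}→{after_tokens}tok ({saved_pct}% saved)"


print(context_savings(1500, 920))  # ctx 1500→920tok (39% saved)
```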

Configuration

export LLM_ROUTER_CONTEXT_OPTIMIZER=auto   # default — Stage 1+2
export LLM_ROUTER_CONTEXT_OPTIMIZER=off    # disable
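
For illustration, a router might combine this setting with the free-model rule like the sketch below. Only the variable name and the auto/off values come from the changelog. The provider labels, the function names, and the choice to treat any value other than off as enabled are assumptions.

```python
import os

# Hypothetical labels for the free providers named in the changelog.
FREE_PROVIDERS = frozenset({"ollama", "codex", "gemini-cli"})


def optimizer_mode(env=None) -> str:
    """Read LLM_ROUTER_CONTEXT_OPTIMIZER, defaulting to "auto" when unset."""
    env = os.environ if env is None else env
    return env.get("LLM_ROUTER_CONTEXT_OPTIMIZER", "auto").strip().lower()


def should_compress(provider: str, env=None) -> bool:
    """Compress only when the optimizer is not off and the provider is paid."""
    if optimizer_mode(env) == "off":
        return False
    return provider.lower() not in FREE_PROVIDERS


print(should_compress("openai", {}))  # True
print(should_compress("openai", {"LLM_ROUTER_CONTEXT_OPTIMIZER": "off"}))  # False
print(should_compress("ollama", {}))  # False
```

Passing the environment as a parameter keeps the check easy to test without touching the real process environment.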

Upgrade

pip install --upgrade llm-routing

Full Changelog: https://github.com/ypollak2/llm-router/compare/v8.2.0...v8.3.0

About ypollak2/llm-router

Subscription-aware LLM router for Claude Code. Routes tasks to 20+ providers (OpenAI, Gemini, Groq, Ollama, Codex) based on complexity classification, Claude subscription pressure, and cost. Free tasks stay on Claude subscription; expensive tasks fall back to the cheapest capable model. Includes 30 MCP tools, 6 auto-routing hooks, semantic dedup cache, prompt caching, daily spend cap, and a live web dashboard.

Related context

Earlier breaking changes

  • v9.2.0 Changes auto‑route directive from advisory "DO NOT SKIP" to hard constraint with explicit blocked tools list.
  • v9.2.0 Breaks permanent downgrade of enforcement after first Edit/Write; v13 now requires per‑turn routing.
