tensorzero

Model Serving & MLOps · Archived

An open‑source LLMOps platform that provides a unified LLM gateway, observability, evaluation, optimization, and experimentation features.

This repository is archived and no longer actively maintained.

Language: Rust · Latest release: 2026.6.0 (1mo ago)

Features

Unified LLM gateway for calling any major provider through a single API with sub‑millisecond latency overhead
Observability: store inferences and feedback programmatically or via UI dashboards
Evaluation & optimization tools to benchmark and tune prompts, models, and inference strategies
Experimentation capabilities including A/B testing, routing, fallbacks, and retries

Recent releases

View all 65 releases →

Upgrade now

2026.6.0 Security 1mo

Gateway vulnerability fix

No immediate action

2026.5.2 New feature 2mo

Stop param flexibility + OpenInference attributes

No immediate action

2026.5.1 Bugfix 2mo

SSE decoding errors

2026.5.0 Breaking risk 2mo

Breaking changes

The UI now requires authentication whenever the gateway requires authentication; previously, the UI required authentication only for gateway usage.

Notable features

Improved error handling and logging for complex streaming inferences, including status code propagation and fallbacks.

Full changelog

Breaking Changes

The UI will now require authentication when the gateway requires authentication. Previously, the UI only required authentication for gateway usage.

New Features

Improve error handling (e.g. status code propagation) and logging for complex streaming inferences (e.g. fallbacks).

Plus multiple under-the-hood and UI improvements (thanks @arisp).

2026.4.1 Breaking risk 3mo

⚠ Upgrade required

Deprecation: TensorZero Autopilot "Sessions" page removed from UI; future platform‑agnostic workflows planned.

Breaking changes

The gateway now defaults to async observability writes; to restore the previous synchronous behavior, set `observability.async_writes = false`.

Notable features

TypeScript evaluators for inference evaluations
Support for vLLM's new `reasoning` field
Aggregated variant usage data (tokens, cost) in UI

Full changelog

Breaking Changes

The gateway now defaults to async observability writes to reduce tail latency: inferences are sent to the client before they are persisted in the database. To restore the previous behavior, set `observability.async_writes = false`.
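
To make the ordering change concrete, the following is a toy Python sketch of the two modes. It is not TensorZero's implementation: `persist`, `handle_inference`, and `run` are hypothetical names invented for illustration, and the database is modeled as a plain list. It only shows the order in which "respond to the client" and "persist the inference" happen in each mode.

```python
import asyncio


async def persist(store, events, record):
    """Hypothetical stand-in for a database write (not TensorZero code)."""
    await asyncio.sleep(0.01)  # simulate write latency
    store.append(record)
    events.append("persisted")


async def handle_inference(store, events, async_writes=True):
    """Toy request handler that records when it responds vs. when it persists."""
    response = {"content": "hello"}  # pretend model output
    if async_writes:
        # Schedule the write in the background and respond first.
        task = asyncio.create_task(persist(store, events, response))
        events.append("responded")
        await task  # awaited only so this demo can observe the write finishing
    else:
        # Block on the write, then respond.
        await persist(store, events, response)
        events.append("responded")
    return response


def run(async_writes):
    """Run one toy inference and return the observed event order."""
    store, events = [], []
    asyncio.run(handle_inference(store, events, async_writes))
    return events
```

Calling `run(True)` and `run(False)` returns the event order for each mode. In the async case the response goes out before the record exists, so a client that queries the database immediately after receiving a response may not find the inference yet.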

Deprecations

Removed the TensorZero Autopilot "Sessions" page from the UI. We recently added a TensorZero MCP that integrates nicely with coding agents, and we'll re-introduce advanced TensorZero Autopilot workflows in a platform-agnostic format soon.

Bug Fixes

Rate-limiting errors now return HTTP status code 429.
Fixed a bug affecting ClickHouse database names with hyphens. (thanks @ianliuy!)
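
With rate-limiting errors surfaced as status 429, a client can tell them apart from other failures and back off before retrying. The sketch below shows one common pattern, exponential backoff, under stated assumptions: `send` is a hypothetical zero-argument callable returning a `(status, body)` tuple, and nothing here reflects the response body or headers the gateway actually sends, which the release notes do not describe.

```python
import time


def call_with_backoff(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry `send()` while it reports HTTP 429, doubling the delay each time.

    `send` is a hypothetical callable returning (status, body); `sleep` is
    injectable so the backoff schedule can be inspected without waiting.
    """
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body  # success or a non-rate-limit error
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    return status, body  # still rate limited after the final attempt
```

Passing a custom `sleep` (for example, a list's `append` method) records the delays instead of waiting, which is convenient when checking the schedule.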

New Features

Added TypeScript evaluators (for inference evaluations).
Added support for vLLM's new `reasoning` field.
Added aggregated variant usage data (tokens, cost, etc.) to the UI.
Added inference cost data to exported OpenTelemetry traces. (thanks @kimsehwan96!)
Added `export.otlp.traces.include_content` (default `false`) configuration field to include inference content (e.g. prompts, messages) in exported OpenTelemetry GenAI traces.

Plus multiple under-the-hood and UI improvements.
