tensorzero

Model Serving & MLOps

Open-source LLMOps platform with unified LLM gateway, observability, evaluation, and optimization for cost and performance

Rust · Latest 2026.5.2 · 14d ago

Features

  • Unified API access to 25+ LLM providers through single gateway
  • Full-featured inference: tool use, structured JSON outputs, batch, embeddings, multimodal, caching
  • Evaluation and optimization: benchmark, optimize prompts and models with LLM judges
  • A/B testing, routing, fallbacks, and automatic retries
  • Complete observability with feedback collection and cost tracking

Recent releases

View all 64 releases →
No immediate action
2026.5.2 New feature

Stop param flexibility + OpenInference attributes

No immediate action
2026.5.1 Bugfix

SSE decoding errors

2026.5.0 Breaking risk
Breaking changes
  • UI requires authentication when the gateway requires authentication (previously only for gateway usage).
Notable features
  • Improved error handling and logging for complex streaming inferences, including status code propagation and fallbacks.
Full changelog

[!CAUTION]
Breaking Changes

  • The UI will now require authentication when the gateway requires authentication. Previously, the UI only required authentication for gateway usage.

New Features

  • Improve error handling (e.g. status code propagation) and logging for complex streaming inferences (e.g. fallbacks).

& multiple under-the-hood and UI improvements (thanks @arisp)

2026.4.1 Breaking risk
⚠ Upgrade required
  • Deprecation: TensorZero Autopilot "Sessions" page removed from UI; future platform‑agnostic workflows planned.
Breaking changes
  • Gateway defaults to async observability writes; previous synchronous behavior requires `observability.async_writes = false`.
Notable features
  • TypeScript evaluators for inference evaluations
  • Support for vLLM's new `reasoning` field
  • Aggregated variant usage data (tokens, cost) in UI
Full changelog

[!CAUTION]
Breaking Changes

  • The gateway now defaults to async observability writes to reduce tail latency: inferences are sent to the client before they are persisted in the database. To restore the previous behavior, set observability.async_writes = false. [docs]
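The ordering change described above can be illustrated with a small sketch. This is not the gateway's implementation; `persist`, `handle_inference`, and the in-memory `store` are hypothetical stand-ins that only show why a response can reach the client before its record is stored.

```python
import asyncio

async def persist(store, record):
    # Hypothetical stand-in for a database write; the sleep simulates latency.
    await asyncio.sleep(0.01)
    store.append(record)

async def handle_inference(store, record, async_writes=True):
    """Return (response, pending_write_task_or_None)."""
    if async_writes:
        # Schedule the write in the background and respond immediately.
        task = asyncio.create_task(persist(store, record))
        return record["output"], task
    # Synchronous mode: the write completes before the response is returned.
    await persist(store, record)
    return record["output"], None

async def demo():
    store = []
    response, task = await handle_inference(store, {"output": "hello"})
    stored_at_response_time = len(store)  # nothing persisted yet
    await task                            # the write lands afterwards
    return response, stored_at_response_time, len(store)
```

In this sketch, a client that reads the database immediately after receiving a response may not find the record yet. Under that assumption, workflows that depend on read-after-write behavior are the ones most likely to need `observability.async_writes = false`.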

[!WARNING]
Deprecations

  • Removed the TensorZero Autopilot "Sessions" page from the UI. We recently added a TensorZero MCP that integrates nicely with coding agents, and we'll re-introduce advanced TensorZero Autopilot workflows in a platform-agnostic format soon.

Bug Fixes

  • Return HTTP code 429 for rate limiting errors.
  • Fixed a bug affecting ClickHouse database names with hyphens. (thanks @ianliuy!)
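With rate limiting errors now returned as HTTP 429, a client can tell them apart from other failures and back off before retrying. The following is a minimal sketch, assuming a hypothetical `send` callable that returns a `(status_code, body)` pair; it is not part of any TensorZero client.

```python
import time

def call_with_backoff(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable returning (status_code, body).
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
    # Still rate limited after the final attempt; return the last response.
    return status, body
```

The `sleep` parameter is injectable so the retry schedule can be checked without waiting. A production client would likely also add jitter to the delays.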

New Features

  • Added TypeScript evaluators (for inference evaluations).
  • Added support for vLLM's new reasoning field.
  • Added aggregated variant usage data (tokens, cost, etc.) to the UI.
  • Added inference cost data to exported OpenTelemetry traces. (thanks @kimsehwan96!)
  • Added export.otlp.traces.include_content (default false) configuration field to include inference content (e.g. prompts, messages) in exported OpenTelemetry GenAI traces.

& multiple under-the-hood and UI improvements

2026.4.0 New feature
Notable features
  • Add MCP server to gateway exposing API at /mcp
  • Report provider prompt caching statistics via API and UI
  • Report usage statistics (tokens, latency, cost) for inference evaluations via CLI, API, and UI
Full changelog

New Features

  • Add an MCP server to the gateway, exposing its API at /mcp.
  • Report provider prompt caching statistics via API and UI.
  • Report usage statistics (e.g. tokens, latency, cost) for inference evaluations via CLI tool, API, and UI.
  • Add the Prometheus metrics tensorzero_input_tokens_total and tensorzero_output_tokens_total.
  • Add configuration field content_type_overrides to handle file inputs for long-tail providers.
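The two new Prometheus counters can be read from any scrape in the standard text exposition format. The sketch below sums all samples of one counter; the sample payload and its `function_name` label are illustrative assumptions, not the gateway's actual output, and the parser assumes sample lines carry no trailing timestamp.

```python
def sum_counter(metrics_text, name):
    """Sum all samples of one counter in Prometheus text exposition format."""
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line is `name{labels} value` or `name value`.
        metric, _, value = line.rpartition(" ")
        if metric == name or metric.startswith(name + "{"):
            total += float(value)
    return total

# Hypothetical scrape output; label names and values are made up.
SAMPLE = """\
# TYPE tensorzero_input_tokens_total counter
tensorzero_input_tokens_total{function_name="draft"} 1200
tensorzero_input_tokens_total{function_name="summarize"} 300
# TYPE tensorzero_output_tokens_total counter
tensorzero_output_tokens_total{function_name="draft"} 450
"""
```

For example, `sum_counter(SAMPLE, "tensorzero_input_tokens_total")` adds the two input-token samples in the hypothetical payload.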

& multiple under-the-hood and UI improvements

About

Stars
11,428
Forks
835
Languages
Rust TypeScript Python
Downloads/week
3
NPM Maintainers
2
Contributors
119
