tensorzero
Model Serving & MLOps
Open-source LLMOps platform with a unified LLM gateway, observability, evaluation, and optimization for cost and performance
Features
- Unified API access to 25+ LLM providers through a single gateway
- Full-featured inference: tool use, structured JSON outputs, batch, embeddings, multimodal, caching
- Evaluation and optimization: benchmark and optimize prompts and models with LLM judges
- A/B testing, routing, fallbacks, and automatic retries
- Complete observability with feedback collection and cost tracking
Recent releases
- UI requires authentication when the gateway requires authentication (previously only for gateway usage).
- Improved error handling and logging for complex streaming inferences, including status code propagation and fallbacks.
Full changelog
[!CAUTION]
Breaking Changes
- The UI will now require authentication when the gateway requires authentication. Previously, the UI only required authentication for gateway usage.
New Features
- Improve error handling (e.g. status code propagation) and logging for complex streaming inferences (e.g. fallbacks).
& multiple under-the-hood and UI improvements (thanks @arisp)
- Deprecation: TensorZero Autopilot "Sessions" page removed from UI; future platform‑agnostic workflows planned.
- Gateway defaults to async observability writes; previous synchronous behavior requires `observability.async_writes = false`.
- TypeScript evaluators for inference evaluations
- Support for vLLM's new `reasoning` field
- Aggregated variant usage data (tokens, cost) in UI
Full changelog
[!CAUTION]
Breaking Changes
- The gateway now defaults to async observability writes to reduce tail latency: inferences are sent to the client before they are persisted in the database. To restore the previous behavior, set `observability.async_writes = false`.
[!WARNING]
Deprecations
- Removed the TensorZero Autopilot "Sessions" page from the UI. We recently added a TensorZero MCP that integrates nicely with coding agents, and we'll re-introduce advanced TensorZero Autopilot workflows in a platform-agnostic format soon.
Bug Fixes
- Return HTTP code 429 for rate limiting errors.
- Fixed a bug affecting ClickHouse database names with hyphens. (thanks @ianliuy!)
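Because rate limiting errors are returned as HTTP 429, a client can tell them apart from other failures and back off before retrying. The following is a minimal sketch of that pattern, not part of TensorZero: `call_with_backoff` and the `send` callable are hypothetical names, and `send` stands in for whatever HTTP call your client makes, returning a `(status_code, body)` pair.

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status code, response body)


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-429 status or attempts run out.

    `send` is a hypothetical stand-in for the real HTTP request.
    Delays double after each 429: base_delay, 2 * base_delay, ...
    """
    status, body = 0, ""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            # Success or a non-rate-limit error: hand it back unchanged.
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    # Still rate limited after the final attempt.
    return status, body
```

The `sleep` parameter is injectable so the helper can be exercised without real delays. Other error codes are returned immediately rather than retried.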
New Features
- Added TypeScript evaluators (for inference evaluations).
- Added support for vLLM's new `reasoning` field.
- Added aggregated variant usage data (tokens, cost, etc.) to the UI.
- Added inference cost data to exported OpenTelemetry traces. (thanks @kimsehwan96!)
- Added `export.otlp.traces.include_content` (default `false`) configuration field to include inference content (e.g. prompts, messages) in exported OpenTelemetry GenAI traces.
& multiple under-the-hood and UI improvements
- Add MCP server to gateway exposing API at /mcp
- Report provider prompt caching statistics via API and UI
- Report usage statistics (tokens, latency, cost) for inference evaluations via CLI, API, and UI
Full changelog
New Features
- Add an MCP server to the gateway exposing its API at `/mcp`.
- Report provider prompt caching statistics via API and UI.
- Report usage statistics (e.g. tokens, latency, cost) for inference evaluations via CLI tool, API, and UI.
- Add the Prometheus metrics `tensorzero_input_tokens_total` and `tensorzero_output_tokens_total`.
- Add configuration field `content_type_overrides` to handle file inputs for long-tail providers.
& multiple under-the-hood and UI improvements
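The two token counters can be totalled from a scrape of the gateway's Prometheus output. The sketch below is illustrative only: `sum_metric` is a hypothetical helper, the sample text and its `function_name` label are invented for the example (the release notes do not say which labels the metrics carry), and the parser assumes sample lines have no trailing timestamp.

```python
def sum_metric(exposition: str, name: str) -> float:
    """Sum every sample of `name` in Prometheus text exposition format.

    Assumes each sample line is `<name>{<labels>} <value>` with no
    trailing timestamp; comment lines (# HELP / # TYPE) are skipped.
    """
    total = 0.0
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token on the line.
        series, _, value = line.rpartition(" ")
        # Strip the label set, if any, to get the bare metric name.
        metric_name = series.split("{", 1)[0]
        if metric_name == name:
            total += float(value)
    return total


# Invented sample data; the label name and values are hypothetical.
SAMPLE = """\
# TYPE tensorzero_input_tokens_total counter
tensorzero_input_tokens_total{function_name="draft_email"} 1200
tensorzero_input_tokens_total{function_name="summarize"} 300
# TYPE tensorzero_output_tokens_total counter
tensorzero_output_tokens_total{function_name="draft_email"} 450
"""

input_tokens = sum_metric(SAMPLE, "tensorzero_input_tokens_total")
output_tokens = sum_metric(SAMPLE, "tensorzero_output_tokens_total")
```

In practice a Prometheus server and a `sum(...)` query would do this aggregation; the parser only shows how the counters appear in the raw text.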