tensorzero releases - releaseport

Upgrade now

2026.6.0 Security 1mo

Gateway vulnerability fix

No immediate action

2026.5.2 New feature 2mo

Stop param flexibility + OpenInference attributes

No immediate action

2026.5.1 Bug fix 2mo

SSE decoding errors

2026.5.0 Breaking risk 2mo

Breaking changes

UI requires authentication when the gateway requires authentication (previously only for gateway usage).

Notable features

Improved error handling and logging for complex streaming inferences, including status code propagation and fallbacks.

Full changelog

[!CAUTION]
Breaking Changes

The UI will now require authentication when the gateway requires authentication. Previously, the UI only required authentication for gateway usage.

New Features

Improve error handling (e.g. status code propagation) and logging for complex streaming inferences (e.g. fallbacks).

& multiple under-the-hood and UI improvements (thanks @arisp)

View release on GitHub

2026.4.1 Breaking risk 3mo

⚠ Upgrade required

Deprecation: TensorZero Autopilot "Sessions" page removed from UI; future platform‑agnostic workflows planned.

Breaking changes

Gateway defaults to async observability writes; previous synchronous behavior requires `observability.async_writes = false`.

Notable features

TypeScript evaluators for inference evaluations
Support for vLLM's new `reasoning` field
Aggregated variant usage data (tokens, cost) in UI

Full changelog

[!CAUTION]
Breaking Changes

The gateway now defaults to async observability writes to reduce tail latency: inferences are sent to the client before they are persisted in the database. To restore the previous behavior, set observability.async_writes = false. [docs]

[!WARNING]
Deprecations

Removed the TensorZero Autopilot "Sessions" page from the UI. We recently added a TensorZero MCP that integrates nicely with coding agents, and we'll re-introduce advanced TensorZero Autopilot workflows in a platform-agnostic format soon.

Bug Fixes

Return HTTP code 429 for rate limiting errors.
Fixed a bug affecting ClickHouse database names with hyphens. (thanks @ianliuy!)

New Features

Added TypeScript evaluators (for inference evaluations).
Added support for vLLM's new reasoning field.
Added aggregated variant usage data (tokens, cost, etc.) to the UI.
Added inference cost data to exported OpenTelemetry traces. (thanks @kimsehwan96!)
Added export.otlp.traces.include_content (default false) configuration field to include inference content (e.g. prompts, messages) in exported OpenTelemetry GenAI traces.

& multiple under-the-hood and UI improvements

View release on GitHub

2026.4.0 New feature 3mo

Notable features

Add MCP server to gateway exposing API at /mcp
Report provider prompt caching statistics via API and UI
Report usage statistics (tokens, latency, cost) for inference evaluations via CLI, API, and UI

Full changelog

New Features

Add an MCP server to the gateway exposing its API at /mcp.
Report provider prompt caching statistics via API and UI.
Report usage statistics (e.g. tokens, latency, cost) for inference evaluations via CLI tool, API, and UI.
Add the Prometheus metrics tensorzero_input_tokens_total and tensorzero_output_tokens_total.
Add configuration field content_type_overrides to handle file inputs for long-tail providers.

& multiple under-the-hood and UI improvements

View release on GitHub

2026.3.4 Breaking risk 4mo

⚠ Upgrade required

Deprecation: Inference evaluation config must be nested under function names; legacy flat format will be removed in a future release.
Deprecation: `launch_optimization` with `GEPAConfig` is deprecated and will be removed; use `t0.optimization.gepa.launch` instead.

Notable features

TensorZero Autopilot: automated AI engineer that analyzes LLM data, configures evaluations, optimizes prompts/models, and runs A/B tests
Embeddings requests now counted in Prometheus metrics `tensorzero_requests_total` and `tensorzero_inferences_total`
Observability configuration field `observability.batch_writes.write_queue_capacity` added for gateway backpressure

Full changelog

[!WARNING]
Planned Deprecations

The configuration for inference evaluations should be nested under the relevant functions moving forward [docs]. You can run evaluations by providing a function name and a list of evaluators. The legacy format will be removed in a future release. For example:

[functions.write_haiku.evaluators.exact_match]
type = "exact_match"

The legacy implementation of GEPA (launch_optimization with GEPAConfig) will be removed in a future release. Please use t0.optimization.gepa.launch instead. [docs]

Bug Fixes

Fixed a UI bug where a custom gateway base_path was not handled correctly in certain routes. (thanks @wangfenjin!)

New Features

Started including embeddings requests in the Prometheus metrics tensorzero_requests_total and tensorzero_inferences_total.
Added the configuration field observability.batch_writes.write_queue_capacity to enable backpressure for observability data in the gateway.

& multiple under-the-hood and UI improvements (thanks @majiayu000)!

[!IMPORTANT]

🆕 TensorZero Autopilot

TensorZero Autopilot is an automated AI engineer powered by TensorZero that analyzes LLM observability data, sets up evals, optimizes prompts and models, and runs A/B tests.

It dramatically improves the performance of LLM agents across diverse tasks.

View release on GitHub

2026.3.3 Breaking risk 4mo

Breaking changes

Removed assistant message prefill for JSON functions with Anthropic (deprecated by Anthropic).

Notable features

GEPA automated prompt engineering via durable workflows
Support duplicate tool calls in `all_of` evaluators for parallel execution
UI option to set expiration date for API keys

Full changelog

Bug Fixes

Fixed two edge cases affecting batch inference.
Fixed a UI bug affecting "Try with..." with inputs that include base64 files.
Removed assistant message prefill for JSON functions with Anthropic (deprecated by Anthropic).

New Features

Added an implementation of GEPA (automated prompt engineering) based on durable workflows.
Allow users to specify duplicate tool calls in all_of tool evaluators to evaluate parallel tool calling.
Allow users to specify an expiration date for API keys in the UI. (thanks @eibrahim95)
Allow users to specify object_storage.endpoint = "env::MY_ENV_VAR" in addition to static values. (thanks @Meredith2328)
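
The `env::MY_ENV_VAR` form can be read as "look this value up in the environment at load time". The sketch below illustrates that idea only; `resolve_setting` is a hypothetical helper, and raising on a missing variable is an assumption, since the notes do not say how the gateway reacts in that case.

```python
import os

ENV_PREFIX = "env::"


def resolve_setting(value: str, environ=os.environ) -> str:
    """Return `value` unchanged unless it uses the `env::NAME` form.

    Hypothetical helper: it mirrors the notation from the release note,
    not the gateway's actual implementation.
    """
    if value.startswith(ENV_PREFIX):
        name = value[len(ENV_PREFIX):]
        if name not in environ:
            # Assumption: a missing variable is treated as an error.
            raise KeyError(f"environment variable {name!r} is not set")
        return environ[name]
    return value


# A static value passes through; an env:: reference is looked up.
print(resolve_setting("https://s3.example.com"))
print(resolve_setting("env::MY_ENV_VAR", {"MY_ENV_VAR": "http://localhost:9000"}))
```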

& multiple under-the-hood and UI improvements (thanks @majiayu000)!

View release on GitHub

2026.3.2 Bug fix 4mo

Notable features

Postgres added as an alternative observability backend to ClickHouse (recommended for low RPS)
`openrouter::xxx` shorthand for embedding models
Per-session API keys in the browser when auth is enabled

Full changelog

Bug Fixes

Fixed a UI issue that prevented certain pages from rendering when depending on historical configuration.

New Features

Added Postgres as an alternative observability backend to ClickHouse. Postgres is the simplest way to get started; we recommend ClickHouse if you're handling >100 RPS.
Added the openrouter::xxx short-hand for embedding models.
Added support for per-session API keys in the browser (instead of a global environment variable) when auth is enabled.

& multiple under-the-hood and UI improvements!

View release on GitHub

2026.3.1 Breaking risk 4mo

⚠ Upgrade required

The embedded gateway in the TensorZero Python SDK will be removed in version 2026.6+; migrate to a standalone TensorZero Gateway using `base_url` for OpenAI SDK or `build_http` for TensorZero SDK.
The variant configuration field `weight` will be removed in version 2026.6+; transition to the new experimentation configuration semantics documented at https://www.tensorzero.com/docs/experimentation/run-static-ab-tests.

Breaking changes

Removed `model_provider_name` filter for `extra_body` and `extra_headers`; use `model_name` and `provider_name` instead.
Removed legacy experimental `list_inferences` endpoint; use the new endpoint documented at https://www.tensorzero.com/docs/observability/query-historical-inferences.
Removed several long-deprecated types and methods from the TensorZero Python SDK.

Notable features

Added support for launching optimization workflows with `dataset_name` in `launch_optimization_workflow`.

Full changelog

[!WARNING]
Completed Deprecations

Removed the deprecated model_provider_name filter for extra_body and extra_headers. Please use model_name and provider_name instead.

Removed the legacy experimental list_inferences endpoint and method. Please use the new endpoint instead. [docs]

Removed several long-deprecated types and methods from the TensorZero Python SDK.

[!WARNING]
Planned Deprecations

The embedded gateway in the TensorZero Python SDK will be removed in a future release (2026.6+). patch_openai_client and build_embedded are deprecated. Please deploy a standalone TensorZero Gateway instead (usage: base_url for OpenAI SDK; build_http for TensorZero SDK).

The variant configuration field weight will be removed in a future release (2026.6+). Please use the new experimentation configuration semantics. [docs]

Bug Fixes

Fixed a compatibility bug in Valkey-based caching that only affected Redis deployments.

New Features

Added support for launching optimization workflows with dataset_name (instead of an inference query) in launch_optimization_workflow.

& multiple under-the-hood and UI improvements!

View release on GitHub

2026.3.0 Breaking risk 4mo

⚠ Upgrade required

The old experimentation configuration notation (e.g. static_weights, track_and_stop) will be removed in a future release; see the Run adaptive A/B tests and Run static A/B tests docs for the updated usage.
Evaluator configuration field cutoff will be removed; use CLI flag --cutoffs evaluator=value,... instead.
Gateway route /variant_sampling_probabilities will be removed in a future release.

Breaking changes

Removed deprecated Prometheus metric tensorzero_inference_latency_overhead_seconds_histogram; use tensorzero_inference_latency_overhead_seconds instead.

Notable features

Added regex and tool_use evaluators.
Added experimental_launch_optimization_workflow to the TensorZero Python SDK.

Full changelog

[!WARNING]
Completed Deprecations

The deprecated Prometheus metric tensorzero_inference_latency_overhead_seconds_histogram was removed. Use tensorzero_inference_latency_overhead_seconds instead.

[!WARNING]
Planned Deprecations

The configuration for experimentation (e.g. static_weights, track_and_stop) was simplified. The old notation will be removed in a future release. See Run adaptive A/B tests and Run static A/B tests for more information.

The evaluator configuration field cutoff will be removed in a future release. Instead, provide --cutoffs evaluator=value,... in the CLI.

The gateway route /variant_sampling_probabilities will be removed in a future release.

The configuration field postgres.enabled will be removed in a future release. Instead, the gateway will consider whether the environment variable TENSORZERO_POSTGRES_URL is set.

New Features

Add regex and tool_use evaluators. [docs]
Add experimental_launch_optimization_workflow to the TensorZero Python SDK.

& multiple under-the-hood and UI improvements!

View release on GitHub

2026.2.2 Breaking risk 5mo

Breaking changes

Removed deprecated legacy dataset management endpoints; use new endpoints for that functionality.
Changed `--config-file` globbing: single‑level wildcards (`*`) no longer match files across directory boundaries; require recursive wildcard (`**`).

Notable features

Cost tracking and cost‑based rate limiting
Namespaces for multiple granular A/B experiments on the same TensorZero function
Improved reasoning support for Anthropic, Fireworks AI, SGLang, Together AI

Full changelog

[!CAUTION]
Breaking Changes

The --config-file globbing behavior has changed: single-level wildcards (*) no longer match files across directory boundaries. To match files across directory boundaries, use recursive wildcards (**). This aligns the behavior with standard glob semantics. For example:

--config-file *.toml matches tensorzero.toml, but not subdir/tensorzero.toml.

--config-file **/*.toml matches both tensorzero.toml and subdir/tensorzero.toml.

[!WARNING]
Completed Deprecations

Removed deprecated legacy endpoints for dataset management. The functionality is fully covered by the new endpoints.

New Features

Add cost tracking and cost-based rate limiting.
Add namespaces: the ability to set up multiple granular experiments (A/B tests) for the same TensorZero function.
Improve reasoning support for Anthropic (including adaptive thinking), Fireworks AI, SGLang, and Together AI.
Allow users to whitelist automatic tool approvals for TensorZero Autopilot.
Report provider errors when include_raw_response is enabled.
Add include_aggregated_response to streaming inferences. When enabled, the final chunk includes an aggregated output aggregated_response that combines previous chunks.
Allow users to kill ongoing evaluation runs from UI.
Allow custom gateway bind addresses with the environment variable TENSORZERO_GATEWAY_BIND_ADDRESS.

& multiple under-the-hood and UI improvements (thanks @Nfemz @greg80303)!

View release on GitHub

2026.2.1 Breaking risk 5mo

Breaking changes

Default value for `cache_options.enabled` changed from `write_only` to `off`.

Notable features

Support reasoning models from Groq, Mistral, and vLLM.
Support multi-turn reasoning with Gemini and OpenAI‑compatible models.
Support embedding models from Together AI.

Full changelog

[!CAUTION]
Breaking Changes

The default value for cache_options.enabled changed from write_only to off.
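
Callers that relied on the old default can set the value explicitly. The sketch below assumes `cache_options` is passed in the inference request body; `with_write_only_cache`, `my_function`, and the payload shape are illustrative, not taken from these notes.

```python
import json


def with_write_only_cache(request: dict) -> dict:
    """Return a copy of `request` that pins the pre-2026.2.1 cache default.

    An explicit `cache_options.enabled` already present in the request is
    left untouched.
    """
    updated = dict(request)
    cache_options = dict(updated.get("cache_options", {}))
    cache_options.setdefault("enabled", "write_only")
    updated["cache_options"] = cache_options
    return updated


# Hypothetical request payload; only `cache_options` matters here.
request = {"function_name": "my_function", "input": {"messages": []}}
print(json.dumps(with_write_only_cache(request), indent=2))
```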

New Features

Support reasoning models from Groq, Mistral, and vLLM.
Support multi-turn reasoning with Gemini and OpenAI-compatible models.
Support embedding models from Together AI.
Add configurable total_ms timeout to streaming inferences.
Display charts with top-k evaluation results in the TensorZero Autopilot UI.
Add "Ask Autopilot" buttons throughout the UI.
Allow TensorZero Autopilot to edit your local configuration files.
Return thought and unknown content blocks in the OpenAI-compatible endpoint (tensorzero_extra_content).

& multiple under-the-hood and UI improvements!

View release on GitHub

2026.2.0 Breaking risk 5mo

⚠ Upgrade required

`beta_structured_outputs` configuration field is deprecated and ignored; will be removed in a future release.

Notable features

YOLO Mode for TensorZero Autopilot
Interruption feature for TensorZero Autopilot sessions
Summary row added to TensorZero Autopilot session table

Full changelog

[!WARNING]
Planned Deprecations

Anthropic's structured output feature is out of beta, so the TensorZero configuration field beta_structured_outputs is now ignored and deprecated. It'll be removed in a future release.

Bug Fixes

Fix a regression in the aws_bedrock provider that affected long-term bearer API keys.
Fix a horizontal overflow issue for tool calls and results in the inference detail UI page.

New Features

Add YOLO Mode for TensorZero Autopilot.
Add interruption feature for TensorZero Autopilot sessions.
Add summary to the TensorZero Autopilot session table in the UI.

& multiple under-the-hood and UI improvements (thanks @pratikbuilds)!

View release on GitHub

2026.1.8 Bug fix 5mo

Fixed a race condition disabling chat input in TensorZero Autopilot UI.

Full changelog

Bug Fixes

Fix a race condition in the TensorZero Autopilot UI that could disable the chat input.
Increase timeouts for slow tool calls triggered by TensorZero Autopilot (e.g. evaluations).

& multiple under-the-hood and UI improvements!

View release on GitHub

2026.1.7 New feature 5mo

Notable features

TensorZero Autopilot (preview) – automated AI engineer for LLM observability, prompt/model optimization, eval setup, and A/B testing
Support multi-turn reasoning for xAI via `reasoning_content`

Full changelog

New Features

[Preview] TensorZero Autopilot: an automated AI engineer that analyzes LLM observability data, optimizes prompts and models, sets up evals, and runs A/B tests.
Support multi-turn reasoning for xAI (reasoning_content only).

& multiple under-the-hood and UI improvements!

View release on GitHub

2026.1.6 Breaking risk 5mo

⚠ Upgrade required

When using `unstable_error_json` with the OpenAI‑compatible inference endpoint, replace `error_json` with `tensorzero_error_json`. Both fields are currently emitted with identical data; future releases will remove `error_json`.

Breaking changes

OpenAI-compatible endpoints return errors in the standard OpenAI format (`{"error": {"message": "..."}}`) instead of the previous TensorZero format (`{"error": "..."}`).

Notable features

Native support for provider tools (e.g., web search) added to Anthropic and GCP Vertex AI Anthropic model providers.
Improved streaming handling of reasoning content blocks in OpenAI Responses API.
Graceful handling of missing `usage` fields during inference with the OpenAI model provider.

Full changelog

[!CAUTION]
Breaking Changes

Moving forward, TensorZero will use the OpenAI API's error format ({"error": {"message": "Bad!"}}) instead of TensorZero's error format ({"error": "Bad!"}) in the OpenAI-compatible endpoints.
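
Clients that talk to gateways on both sides of this change may want to accept either shape. A minimal sketch, assuming the response body is a JSON object; `error_message` is a hypothetical helper name.

```python
import json
from typing import Optional


def error_message(body: str) -> Optional[str]:
    """Extract the error message from either error shape, or None."""
    payload = json.loads(body)
    error = payload.get("error")
    if isinstance(error, dict):
        # OpenAI format: {"error": {"message": "..."}}
        return error.get("message")
    if isinstance(error, str):
        # Previous TensorZero format: {"error": "..."}
        return error
    return None


print(error_message('{"error": {"message": "Bad!"}}'))  # Bad!
print(error_message('{"error": "Bad!"}'))               # Bad!
```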

[!WARNING]
Planned Deprecations

When using unstable_error_json with the OpenAI-compatible inference endpoint, use tensorzero_error_json instead of error_json. For now, TensorZero will emit both fields with identical data. The TensorZero inference endpoint is not affected.

New Features

Add native support for provider tools (e.g. web search) to the Anthropic and GCP Vertex AI Anthropic model providers. Previously, clients had to use extra_body to handle these tools.
Improve handling of reasoning content blocks when streaming with the OpenAI Responses API.
Handle inferences with missing usage fields gracefully in the OpenAI model provider.
Improve error handling across the UI.

& multiple under-the-hood and UI improvements!

View release on GitHub

2026.1.5 Breaking risk 6mo

⚠ Upgrade required

Migrate `include_original_response` to `include_raw_response` in all SDK configurations.
Update AWS model provider settings: replace `allow_auto_detect_region = true` with `region = "sdk"`.
When configuring custom Anthropic providers, set `api_base` to the base URL without the trailing endpoint (e.g., remove `/messages`).

Breaking changes

Normalized `usage` reporting: `input_tokens` and `output_tokens` now include all provider token variations (caching, reasoning, etc.), while cached tokens remain excluded. Raw provider usage can be accessed via `include_raw_usage`.
Deprecation of `include_original_response`; migrate to `include_raw_response` for full model inference metadata.
Deprecation of `allow_auto_detect_region = true`; migrate to `region = "sdk"` when configuring AWS model providers.

Notable features

Improved error handling across TensorZero UI, JSON deserialization, AWS providers, streaming inferences, and timeouts.
Support for Valkey (Redis) to enhance rate‑limiting performance at ≥100 QPS.
Added `reasoning_effort` support for Gemini 3 models (mapped to `thinkingLevel`).

Full changelog

[!CAUTION]
Breaking Changes

TensorZero will normalize the reported usage from different model providers. Moving forward, input_tokens and output_tokens include all token variations (provider prompt caching, reasoning, etc.), just like OpenAI. Tokens cached by TensorZero remain excluded. You can still access the raw usage reported by providers with include_raw_usage.

[!WARNING]
Planned Deprecations

Migrate include_original_response to include_raw_response. For advanced variant types, the former only returned the last model inference, whereas the latter returns every model inference with associated metadata.

Migrate allow_auto_detect_region = true to region = "sdk" when configuring AWS model providers. The behavior is identical.

Provide the proper API base rather than the full endpoint when configuring custom Anthropic providers. Example:

Before: api_base = "https://YOUR-RESOURCE-NAME.services.ai.azure.com/anthropic/v1/messages"

Now: api_base = "https://YOUR-RESOURCE-NAME.services.ai.azure.com/anthropic/v1/"
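
If many configurations need this change, the rewrite can be scripted. The helper below is a hypothetical sketch that only strips a trailing `/messages` segment and normalizes the trailing slash; it does not validate the URL.

```python
def normalize_anthropic_api_base(api_base: str) -> str:
    """Turn a full `/messages` endpoint into the API base, ending in `/`."""
    trimmed = api_base.rstrip("/")
    suffix = "/messages"
    if trimmed.endswith(suffix):
        trimmed = trimmed[: -len(suffix)]
    return trimmed + "/"


before = "https://YOUR-RESOURCE-NAME.services.ai.azure.com/anthropic/v1/messages"
print(normalize_anthropic_api_base(before))
# https://YOUR-RESOURCE-NAME.services.ai.azure.com/anthropic/v1/
```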

Bug Fixes

Fix a regression that triggered incorrect warnings about usage reporting for streaming inferences with Anthropic models.
Fix a bug in the TensorZero Python SDK that discarded some request fields in certain multi-turn inferences with tools.

New Features

Improve error handling across many areas: TensorZero UI, JSON deserialization, AWS providers, streaming inferences, timeouts, etc.
Support Valkey (Redis) for improving performance of rate limiting checks (recommended at 100+ QPS).
Support reasoning_effort for Gemini 3 models (mapped to thinkingLevel).
Improve handling of Anthropic reasoning models in TensorZero JSON functions. Moving forward, json_mode = "strict" will use the beta structured outputs feature; json_mode = "on" still uses the legacy assistant message prefill.
Improve handling of reasoning content in the OpenRouter and xAI model providers.
Add extra_headers support for embedding models. (thanks @jonaylor89!)
Support dynamic credentials for AWS Bedrock and AWS SageMaker model providers.

& multiple under-the-hood and UI improvements (thanks @ndoherty-xyz)!

View release on GitHub

2026.1.2 New feature 6mo

Notable features

Append to arrays using `/my_array/-` with `extra_body`.
Handle cross‑model thought signatures in GCP Vertex AI Gemini and Google AI Studio.

Full changelog

New Features

Support appending to arrays with extra_body using the /my_array/- notation.
Handle cross-model thought signatures in GCP Vertex AI Gemini and Google AI Studio.
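
The `/my_array/-` notation matches the JSON Pointer convention in which a final `-` token refers to the position after the last array element. The sketch below illustrates that convention on a plain dictionary; `apply_pointer` is a hypothetical helper, it expects a non-empty pointer, and it says nothing about how the gateway merges `extra_body` into provider requests.

```python
import copy


def apply_pointer(document: dict, pointer: str, value):
    """Return a copy of `document` with `value` written at `pointer`.

    A final `-` token appends to the array found at the parent path.
    """
    result = copy.deepcopy(document)
    # Split into tokens and undo JSON Pointer escaping (~1 -> "/", ~0 -> "~").
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in pointer.split("/")[1:]]

    # Walk to the parent of the final token.
    target = result
    for token in tokens[:-1]:
        target = target[int(token)] if isinstance(target, list) else target[token]

    last = tokens[-1]
    if isinstance(target, list):
        if last == "-":
            target.append(value)
        else:
            target[int(last)] = value
    else:
        target[last] = value
    return result


body = {"my_array": [1, 2]}
print(apply_pointer(body, "/my_array/-", 3))  # {'my_array': [1, 2, 3]}
```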

& multiple under-the-hood and UI improvements (thanks @ecalifornica!)

View release on GitHub

2026.1.1 Bug fix 6mo

⚠ Upgrade required

Deprecation warning: In a future release, `model` will become required in DICLOptimizationConfig initialization (currently optional with default openai::gpt-5-mini).

Notable features

Support stream_options.include_usage for every model under the Azure provider

Full changelog

[!WARNING]
Planned Deprecations

In a future release, the parameter model will be required when initializing DICLOptimizationConfig. The parameter remains optional (defaults to openai::gpt-5-mini) in the meantime.

Bug Fixes

Stop buffering raw_usage when streaming with the OpenAI-compatible inference endpoint; instead, emit raw_usage as soon as possible, just like in the native endpoint.
Stop reporting zero usage in every chunk when streaming a cached inference; instead, report zero usage only in the final chunk, as expected.

New Features

Support stream_options.include_usage for every model under the Azure provider.

& multiple under-the-hood and UI improvements!

View release on GitHub

2026.1.0 Breaking risk 6mo

⚠ Upgrade required

Update monitoring dashboards and alerts to use the new histogram buckets for `tensorzero_inference_latency_overhead_seconds`.
Replace usage of deprecated environment variable `TENSORZERO_CLICKHOUSE_URL` with gateway‑mediated queries.
Adjust configuration: migrate from `tensorzero_inference_latency_overhead_seconds_histogram_buckets` to `tensorzero_inference_latency_overhead_seconds_buckets`.

Breaking changes

Metric `tensorzero_inference_latency_overhead_seconds` changed from a summary to a histogram (default buckets: 1ms, 10ms, 100ms).
Deprecation of environment variable `TENSORZERO_CLICKHOUSE_URL` in the UI.
Renamed Prometheus metric `tensorzero_inference_latency_overhead_seconds_histogram` to `tensorzero_inference_latency_overhead_seconds` (both emitted temporarily).

Notable features

Optional `include_raw_usage` parameter in inference requests returns raw usage objects alongside normalized usage.
Optional `--bind-address` CLI flag added to the gateway.
Optional `description` field for metrics in configuration.

Full changelog

[!CAUTION]
Breaking Changes

The Prometheus metric tensorzero_inference_latency_overhead_seconds will report a histogram instead of a summary. You can customize the buckets using gateway.metrics.tensorzero_inference_latency_overhead_seconds_buckets in the configuration (default: 1ms, 10ms, 100ms).
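
For dashboards moving from summary quantiles to histogram buckets, it can help to recall how Prometheus-style buckets count observations: each bucket is cumulative and includes every value less than or equal to its upper bound, with a final `+Inf` bucket. The sketch below applies that rule to the default bounds; expressing them in seconds is an assumption based on the metric name, and `cumulative_bucket_counts` is a hypothetical helper.

```python
from bisect import bisect_left

# Default bounds from the release note (1ms, 10ms, 100ms), assumed in seconds.
DEFAULT_BUCKETS_S = [0.001, 0.010, 0.100]


def cumulative_bucket_counts(observations_s, buckets_s=DEFAULT_BUCKETS_S):
    """Count observations per upper bound, Prometheus-style (cumulative)."""
    bounds = sorted(buckets_s)
    per_bucket = [0] * (len(bounds) + 1)  # last slot is the +Inf bucket
    for value in observations_s:
        # bisect_left finds the first bound that is >= value.
        per_bucket[bisect_left(bounds, value)] += 1

    counts, running = {}, 0
    for bound, count in zip(bounds + [float("inf")], per_bucket):
        running += count
        counts[bound] = running
    return counts


overheads = [0.0005, 0.001, 0.004, 0.05, 0.5]
print(cumulative_bucket_counts(overheads))
# {0.001: 2, 0.01: 3, 0.1: 4, inf: 5}
```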

[!WARNING]
Planned Deprecations

Deprecate the TENSORZERO_CLICKHOUSE_URL environment variable in the UI. Moving forward, the UI will query data through the gateway and will not communicate directly with ClickHouse.

Rename the Prometheus metric tensorzero_inference_latency_overhead_seconds_histogram to tensorzero_inference_latency_overhead_seconds. Both metrics will be emitted for now.

Rename the configuration field tensorzero_inference_latency_overhead_seconds_histogram_buckets to tensorzero_inference_latency_overhead_seconds_buckets. Both fields are available for now.

New Features

Add optional include_raw_usage parameter to inference requests. If enabled, the gateway returns the raw usage objects from model provider responses in addition to the normalized usage response field.
Add optional --bind-address CLI flag to the gateway.
Add optional description field to metrics in the configuration.
Add option to fine-tune Fireworks models without automatic deployment.

& multiple under-the-hood and UI improvements (thanks @ecalifornica @achaljhawar @rguilmont)!

View release on GitHub

2025.12.6 Breaking risk 7mo

Breaking changes

Removed `credential_location` from `DICLOptimizationConfig`.
Moved `account_id` to `[provider_types.fireworks.sft]` and removed `api_base` and `credential_location` from `FireworksSFTConfig`.
Moved `bucket_name`, `bucket_path_prefix`, `kms_key_name`, `project_id`, `region`, and `service_account` to `[provider_types.gcp_vertex_gemini.sft]` and removed them from `GCPVertexGeminiSFTConfig`.

Notable features

Gateway relay support for routing LLM inference requests through multiple TensorZero Gateway deployments.
Added "Try with model" button to datapoint page in the UI.
Prometheus metric `tensorzero_inference_latency_overhead_seconds_histogram` for meta‑observability.

Full changelog

[!CAUTION]
Breaking Changes

Migrated the following optimization fields from the TensorZero Python SDK to the configuration:

DICLOptimizationConfig: removed credential_location.

FireworksSFTConfig: moved account_id to [provider_types.fireworks.sft]; removed api_base and credential_location.

GCPVertexGeminiSFTConfig: moved bucket_name, bucket_path_prefix, kms_key_name, project_id, region, and service_account to [provider_types.gcp_vertex_gemini.sft].

OpenAIRFTConfig: removed api_base and credential_location.

OpenAISFTConfig: removed api_base and credential_location.

TogetherSFTConfig: hf_api_token, wandb_api_key, wandb_base_url, and wandb_project_name moved to [provider_types.together.sft]; removed api_base and credential_location.

New Features

Support gateway relay. With gateway relay, an LLM inference request can be routed through multiple independent TensorZero Gateway deployments before reaching a model provider. This enables you to enforce organization-wide controls (e.g. auth, rate limits, credentials) without restricting how teams build their LLM features.
Add "Try with model" button to the datapoint page in the UI.
Add tensorzero_inference_latency_overhead_seconds_histogram Prometheus metric for meta-observability.
Add concurrency parameter to experimental_render_samples (defaults to 100).
Add otlp_traces_extra_attributes and otlp_traces_extra_resources to the TensorZero Python SDK. (thanks @jinnovation!)

& multiple under-the-hood and UI improvements (thanks @ecalifornica)

View release on GitHub

2025.12.5 Breaking risk 7mo

⚠ Upgrade required

The `experimental_chain_of_thought` variant type will be deprecated in version 2026.2+; migrate to native reasoning capabilities.
The `timeout_s` configuration field for best/mixture-of-N variants will be deprecated in version 2026.2+; use the `[timeouts]` block instead.

Notable features

UI dataset builder supports complex queries (filter by tags, feedback)
Export Prometheus metric tensorzero_inference_latency_overhead_seconds
CLI flag --disable-api-key to disable TensorZero API keys

Full changelog

[!WARNING]
Planned Deprecations

The variant type experimental_chain_of_thought will be deprecated in 2026.2+. As reasoning models are becoming prevalent, please use their native reasoning capabilities.

The timeout_s configuration field for best/mixture-of-N variants will be deprecated in 2026.2+. Please use the [timeouts] block in the configuration for their candidates instead.

New Features

Expand the dataset builder in the UI to support complex queries (e.g. filter by tags, feedback).
Export tensorzero_inference_latency_overhead_seconds Prometheus metric for meta-observability.
Allow users to disable TensorZero API keys using --disable-api-key in the CLI. (thanks @jinnovation!)

& multiple under-the-hood and UI improvements (thanks @ecalifornica)!

View release on GitHub

2025.12.3 Bug fix 7mo

Notable features

Performance improvement for inference and datapoint list pages in the UI
Support filtering inferences by presence of a demonstration

Full changelog

Bug Fixes

Fix a bug where negative tag filters (e.g. user_id != 1) matched inferences and datapoints without that tag.
Fix a bug where metric filters covering default values (e.g. exact_match = false) matched inferences without that metric.
Fix a regression affecting the logger in the UI.
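
The corrected behavior for negative tag filters can be pictured as follows: a record that lacks the tag matches neither `=` nor `!=`. This is an illustrative sketch over in-memory dictionaries with a hypothetical `matches_tag_filter` helper, not the query the UI or gateway actually runs.

```python
def matches_tag_filter(tags: dict, key: str, op: str, value: str) -> bool:
    """Evaluate a tag filter; records without the tag never match."""
    if key not in tags:
        return False
    if op == "=":
        return tags[key] == value
    if op == "!=":
        return tags[key] != value
    raise ValueError(f"unsupported operator: {op!r}")


records = [
    {"id": "a", "tags": {"user_id": "1"}},
    {"id": "b", "tags": {"user_id": "2"}},
    {"id": "c", "tags": {}},  # no user_id tag at all
]

# user_id != 1 keeps only records that carry the tag with another value.
kept = [r["id"] for r in records if matches_tag_filter(r["tags"], "user_id", "!=", "1")]
print(kept)  # ['b']
```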

New Features

Improve the performance of the inference and datapoint list pages in the UI.
Support filtering inferences by whether they have a demonstration.

& multiple under-the-hood and UI improvements (thanks @jinnovation @ecalifornica @simeonlee)!

View release on GitHub

2025.12.2 Bug fix 7mo

Notable features

Customizable log level via TENSORZERO_UI_LOG_LEVEL

Full changelog

Bug Fixes

Fix a performance regression affecting the inference table in the UI.

New Features

Allow users to customize the log level in the UI (TENSORZERO_UI_LOG_LEVEL).

& multiple under-the-hood and UI improvements

View release on GitHub

2025.12.1 Bug fix 7mo

Fixed a regression that broke the dataset builder in the UI.

Full changelog

Bug Fixes

Fixed a regression that broke the dataset builder in the UI.

& multiple under-the-hood and UI improvements

View release on GitHub

2025.12.0 Breaking risk 7mo

⚠ Upgrade required

Environment variables `TENSORZERO_UI_CONFIG_PATH` and `TENSORZERO_UI_DEFAULT_CONFIG` are deprecated and ignored.
`model_provider_name` is still accepted in the API but will be removed in a future release; migrate to using `model_name` and `provider_name`.

Breaking changes

Unknown content blocks now return `model_name` and `provider_name` instead of fully-qualified `model_provider_name`.

Notable features

Free‑form search and filtering in inference and datapoint tables
Create, edit, clone datapoints directly from the UI
Peek at inferences on episode detail pages

Full changelog

[!CAUTION]
Breaking Changes

Unknown content blocks now return the scope as model_name and provider_name instead of the fully-qualified model_provider_name.

[!WARNING]
Planned Deprecations

The TensorZero UI now reads the configuration from the gateway (instead of reading directly from the filesystem). The environment variables TENSORZERO_UI_CONFIG_PATH and TENSORZERO_UI_DEFAULT_CONFIG are deprecated and ignored. You no longer need to mount the configuration onto the UI container.

Use model_name and provider_name to scope provider tools (e.g. OpenAI Responses API web search) instead of model_provider_name. The deprecated name is still accepted in the API.

Bug Fixes

Fix a regression in the "Try with..." modal in the UI that disregarded some parameters (e.g. allowed_tools).
Fix a regression in allowed_tools when using custom display names for tools.
Fix an edge case when using both allowed_tools and tool_choice parameters with GCP Vertex AI Gemini.

New Features

Support free-form search and filtering (e.g. by tags, metrics) in the inference and datapoint tables in the UI.
Support creating datapoints from scratch in the UI.
Support editing TensorZero API key descriptions in the UI (thanks @nicoestrada!).
Support editing any kind of datapoint input and output in the UI.
Support peeking at inferences in the episode detail page in the UI (thanks @BrianLi23!).
Support cloning datapoints in the UI.
Optimize the rendering performance of the code editor in the UI.
Make mime_type optional for base64 file inputs (now inferred from magic bytes when not provided).

& multiple under-the-hood and UI improvements
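
As a rough illustration of inferring a MIME type from magic bytes when mime_type is omitted, the sketch below checks a decoded base64 payload against a few well-known file signatures. The function names and the signature table are illustrative assumptions, not the gateway's actual implementation, and the table is far from exhaustive.

```python
import base64
from typing import Optional

# A few well-known file signatures ("magic bytes"). Illustrative only.
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def sniff_mime_type(data: bytes) -> Optional[str]:
    """Return the MIME type whose signature prefixes `data`, or None if unknown."""
    for signature, mime_type in _SIGNATURES:
        if data.startswith(signature):
            return mime_type
    return None


def sniff_base64_mime_type(payload: str) -> Optional[str]:
    """Decode a base64 payload and sniff its MIME type (hypothetical helper)."""
    return sniff_mime_type(base64.b64decode(payload))
```

A caller could fall back to an explicit mime_type whenever the sniffing helper returns None.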

View release on GitHub

2025.11.6 Bug fix 8mo

Notable features

Programmatic evaluations on specific datapoints via `datapoint_ids`
Generation of `values.schema.json` for the Helm chart

Full changelog

Bug Fixes

Handle a regression in ClickHouse latest that affected the endpoint for deleting datapoints.

New Features

Support running evaluations programmatically on specific datapoints (datapoint_ids).
Generate values.schema.json for the Helm chart. (thanks @Erin-Boehmer!)

View release on GitHub

2025.11.5 Breaking risk 8mo

⚠ Upgrade required

Rename `json_mode="implicit_tool"` to `json_mode="tool"`.
Use `model_name` (and optionally `provider_name`) instead of `model_provider_name` in `extra_body` and `extra_headers` objects supplied at inference time; scope filters are optional.

Breaking changes

Explicit `tensorzero::params` take precedence over conflicting native parameters when using the OpenAI-compatible inference endpoint.

Notable features

Native support for Anthropic's Beta Structured Outputs (`beta_structured_outputs`) without needing `extra_headers`
`json_mode="tool"` now supported in chat inferences even when no tools are included
Thought signatures added for GCP Vertex model providers

Full changelog

[!CAUTION]
Breaking Changes

Moving forward, explicit tensorzero::params will take precedence over conflicting native parameters when using the OpenAI-compatible inference endpoint.

[!WARNING]
Planned Deprecations

Rename json_mode="implicit_tool" to json_mode="tool".

Set model_name and optionally provider_name instead of model_provider_name in extra_body and extra_headers objects supplied at inference time. Alternatively, don't include a scope filter at all.
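
A minimal sketch of migrating a scoped extra_body or extra_headers entry is shown below. It assumes the fully-qualified model_provider_name has the form tensorzero::model_name::<model>::provider_name::<provider>, and that entries are plain dictionaries; both the helper name and the sample fields (pointer, value) are assumptions for illustration.

```python
import re
from typing import Any, Dict

# Assumed shape of the fully-qualified name; adjust if your values differ.
_FQ_PATTERN = re.compile(
    r"^tensorzero::model_name::(?P<model>.+)::provider_name::(?P<provider>.+)$"
)


def migrate_scope(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Replace model_provider_name with model_name and provider_name."""
    if "model_provider_name" not in entry:
        return dict(entry)  # already migrated or unscoped
    match = _FQ_PATTERN.match(entry["model_provider_name"])
    if match is None:
        raise ValueError(f"unrecognized scope: {entry['model_provider_name']!r}")
    migrated = {k: v for k, v in entry.items() if k != "model_provider_name"}
    migrated["model_name"] = match.group("model")
    migrated["provider_name"] = match.group("provider")
    return migrated
```

Entries without a scope filter pass through unchanged, which matches the option of omitting the filter entirely.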

New Features

Support Anthropic's Beta Structured Outputs feature natively (beta_structured_outputs). extra_headers is no longer necessary.
Support json_mode="tool" in chat inferences that don't otherwise include tools.
Support extra_body and extra_headers supplied at inference time without scope filters.
Support extra_body and extra_headers supplied at inference time with model_name and optional provider_name scope filters.
Support thought signatures for the GCP Vertex model providers.
Support custom tools for the OpenAI model provider.
Add description fields to evaluation and evaluator configuration.

& multiple under-the-hood and UI improvements

View release on GitHub

2025.11.4 Breaking risk 8mo

⚠ Upgrade required

Replace `page_size` with `limit` in observability methods.
Place fields previously nested in `metadata` or `tool_params` at the root when calling PATCH /v1/datasets/{dataset_name}/datapoints or update_datapoints.
Deprecation warning: use `limit` instead of `page_size` for programmatic observability methods (will be removed in a future release).

Breaking changes

Require `allowed_tools` to include any dynamically specified tools; previously assumed always allowed.

Notable features

Adaptive stopping for evaluations in UI and Python SDK
Support explicit `candidate_variants` and `fallback_variants` with uniform sampling
Add `input_audio` content block support across multiple model providers

Full changelog

[!CAUTION]
Breaking Changes

Moving forward, allowed_tools must include dynamic tools (tools specified at inference time rather than in configuration). This matches the OpenAI API behavior. Previously, TensorZero assumed that dynamic tools were always allowed.
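
One way to catch this before sending a request is a small pre-flight check that lists dynamic tools missing from allowed_tools. The helper below is a hypothetical client-side sketch, not part of any TensorZero SDK.

```python
from typing import Iterable, List


def dynamic_tools_missing_from_allowed(
    allowed_tools: Iterable[str], dynamic_tool_names: Iterable[str]
) -> List[str]:
    """Return the dynamic tool names that are not listed in allowed_tools."""
    allowed = set(allowed_tools)
    return [name for name in dynamic_tool_names if name not in allowed]
```

An empty result means every dynamic tool is explicitly allowed.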

[!WARNING]
Planned Deprecations

Use limit instead of page_size with the programmatic observability methods. Previously, the methods mixed these two fields.

Don't nest fields in metadata or tool_params when calling PATCH /v1/datasets/{dataset_name}/datapoints or update_datapoints. Moving forward, please place them in the root.
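
A sketch of the flattening, assuming the update payload is a plain dictionary: fields nested under metadata or tool_params are lifted to the root. The helper name and the sample field names used with it are hypothetical.

```python
from typing import Any, Dict

NESTED_KEYS = ("metadata", "tool_params")


def flatten_datapoint_update(update: Dict[str, Any]) -> Dict[str, Any]:
    """Lift fields nested under metadata/tool_params to the root of the update."""
    flat = {k: v for k, v in update.items() if k not in NESTED_KEYS}
    for key in NESTED_KEYS:
        for field, value in (update.get(key) or {}).items():
            # A field already present at the root takes priority.
            flat.setdefault(field, value)
    return flat
```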

[!WARNING]
Completed Deprecations

Require template_filesystem_access.base_path when template_filesystem_access.enabled is true.

Removed many deprecated experimental types and methods from the TensorZero Python SDK.

New Features

Add adaptive stopping for evaluations in the UI and Python SDK.
Support explicit candidate_variants and fallback_variants when using uniform sampling.
Support the input_audio content block in the OpenAI-compatible inference endpoint.
Support the input_audio content block in the OpenAI, Azure, GCP Vertex Gemini, Google AI Studio, and OpenRouter model providers.
Add optional filename field for input files.
Move closer to parity between the GCP Vertex Anthropic model provider and the Anthropic model provider.
Expose new observability and dataset management endpoints as methods in the TensorZero Python SDK.
Add optional postgres.enabled field to the configuration.
Handle missing usage information from model providers that don't report it.
Add experimental method for searching inferences programmatically (search_query_experimental).
Add a native OpenRouter embedding model provider.

& multiple under-the-hood and UI improvements

View release on GitHub

2025.11.3 Bugfix 8mo

Fixed handling of user‑defined tags in batch inference.

Full changelog

Bug Fixes

Enable TLS support for Postgres connections.
Fix handling of user-defined tags in batch inference.

& multiple under-the-hood and UI improvements

View release on GitHub

2025.11.2 Breaking risk 8mo

Breaking changes

Gateway attempts `fallback_variants` in order rather than randomly sampling them.

Notable features

Tag inference and feedback with `tensorzero::api_key_public_id` when using auth.
Add POST /v1/datasets/{dataset_name}/datapoints endpoint for creating datapoints.
Introduce `gateway.global_outbound_http_timeout_ms` configuration setting.

Full changelog

[!CAUTION]
Breaking Changes

Moving forward, the gateway will attempt any fallback_variants in order rather than randomly sample them.
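
The difference from random sampling can be sketched as a simple ordered loop: each fallback variant is tried in the order listed, and the first success wins. This is an illustration of the described behavior, not the gateway's code; run_variant is a hypothetical callable standing in for a variant inference.

```python
from typing import Callable, List, Sequence


def infer_with_fallbacks(
    fallback_variants: Sequence[str], run_variant: Callable[[str], str]
) -> str:
    """Try each fallback variant in order and return the first successful result."""
    errors: List[str] = []
    for name in fallback_variants:
        try:
            return run_variant(name)
        except Exception as exc:  # record the failure and move to the next variant
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all fallback variants failed: " + "; ".join(errors))
```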

Bug Fixes

Fix a bug that prevented some model inferences from being rendered correctly in the UI.
Handle non-image base64 file inputs consistently in the OpenAI-compatible inference endpoint.
Handle raw_response correctly for batch inference with GCP Vertex AI Gemini.

New Features

Apply the tensorzero::api_key_public_id tag to inference and feedback when using auth.
Add updated HTTP endpoint for creating datapoints (POST /v1/datasets/{dataset_name}/datapoints).
Add gateway.global_outbound_http_timeout_ms configuration setting.

& multiple under-the-hood and UI improvements (thanks @omarraf!)

View release on GitHub

2025.11.1 Bug fix 8mo

Notable features

Rate limiting by API key (`api_key_public_id`)
Native `service_tier` parameter for supported providers (Anthropic, Azure, Groq, OpenAI) – removes need for `extra_body`
Native `detail` parameter for input images in Azure, OpenAI, xAI – removes need for `extra_body`

Full changelog

Bug Fixes

Fix a regression that prevented batch inferences from being rendered in the UI.
Handle missing Postgres credentials gracefully in the UI.

New Features

Support rate limiting by API key (api_key_public_id).
Add native service_tier inference parameter (supported providers: Anthropic, Azure, Groq, OpenAI). extra_body is no longer necessary.
Add native detail parameter for input images (supported providers: Azure, OpenAI, xAI). extra_body is no longer necessary.
Add updated HTTP endpoint for querying inferences by ID (POST /v1/inferences/get_inferences).
Add updated HTTP endpoint for querying inferences with filters (POST /v1/inferences/list_inferences).

& multiple under-the-hood and UI improvements

View release on GitHub

2025.11.0 Breaking risk 8mo

⚠ Upgrade required

Update configuration: replace `enable_template_filesystem_access` with `template_filesystem_access.enabled`.

Breaking changes

Removed configuration field `enable_template_filesystem_access`; use `template_filesystem_access.enabled` instead.

Notable features

Automated experimentation (automated A/B testing)
Authentication for TensorZero Gateway with virtual API keys
Native inference parameters: reasoning_effort/thinking_budget_tokens and verbosity

Full changelog

[!WARNING]
Completed Deprecations

Completed the planned deprecation of the configuration field enable_template_filesystem_access in favor of template_filesystem_access.enabled.

Bug Fixes

Handle the global region correctly for GCP Vertex Anthropic.
Fix output format for JSON functions in the new endpoint for updating datapoints (PATCH /v1/{dataset_name}/datapoints). The output field now matches the inference endpoint (an object with a raw field; parsed is ignored and recomputed internally).
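
As a sketch of the output shape described above, a client can send only a raw JSON string and drop any stale parsed value. The helper names are hypothetical, and the real gateway presumably also validates against the function's schema, which this example does not do.

```python
import json
from typing import Any, Dict


def make_json_output(value: Any) -> Dict[str, str]:
    """Build an output object carrying only the raw JSON string."""
    return {"raw": json.dumps(value)}


def normalize_output(output: Dict[str, Any]) -> Dict[str, str]:
    """Keep only raw; any client-supplied parsed field is discarded."""
    return {"raw": output["raw"]}
```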

New Features

Add automated experimentation feature (automated A/B testing). Docs
Add authentication for the TensorZero Gateway (virtual API keys). Docs
Add native inference parameters to enable reasoning for every supported model provider (reasoning_effort or thinking_budget_tokens depending on the provider). extra_body is no longer necessary.
Add native verbosity inference parameter. extra_body is no longer necessary.
Support token inputs in the embeddings endpoint.
Support input thought content blocks for GCP Vertex Anthropic.
Improve handling of JSON Schemas for GCP Vertex Gemini and Google AI Studio.

& multiple under-the-hood and UI improvements

View release on GitHub

2025.10.9 Breaking risk 8mo

⚠ Upgrade required

Upgrade from the yanked `2025.10.8` release to this version.
Migrate any code using legacy `list_datapoints` or `experimental_list_inferences` to handle new content‑block format.
Update Helm chart values: remove `createLegacyIngress`; use only `tensorzero-gateway` ingress.

Breaking changes

Removed `list_datapoints` and `experimental_list_inferences` API signatures; updated data schema to use structured content blocks (`{"type": "text", ...}`, `{"type": "template", ...}`, `{"type": "file", ...}`).
Helm chart variable `createLegacyIngress` removed; legacy gateway ingress no longer supported.

Notable features

Added HTTP endpoints for datapoint CRUD operations (`GET`, `POST`, `PATCH`, `DELETE`).
UI support to create, update, and delete messages and content blocks in dataset editor.
Emit OpenTelemetry spans for rate‑limiting queries.

Full changelog

[!CAUTION]
Notice on 2025.10.8: We ran into a technical issue during the release process for 2025.10.8 that resulted in a broken build for the TensorZero Python SDK on PyPI. We've yanked that release and recommend upgrading to this version.

[!CAUTION]
Breaking Changes

This release includes small breaking changes to the programmatic observability/dataset APIs (e.g. list_datapoints, experimental_list_inferences) and the underlying data schema. Moving forward, TensorZero will store and return the new format for text ({"type": "text", "text": "..."}), template ({"type": "template", "name": "...", "arguments": { ... }}), and file ({"type": "file", "file_type": "...", ...}) content blocks. Note: These changes do not affect the inference APIs or the legacy data stored in ClickHouse.
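
Code that reads these blocks can dispatch on the type field. The sketch below handles the three shapes quoted above; the function name and the summary strings are illustrative, and only the keys shown in the release note are relied upon.

```python
from typing import Any, Dict


def summarize_block(block: Dict[str, Any]) -> str:
    """Return a short human-readable summary of a content block."""
    kind = block.get("type")
    if kind == "text":
        return f"text: {block['text']}"
    if kind == "template":
        args = ", ".join(sorted(block["arguments"]))
        return f"template {block['name']}({args})"
    if kind == "file":
        return f"file ({block['file_type']})"
    raise ValueError(f"unsupported content block type: {kind!r}")
```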

[!WARNING]
Completed Deprecations

The TensorZero Helm chart will no longer support the legacy gateway ingress. The createLegacyIngress variable was removed. Moving forward, the only supported gateway ingress is tensorzero-gateway.

Bug Fixes

Fix an issue that prevented comments from being rendered in the workflow evaluation UI.

New Features

Add HTTP endpoint for querying datapoints by ID (POST /v1/datasets/get_datapoints).
Add HTTP endpoint for querying datapoints with filters (POST /v1/datasets/{dataset_name}/list_datapoints).
Add HTTP endpoint for creating datapoints from inferences (POST /v1/datasets/{dataset_id}/from_inferences).
Add HTTP endpoint for updating datapoints (PATCH /v1/{dataset_name}/datapoints).
Add HTTP endpoint for updating datapoint metadata (PATCH /v1/datasets/{dataset_name}/datapoints/metadata).
Add HTTP endpoint for deleting datapoints (DELETE /v1/datasets/{dataset_id}/datapoints).
Add HTTP endpoint for deleting datasets (DELETE /v1/datasets/{dataset_id}).
Enable users to create, update, and delete messages and content blocks in the dataset editor in the UI.
Emit OpenTelemetry spans for rate limiting queries.
Add support for deployment service accounts in the Helm chart (thanks @jinnovation!).
Add support for dynamic extra attributes for OTLP spans (TensorZero-OTLP-Traces-Extra-Attribute-*).

& multiple under-the-hood and UI improvements

View release on GitHub

2025.10.7 Breaking risk 9mo

⚠ Upgrade required

Deprecation warning: Untagged enums for file content blocks will be removed after 2026.2+; migrate to tagged enums with `file_type` (`url`, `base64`, or `object_storage`).
Deprecation warning: Type `InferenceFilterTreeNode` in TensorZero Python SDK will be renamed to `InferenceFilter`; both aliases available until 2026.2+.

Breaking changes

Default value for `fetch_and_encode_input_files_before_inference` changed from true to false, altering when input files are fetched relative to inference.

Notable features

Batch datapoint updates via PATCH /v1/{dataset_name}/datapoints
Thought summaries exposed in TensorZero Python SDK
Additional semantic tags added for OpenInference trace exports

Full changelog

[!CAUTION]
Breaking Changes

The default value for fetch_and_encode_input_files_before_inference is changing from true to false. As a result, the gateway will no longer fetch input files before inference, but instead will fetch them in parallel with inference (for observability). In rare cases, this may cause the gateway to receive different input files than those received by model providers.

[!WARNING]
Planned Deprecations

Migrate file content blocks from untagged enums to tagged enums. Moving forward, you should provide a field file_type with a value of "url", "base64", or "object_storage". Untagged enums are still accepted for backwards compatibility but will be removed in a future release (2026.2+).

Rename the TensorZero Python SDK type InferenceFilterTreeNode to InferenceFilter for consistency with related types. Both types will be available as aliases until 2026.2+.

Bug Fixes

Send a user agent when fetching input files to avoid restrictions from websites that require it (e.g. Wikimedia).

New Features

Add a new endpoint for batch datapoint updates (PATCH /v1/{dataset_name}/datapoints).
Expose thought summaries in the TensorZero Python SDK.
Add additional semantic tags when exporting traces using the OpenInference format (thanks @jinnovation!)

& multiple under-the-hood and UI improvements

View release on GitHub

2025.10.6 Breaking risk 9mo

⚠ Upgrade required

Update configuration: change `type = "static"` to `type = "inference"` and `type = "dynamic"` to `type = "workflow"`. Both old values remain accepted until tensorzero 2026.2+.

Breaking changes

Renaming configuration field `type = "static"` to `type = "inference"` and `type = "dynamic"` to `type = "workflow"`. Both old names will be supported until version 2026.2+.

Notable features

Short-hand model names for OpenAI Responses API (e.g., openai::responses::gpt-5)
Dynamic provider tools supporting web search via OpenAI Responses API
Custom `api_base` support for Anthropic model provider

Full changelog

[!WARNING]
Planned Deprecations

We're renaming "static evaluations" to "inference evaluations" and "dynamic evaluations" to "workflow evaluations". The only action needed is to update type = "static" in the configuration to type = "inference". Both versions will be supported until 2026.2+.

Bug Fixes

Fix a bug that dropped tool IDs in output tool_call content blocks when updating datapoints.
Prefer magic bytes over the Content-Type HTTP response header to infer MIME types of input files.

New Features

Support short-hand model names for the OpenAI Responses API (e.g. openai::responses::gpt-5).
Support dynamic provider tools (e.g. web search with the OpenAI Responses API).
Support custom api_base for the Anthropic model provider.

& multiple under-the-hood and UI improvements

View release on GitHub

2025.10.5 Feature 9mo

Notable features

FinishReason.STOP_SEQUENCE enum value added to TensorZero Python SDK

Changelog

Bug Fixes

Add FinishReason.STOP_SEQUENCE to the TensorZero Python SDK.

View release on GitHub

2025.10.4 Bug fix 9mo

⚠ Upgrade required

Deprecation: `bulk_insert_datapoints` endpoint will be renamed to `create_datapoints`; both available until 2026.2+.
Python SDK type renames: `*InferenceDataset` → `*InferenceDatapoint`, `*Node` → `*Filter`.
Legacy inference input formats are no longer accepted (were deprecated previously).

Notable features

Support OpenAI Responses API
Structured generation (strict JSON) on Groq model provider
File URLs as inputs for Anthropic model provider

Full changelog

[!WARNING]
Planned Deprecations

The bulk_insert_datapoints method (POST /datasets/{dataset_name}/datapoints/bulk) will be renamed to create_datapoints (POST /datasets/{dataset_name}/datapoints). Both methods will be available until 2026.2+. (thanks @BrianLi23!)

[!WARNING]
Completed Deprecations

Concluded many small ongoing deprecations:

Python SDK: renamed the types *InferenceDataset → *InferenceDatapoint and *Node → *Filter

Inference: stop accepting legacy input formats (e.g. inline arguments for templates). These legacy formats have issued deprecation warnings for the last several months.

Dynamic Evaluations: renamed the variable datapoint_id → task_id

Bug Fixes

Improve the rendering performance of the code editor in the UI.
Fixed the X_per_month rate limit to cover a calendar month rather than 30 days.
Use max_completion_tokens rather than max_tokens in the Azure OpenAI Service model provider.
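
To make the X_per_month fix concrete, the sketch below contrasts a calendar-month window with a rolling 30-day window. It assumes timezone-aware timestamps and says nothing about how the gateway computes its windows internally; the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def calendar_month_window(now: datetime) -> Tuple[datetime, datetime]:
    """Return [start, end) of the calendar month containing `now`."""
    start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if start.month == 12:
        end = start.replace(year=start.year + 1, month=1)
    else:
        end = start.replace(month=start.month + 1)
    return start, end


def rolling_30_day_window(now: datetime) -> Tuple[datetime, datetime]:
    """Return the 30-day window ending at `now`."""
    return now - timedelta(days=30), now
```

For a timestamp in October, the calendar window spans 31 days, while the rolling window always spans exactly 30.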

New Features

Support the OpenAI Responses API.
Support structured generation (strict JSON mode) on the Groq model provider.
Support inputs with file URLs on the Anthropic model provider.
Support encrypted reasoning and thought summaries on the OpenAI model provider.
Support dynamic OTLP resources when exporting OpenTelemetry traces (tensorzero-otlp-traces-extra-resource-*).
Support fallbacks for dynamic credentials (e.g. api_key_location = { default = "dynamic::foo", fallback = "env::bar" }).
Improve the handling for stale datapoints in the UI.

& multiple under-the-hood and UI improvements
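
The credential fallback in the example above (default = "dynamic::foo", fallback = "env::bar") can be pictured as a two-step lookup. The sketch below assumes dynamic:: refers to a credential supplied with the request and env:: to an environment variable; the helper names and the dictionary-based lookups are assumptions, not the gateway's implementation.

```python
from typing import Mapping, Optional


def resolve_location(
    location: str, dynamic_credentials: Mapping[str, str], env: Mapping[str, str]
) -> Optional[str]:
    """Resolve one location string such as 'dynamic::foo' or 'env::bar'."""
    scheme, _, key = location.partition("::")
    if scheme == "dynamic":
        return dynamic_credentials.get(key)
    if scheme == "env":
        return env.get(key)
    raise ValueError(f"unsupported credential location: {location!r}")


def resolve_api_key(
    default: str,
    fallback: str,
    dynamic_credentials: Mapping[str, str],
    env: Mapping[str, str],
) -> str:
    """Use the default location if it resolves, otherwise the fallback."""
    value = resolve_location(default, dynamic_credentials, env)
    if value is None:
        value = resolve_location(fallback, dynamic_credentials, env)
    if value is None:
        raise LookupError("no credential available from default or fallback")
    return value
```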

View release on GitHub

2025.10.3 Bugfix 9mo

Fixed Playground UI failures for inferences using static tools with custom names.

Full changelog

Bug Fixes

Fix bug in the Playground UI that caused inferences containing static tools with custom names (tools.my_tool.name) to fail.

View release on GitHub

2025.10.2 Breaking risk 9mo

⚠ Upgrade required

Explicitly list dynamic tools in the allowed‑tools configuration before the upcoming release.
Update any scripts or configs using `datapoint_name` to use `task_name`.
If relying on the default `--config-file` flag, add it manually when building/pulling `tensorzero/gateway` images.

Breaking changes

Dynamic tools will no longer be automatically included in the allowed list; explicit allowance required.
Renamed configuration key `datapoint_name` to `task_name` for dynamic evaluations.
Removed default inclusion of `--config-file` flag from `tensorzero/gateway` Dockerfile.

Notable features

Custom granular rate limits for users
Dynamic and static OTLP header support (Python SDK and config)
Optional `max_distance` field for `experimental_dynamic_in_context_learning` variants

Full changelog

[!WARNING]
Planned Deprecations

Currently, the gateway automatically includes all dynamic tools in the list of allowed tools. In a near-future release, dynamic tools will no longer be included automatically. If you intend for your dynamic tools to be allowed, please allow them explicitly.

[!WARNING]
Completed Deprecations

Finish renaming datapoint_name → task_name for dynamic evaluations.

Stop including --config-file in the Dockerfile for tensorzero/gateway by default.

Use the TENSORZERO_CLICKHOUSE_URL environment variable instead of CLICKHOUSE_URL.

Remove deprecated features from the OpenAI-compatible inference API.

Bug Fixes

Handle json_mode correctly in experimental_best_of_n variants.

New Features

Allow users to define and enforce custom granular rate limits.
Update the UI to handle unlimited named templates and schemas.
Support dynamic OTLP headers in the Python SDK.
Support static OTLP headers in the configuration.
Add optional max_distance configuration field for experimental_dynamic_in_context_learning variants.
Improve fallback behavior for experimental_dynamic_in_context_learning variants.
Allow Google AI Studio Gemini to accept input files beyond images.
Add name to datapoints.
Add experimental_run_evaluation to the Python SDK.
Allow users to configure default credentials by provider type.
Support supervised fine-tuning for Together AI models in the UI.

& multiple under-the-hood and UI improvements (thanks @dangvu0502!)

View release on GitHub

2025.10.1 New feature 9mo

Notable features

Increased default body limit to 100 MB for patch_openai_client

Full changelog

New Features

Increase default body limit to 100MB for patch_openai_client.

& multiple under-the-hood and UI improvements

View release on GitHub

2025.10.0 Breaking risk 9mo

⚠ Upgrade required

Deprecation warning: replace `timeouts.non_streaming.total_ms` with `timeout_ms` for embedding model timeouts; removal planned in 2026.1+.
Deprecation warning: use CLI flags `--run-clickhouse-migrations` and `--run-postgres-migrations` instead of `--run-migrations-only`; removal planned in 2026.1+.
Deprecation warning: Prometheus metrics `request_count` and `inference_count` will be removed; use `tensorzero_requests_total` and `tensorzero_inferences_total`.

Notable features

UI support for adding, editing, and deleting tags for datapoints
UI support for adding, editing, and deleting `system` entries for datapoints
Configuration flag `gateway.fetch_and_encode_input_files_before_inference` with default true

Full changelog

[!WARNING]
Planned Deprecations

Configure timeouts for embedding models and embedding model providers with timeout_ms instead of timeouts.non_streaming.total_ms. The latter will be removed in a future release (2026.1+).

Use the gateway CLI flags --run-clickhouse-migrations and --run-postgres-migrations instead of --run-migrations-only. --run-migrations-only requires credentials for both databases, even though Postgres is an optional dependency, so it will be removed in a future release (2026.1+).

Scrape the Prometheus metrics tensorzero_requests_total and tensorzero_inferences_total instead of request_count and inference_count. The gateway will double-emit the metrics for now; the deprecated metrics will be removed in a future release (2026.1+).
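
If existing dashboards or alert rules reference the deprecated metric names, a small rewrite helper along these lines can update the query strings. The function name is hypothetical; the mapping comes directly from the names listed above.

```python
import re

RENAMED_METRICS = {
    "request_count": "tensorzero_requests_total",
    "inference_count": "tensorzero_inferences_total",
}

# Word boundaries prevent rewriting longer names that merely contain an old name.
_PATTERN = re.compile(r"\b(" + "|".join(RENAMED_METRICS) + r")\b")


def migrate_query(query: str) -> str:
    """Replace deprecated metric names in a query string with the new names."""
    return _PATTERN.sub(lambda m: RENAMED_METRICS[m.group(1)], query)
```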

Bug Fixes

Fixed an issue that prevented static evaluations on datapoints with no reference output from being rendered in the UI.
Fixed a regression in the gateway's internal HTTP client that triggered unnecessary warnings and degraded performance when handling many concurrent streaming inferences.
Fixed an issue that prevented base64-encoded embedding requests from being cached by TensorZero.

New Features

Allow users to add, edit, and delete tags for datapoints in the UI.
Allow users to add, edit, and delete the system input for datapoints in the UI.
Add the configuration setting gateway.fetch_and_encode_input_files_before_inference. If set to true (default), the gateway will fetch remote input files and send them as a base64-encoded payload in the prompt; this is recommended to ensure that TensorZero and the model providers see identical inputs. If set to false, TensorZero will forward the input file URLs and fetch them for observability in parallel with inference.
Improved gateway errors for database issues.

& multiple under-the-hood and UI improvements

View release on GitHub

2025.9.6 Bug fix 10mo

Notable features

Multiple small improvements to the evaluations UI for streamlined workflows and simplified debugging

Full changelog

Bug Fixes

Implemented a workaround for an upstream bug in opentelemetry-otlp that caused our OTLP exporter to fail to send data to encrypted endpoints.

New Features

Added multiple small improvements to the evaluations UI to streamline common workflows and simplify debugging.

& multiple under-the-hood and UI improvements

View release on GitHub

2025.9.5 New feature 10mo

Notable features

Model observability page showing throughput and latency analytics in the UI
Support for OpenInference format when exporting OpenTelemetry traces
Supervised fine‑tuning (SFT) with GCP Vertex AI Gemini added to the UI

Full changelog

New Features

Add model observability page to the UI with model throughput and latency analytics.
Add support for OpenInference format when exporting OpenTelemetry traces.
Expand support of UI features for the default function (e.g. "Try with model").
Add support for supervised fine-tuning (SFT) with GCP Vertex AI Gemini in the UI.
Improve the performance of episode table in the UI.
Add an example of using the programmatic workflow for dynamic in-context learning.

& multiple under-the-hood and UI improvements (thanks @AnnaVernerovaHID @dangvu0502 @jinnovation!)

View release on GitHub

2025.9.4 Breaking risk 10mo

⚠ Upgrade required

Planned deprecation: rename Python SDK types from `Dicl*` to `DICL*`; both versions work now but deprecated ones will be removed in 2025.12+.

Notable features

Support unlimited prompt templates per function
Add `append_to_existing_variants` to programmatic DICL interface
Skip writing inference cache entries on tool call validation failure

Full changelog

[!WARNING]
Planned Deprecations

Rename types from Dicl* to DICL* in the Python SDK for consistency. Both versions work for now, and the deprecated types will be removed in a future release (2025.12+).

Bug Fixes

Fix a regression in the UI that prevented chat datapoints from being edited.

New Features

Expand the prompt templates and schemas functionality to support unlimited templates per function.
Support appending to existing DICL variants in the programmatic interface (append_to_existing_variants).
Skip writing inference cache entries if tool call validation fails.

& multiple under-the-hood and UI improvements (thanks @BretHudson!)

View release on GitHub

2025.9.3 New feature 10mo

Notable features

Dynamic OTLP header support for OpenTelemetry trace export
`allowed_tools` field added to OpenAI-compatible inference endpoint
Automatic HTTP/2 connection adjustment based on concurrency

Full changelog

New Features

Add support for dynamic OTLP headers when exporting OpenTelemetry traces.
Add support for allowed_tools field in the OpenAI-compatible inference endpoint.
Improve performance by automatically adjusting the number of HTTP2 connections to model providers based on concurrency.

& multiple under-the-hood and UI improvements (thanks @yuria-loo!)

View release on GitHub

2025.9.1 Bug fix 10mo

Notable features

Programmatic API for reinforcement fine-tuning (RFT) with OpenAI
Defaults added for individual fields in the `retries` configuration
Dynamic specification of Azure provider endpoint

Full changelog

Bug Fixes

Fix a regression that prevented rendering of inferences with thought content blocks in the UI.
Stop logging HTTP requests and responses twice in debug mode.

New Features

Add a programmatic API for reinforcement fine-tuning (RFT) with OpenAI.
Provide defaults for individual fields in the retries configuration.
Allow users to specify the Azure provider endpoint dynamically. (thanks @Dineshm-coder!)
Improve error messages when the gateway is missing credentials.

& multiple under-the-hood and UI improvements (thanks @JoshuaTanaka @HJStaiff!)

View release on GitHub

2025.9.0 Breaking risk 10mo

Breaking changes

The `feedback_id` field in the TensorZero Python SDK is no longer incorrectly doubly nested, aligning with type annotations.

Notable features

Throughput chart added to function detail page in TensorZero UI
Export OpenTelemetry spans for feedback endpoint
Recipes for supervised fine-tuning with `torchtune` and `axolotl`

Full changelog

[!CAUTION]

Breaking Changes

The bug fix for feedback_id technically introduces a breaking change in the TensorZero Python SDK. The field is no longer incorrectly doubly nested and now matches the SDK's type annotations.

[!WARNING]
Completed Deprecations

json_mode is now required for JSON function variants.

Bug Fixes

Added workarounds for two ClickHouse regressions (ClickHouse/ClickHouse#86415, ClickHouse/ClickHouse#86557) introduced in ClickHouse 25.8. Replicated self-hosted clusters are still affected by ClickHouse/ClickHouse#86434. Pin to 25.7 or earlier if you run a replicated cluster. Single-node self-hosted deployments and ClickHouse Cloud are not affected.
Fixed a bug in the TensorZero Python SDK that caused feedback_id to be doubly nested in feedback responses.
Fixed a logging issue where models were incorrectly reported as "not found" in the embedding endpoint even on success.
Fixed a bug where pending insertions could be dropped during shutdown when gateway.observability.batch_writes.enabled = true.
Fixed a bug in the dynamic in-context learning (DICL) recipe and programmatic API. The gateway automatically detects problematic examples and logs a warning with resolution instructions if necessary.

New Features

Added a throughput chart to the function detail page in the TensorZero UI.
Support exporting OpenTelemetry spans for the feedback endpoint.
Added recipes for supervised fine-tuning with torchtune and axolotl.
Added examples for using the embedding endpoint with Azure OpenAI Service and OpenAI-compatible providers like Ollama (thanks @slbotbm!).
Updated the DICL recipe to use TensorZero's new embedding API.
Added support for caching embeddings (thanks @ishbir!).

& multiple under-the-hood and UI improvements (thanks @contrun @jinnovation!)

View release on GitHub

2025.8.5 Bug fix 11mo

Notable features

Programmatic optimization interface for dynamic in-context learning
Exposure of more hyperparameters for programmatic supervised fine-tuning with Together AI

Full changelog

Bug Fixes

Reduce the ClickHouse memory footprint in large deployments with human feedback for evaluations.

New Features

Add a programmatic optimization interface for dynamic in-context learning.
Expose more hyperparameters for programmatic supervised fine-tuning with Together AI.

& many under-the-hood and UI improvements (thanks @quangIO!)

View release on GitHub

2025.8.4 Breaking risk 11mo

Breaking changes

Unprefixed model names in the OpenAI‑compatible embeddings endpoint are deprecated; a future release (2025.12+) will require the prefix `tensorzero::embedding_model_name::`.

Notable features

Added `extra_body` field to embedding model configurations for custom API request fields.
Updated Azure OpenAI Service provider to use API version `2025-04-01-preview`.
Added CrewAI integration example.

Full changelog

[!WARNING]
Planned Deprecations

The OpenAI-compatible embeddings endpoint will require the prefix tensorzero::embedding_model_name:: for model names (e.g. tensorzero::embedding_model_name::openai::text-embedding-3-small). Support for unprefixed names will be removed in a future release (2025.12+).
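
Client code can prepare for this by normalizing model names before calling the embeddings endpoint. The helper below is a hypothetical sketch that adds the prefix only when it is missing.

```python
EMBEDDING_PREFIX = "tensorzero::embedding_model_name::"


def qualify_embedding_model(model: str) -> str:
    """Return the model name with the required prefix, adding it if absent."""
    if model.startswith(EMBEDDING_PREFIX):
        return model
    return EMBEDDING_PREFIX + model
```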

Bug Fixes

Fix a ClickHouse warning that occurred when a model inference had input tokens set to null and output tokens non-null, or vice versa. This issue only caused warnings and did not affect TensorZero's user-facing functionality.

New Features

Add extra_body support for embedding model configurations to enable custom API request fields for various embedding providers. (thanks @ishbir!)
Update the Azure OpenAI Service model provider to use API version 2025-04-01-preview.
Add CrewAI integration example.

& multiple under-the-hood and UI improvements (thanks @MengAiDev!)

View release on GitHub

2025.8.3 Breaking risk 11mo

⚠ Upgrade required

If you previously enabled batching writes to ClickHouse via the embedded Python gateway, disable that setting or switch to a standalone (HTTP) gateway to avoid deadlocks caused by GIL interactions.

Breaking changes

Removed support for batching writes to ClickHouse when using the embedded Python gateway; batching remains available with a standalone (HTTP) gateway.

Notable features

Configuration can be split into multiple files using glob patterns
Example added for multimodal (vision) fine-tuning
More hyperparameters exposed for programmatic supervised fine‑tuning with Fireworks

Full changelog

[!CAUTION]
Breaking Changes

Temporarily removing support for batching writes to ClickHouse with the embedded gateway in Python: In the previous release, we added support for batching writes to ClickHouse to boost ingest throughput and reduce insert overhead at scale (default off). Later, we discovered that in rare scenarios, the Python GIL could interfere with this setting in embedded clients and cause a deadlock. While we investigate a solution, we are removing support for batching with the embedded client to prevent technical footguns. Batching remains available when using a standalone (HTTP) gateway.

New Features

Add support for splitting configuration into multiple files with glob patterns
Add an example for multimodal (vision) fine-tuning
Expose more hyperparameters for programmatic supervised fine-tuning with Fireworks
Optimize queries in the UI to improve the performance of assorted pages in large-scale deployments
Enable setting global labels for all created resources in Helm (thanks @jinnovation!)
Support embedding endpoint when using the OpenAI SDK with an embedded gateway (patch_openai_client)

& many under-the-hood and UI improvements (thanks @wliu4040!)

View release on GitHub

2025.8.2 New feature 11mo

Notable features

Playground UI for side‑by‑side variant comparison, prompt iteration, and inference replay
ClickHouse write batching to increase ingest throughput and lower insert overhead at scale
Jupyter notebook recipe for supervised fine‑tuning with Unsloth

Full changelog

New Features

Add a Playground to the UI to compare variants side-by-side, iterate on prompts quickly, and replay inference requests.
Support batching writes to ClickHouse to boost ingest throughput and reduce insert overhead at scale.
Add a Jupyter notebook recipe for supervised fine-tuning with Unsloth.

& many under-the-hood and UI improvements (thanks @contrun @lblack00!)

View release on GitHub

2025.8.1 New feature 11mo

Notable features

OpenAI‑compatible endpoint for embeddings supporting OpenAI and Azure OpenAI Service providers
Self‑hosted replicated ClickHouse database support
Parse `reasoning_content` from Fireworks and vLLM model providers

Full changelog

New Features

Add an OpenAI-compatible endpoint for embeddings, with support for OpenAI (& OpenAI-compatible) and Azure OpenAI Service model providers.
Add support for self-hosted replicated ClickHouse databases.
Parse reasoning_content from Fireworks and vLLM model providers.
Improve error messages for AWS Bedrock and AWS SageMaker model providers.

Bug Fixes

Allow configuration to specify description for JSON functions.
Fix a regression where function descriptions were no longer rendered in the UI.

& many under-the-hood and UI improvements (thanks @yuvraj-kumar-dev)

View release on GitHub

2025.8.0 New feature 11mo

Notable features

gateway.observability.skip_completed_migrations config to skip ClickHouse migration workflow on startup
Support for raw_text content blocks in OpenAI-compatible inference endpoint
Ability to collect outputs from "Try with variant" UI as demonstrations

Full changelog

New Features

Add gateway.observability.skip_completed_migrations configuration option to reduce gateway startup time and database load. When enabled, the gateway will skip running the ClickHouse migration workflow (i.e. verifying and potentially applying every migration) on startup for migrations that are already present in a database table that tracks migration history.
Support raw_text content blocks in the OpenAI-compatible inference endpoint. (Thanks @hongantran3804 @pykm05 @pycoder49!)
Allow users to collect outputs from "Try with variant" in the UI as demonstrations.
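The effect of `gateway.observability.skip_completed_migrations` described above can be pictured with a small conceptual model. This is not the gateway's code: the function name, the migration identifiers, and the shape of the history record are all assumptions made for illustration.

```python
def migrations_to_process(all_migrations, recorded_in_history, skip_completed):
    """Return the migrations that go through the verify-and-apply workflow.

    all_migrations: ordered list of migration identifiers (hypothetical).
    recorded_in_history: identifiers already present in the history table.
    skip_completed: stand-in for the skip_completed_migrations option.
    """
    if not skip_completed:
        # Default behavior: every migration is verified and potentially applied.
        return list(all_migrations)
    done = set(recorded_in_history)
    # With the option enabled, migrations already recorded are skipped.
    return [m for m in all_migrations if m not in done]


pending = migrations_to_process(
    ["0001", "0002", "0003"], recorded_in_history=["0001", "0002"], skip_completed=True
)
print(pending)
```

The saving comes from the skipped entries: on a database that is already up to date, nothing is left to verify on startup.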

Bug Fixes

Fix handling of reasoning content blocks for DeepSeek-R1 on AWS Bedrock.
Set proper default value for max_tokens for the Anthropic and GCP Vertex AI Anthropic model providers. The gateway will now error if no value is provided in the configuration or request and the model is unknown.
Skip caching model inferences that generated invalid tool call arguments.

& many under-the-hood and UI improvements (thanks @michaldorsett @K-coder05 @dcaputo-harmoni @masonblier @Nicolasgarbarino!)

View release on GitHub

2025.7.5 New feature 0y

Notable features

Added `gateway.unstable_disable_feedback_target_validation` flag for large-scale deployments

Full changelog

Experimental

Add gateway.unstable_disable_feedback_target_validation configuration option to improve the performance of the feedback endpoint in large-scale deployments (not recommended unless you know what you're doing).

& multiple under-the-hood and UI improvements (thanks @michaldorsett @HJStaiff @liamjdavis!)

View release on GitHub

2025.7.4 Bug fix 1y

Notable features

Soft deletion of datasets via UI
Filtering by time and tags in experimental_list_inferences
Ordering by metric value and time in experimental_list_inferences

Full changelog

Bug Fixes

Fixed an issue with inference caching where inference requests that were identical except for their inline (base64-encoded) file data incorrectly shared the same cache key, resulting in false cache hits. The cache key now includes a hash of the inline file data, ensuring that such requests are properly distinguished.
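To illustrate why hashing the inline file data separates these requests, here is a minimal sketch of a cache key that covers both the request fields and the file bytes. It is an illustration only: the function name, the use of SHA-256, and the request shape are assumptions, not TensorZero's actual cache key scheme.

```python
import hashlib
import json


def cache_key(request: dict, inline_files: list[bytes]) -> str:
    """Derive a cache key from the request fields plus a hash of each inline file."""
    hasher = hashlib.sha256()
    # Serialize the non-file fields deterministically.
    hasher.update(json.dumps(request, sort_keys=True).encode("utf-8"))
    for data in inline_files:
        # Feed a fixed-size digest per file so file boundaries stay unambiguous.
        hasher.update(hashlib.sha256(data).digest())
    return hasher.hexdigest()


request = {"function_name": "describe_image", "input": "What is in this picture?"}
key_a = cache_key(request, [b"bytes of image A"])
key_b = cache_key(request, [b"bytes of image B"])
print(key_a != key_b)
```

If the file data were left out of the key, `key_a` and `key_b` would be identical, which is the false cache hit the fix addresses.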

New Features

Added functionality for deleting datasets in the UI (soft deletion).

Experimental

Added support for filtering by time and tags to the experimental_list_inferences method.
Added support for ordering by metric value and time to the experimental_list_inferences method.

& multiple under-the-hood and UI improvements (thanks @NamNgHH!)

View release on GitHub

2025.7.3 Bug fix 1y

⚠ Upgrade required

Migrate `gateway.enable_template_filesystem_access = true` to `gateway.template_filesystem_access.enabled = true`

Full changelog

[!WARNING]
Planned Deprecations

Migrate gateway.enable_template_filesystem_access = true to gateway.template_filesystem_access.enabled = true. We're about to add more fields to template_filesystem_access to support multi-file configuration.
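A minimal sketch of this migration applied to a configuration that has already been parsed into a dictionary (for example with Python's `tomllib`). The function name is hypothetical; only the two setting paths come from the release notes.

```python
import copy


def migrate_template_filesystem_access(config: dict) -> dict:
    """Return a copy of config with the old flag moved to the new nested setting."""
    migrated = copy.deepcopy(config)
    gateway = migrated.get("gateway", {})
    if "enable_template_filesystem_access" in gateway:
        enabled = gateway.pop("enable_template_filesystem_access")
        # Keep an explicitly set new-style value if one already exists.
        gateway.setdefault("template_filesystem_access", {}).setdefault("enabled", enabled)
    return migrated


old_config = {"gateway": {"enable_template_filesystem_access": True}}
print(migrate_template_filesystem_access(old_config))
```

The function works on a copy, so the original dictionary stays available for comparison after the migration.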

Bug Fixes

Remove a third-party dependency that was causing a memory leak in the UI.
Fix a regression that prevented the UI from running offline.

& multiple under-the-hood and UI improvements

View release on GitHub

2025.7.2 Bugfix 1y

Fixed occasional connection errors with ClickHouse Cloud by updating the client implementation.

Full changelog

Bug Fixes

Update TensorZero's ClickHouse client to match the parameter recommendations by ClickHouse. (This change aims to resolve occasional connection errors with ClickHouse Cloud.)

View release on GitHub

2025.7.1 New feature 1y

⚠ Upgrade required

Experimental flag `gateway.unstable_error_json` now returns internal error details in response body.

Notable features

Improve UI components for rendering text, JSON, Markdown, and MiniJinja templates (syntax highlighting, line numbers, wrapping, etc.)
Improve performance of the UI's episode list page
Launch SFT jobs for Together AI and GCP Vertex AI Gemini programmatically

Full changelog

New Features

Improve UI components for rendering text, JSON, Markdown, and MiniJinja templates (syntax highlighting, line numbers, wrapping, etc.)
Improve the performance of the UI's episode list page
Add pseudonymous usage analytics to the gateway (see docs for details and instructions to opt out)

Experimental

Launch SFT jobs for Together AI and GCP Vertex AI Gemini programmatically
Return internal error details in the response body (gateway.unstable_error_json) (thanks @panesher)

& many under-the-hood and UI improvements (thanks @michaldorsett @itsrajatrai @caarlos0)

View release on GitHub

2025.7.0 Breaking risk 1y

Notable features

Supervised fine‑tuning workflow now fully supports multimodal data (vision, documents) with multi‑turn tool use and TensorZero inference capabilities
Streaming inference support added for best‑of‑n and mixture‑of‑n variant types
Experimental Python client methods: `experimental_launch_optimization`, `experimental_poll_optimization`, `experimental_get_config` and extended `experimental_render_inferences`

Full changelog

New Features

Revamped the UI's supervised fine-tuning workflow to fully support TensorZero's inference capabilities, including multimodal data (vision, documents, etc.), multi-turn tool use, and more.
Added streaming inference support for best-of-n and mixture-of-n variant types.
Optimized the performance of some database queries in the UI.

Experimental

Experimental features don't have a stable API. They may change or be removed in future releases.

Added methods to the Python client for programmatically launching (experimental_launch_optimization) and polling for (experimental_poll_optimization) optimization jobs. For now, these methods support supervised fine-tuning with OpenAI and Fireworks AI.
Added a method to the Python client for retrieving the configuration (experimental_get_config).
Updated experimental_render_inferences to accept outputs from both experimental_list_inferences and list_datapoints.

& many under-the-hood and UI improvements (thanks @jeevikasirwani!)

View release on GitHub

2025.6.3 New feature 1y

Notable features

Added `delete = true` option to `extra_body` and `extra_headers` to remove built-in fields
Introduced `gateway.base_path` configuration field to prefix all endpoints
Added `discard_unknown_chunks` in model provider config to ignore unsupported chunk types

Full changelog

New Features

Add delete = true option to extra_body and extra_headers configuration fields to instruct the gateway to delete built-in fields from the request body or headers.
Add gateway.base_path field to configuration to instruct the gateway to prefix all endpoints with this path.
Add discard_unknown_chunks field to model provider configuration to instruct the gateway to discard chunks with unknown or unsupported types instead of throwing an error.
Add optional name field to tool configuration; if provided, the tool name will be sent to the LLMs instead of the tool ID, allowing for multiple tools with the same name.
Add functionality to filter list_datapoints by function name.
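The `delete = true` option for `extra_body` listed above can be pictured as an edit applied to the outgoing request body. The sketch below is a rough model under stated assumptions: each entry is taken to carry a slash-separated `pointer` plus either `value` or `delete`, the function names are hypothetical, and only nested objects are handled (no arrays, no pointer escaping). It is not the gateway's implementation.

```python
import copy


def _walk(node: dict, keys: list[str], create: bool):
    """Follow keys through nested dicts; optionally create missing levels."""
    for key in keys:
        if key not in node:
            if not create:
                return None
            node[key] = {}
        node = node[key]
    return node


def apply_extra_body(body: dict, entries: list[dict]) -> dict:
    """Return a copy of body with each entry's set or delete operation applied."""
    result = copy.deepcopy(body)
    for entry in entries:
        *parents, leaf = entry["pointer"].strip("/").split("/")
        if entry.get("delete"):
            parent = _walk(result, parents, create=False)
            if parent is not None:
                parent.pop(leaf, None)  # Deleting an absent field is a no-op.
        else:
            _walk(result, parents, create=True)[leaf] = entry["value"]
    return result


body = {"model": "m", "temperature": 0.5, "generation": {"top_k": 5}}
entries = [
    {"pointer": "/temperature", "delete": True},
    {"pointer": "/generation/seed", "value": 7},
]
print(apply_extra_body(body, entries))
```

Deletion is useful when a provider rejects a field that would otherwise be sent by default.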

& multiple under-the-hood and UI improvements

View release on GitHub

2025.6.2 New feature 1y

Notable features

Granular timeouts via `[timeouts]` in variant and model configuration blocks
Shorthand model names for Groq (`groq::...`) and OpenRouter (`openrouter::...`) providers
Explicit `stop_sequences` inference parameter

Full changelog

New Features

Add recipe for supervised fine-tuning with Google Vertex AI Gemini
Add granular timeouts ([timeouts]) to variant and model configuration blocks
Support short-hand model names for Groq (groq::...) and OpenRouter (openrouter::...) model providers
Support tool use with vLLM (thanks @CHRV @chaet1t!)
Add explicit stop_sequences inference parameter
Support dynamic credentials in OpenAI-compatible inference endpoint (tensorzero::credentials) (thanks @zmij!)
Support multimodal inference and file inputs on AWS Bedrock

& multiple under-the-hood and UI improvements

View release on GitHub

2025.6.1 Breaking risk 1y

⚠ Upgrade required

Return null instead of an empty string when `service_tier` is missing in the OpenAI‑compatible inference endpoint.

Breaking changes

During streaming inference, `raw_name` in a tool call chunk is now an empty string after the tool name has finished streaming, differing from previous behavior where it repeated the same value.

Notable features

Allow inference containing files with arbitrary MIME types
[timeouts] section added to model provider configuration for granular timeout settings
Support templates without schemas; built‑in variables `system_text`, `assistant_text`, and `user_text` are now available

Full changelog

[!CAUTION]
Breaking Changes

Streaming Inference + Tool Use: During streaming inferences, raw_name in a tool call chunk represents a delta that should be accumulated. If the tool name has finished streaming, this field will contain an empty string. Previously, TensorZero returned the same raw_name in every subsequent chunk for that tool call. The new behavior matches the OpenAI API's behavior.
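A minimal sketch of client-side accumulation under the new behavior. Only `raw_name` is named in the release notes; the chunk shape used here, including the `id` and `raw_arguments` keys, is an assumption for illustration and may differ from the real response format.

```python
from collections import defaultdict


def accumulate_tool_calls(chunks):
    """Concatenate per-chunk deltas into complete tool calls, keyed by tool call id."""
    names = defaultdict(str)
    arguments = defaultdict(str)
    for chunk in chunks:
        call_id = chunk["id"]
        # raw_name is a delta: an empty string once the name has finished streaming.
        names[call_id] += chunk.get("raw_name", "")
        arguments[call_id] += chunk.get("raw_arguments", "")
    return {cid: {"name": names[cid], "arguments": arguments[cid]} for cid in names}


chunks = [
    {"id": "call_1", "raw_name": "get_", "raw_arguments": ""},
    {"id": "call_1", "raw_name": "weather", "raw_arguments": ""},
    {"id": "call_1", "raw_name": "", "raw_arguments": '{"city":'},
    {"id": "call_1", "raw_name": "", "raw_arguments": ' "Paris"}'},
]
print(accumulate_tool_calls(chunks))
```

Code written against the old behavior, which took the name from any single chunk, would read an empty or partial name here, so concatenating the deltas is the safer pattern.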

Bug Fixes

Return null instead of an empty string when missing service_tier in the OpenAI-compatible inference endpoint

New Features

Allow inference containing files with arbitrary MIME types
Add [timeouts] to model provider configuration for granular timeout functionality
Support templates without schemas; add built-in system_text, assistant_text, and user_text template variables
Support tags in OpenAI-compatible inference endpoint (tensorzero::tags)
Add experimental_list_inferences method to the client for retrieving historical inferences

& multiple under-the-hood and UI improvements (thanks @vr-varad!)

View release on GitHub

2025.6.0 Bug fix 1y

Notable features

Handle thinking and unknown content blocks for GCP Vertex Anthropic and Gemini models
Added `endpoint_id` field in configuration for fine‑tuned GCP Vertex Anthropic and Gemini models
Introduced Groq (`groq`) model provider

Full changelog

Bug Fixes

Increase database health check timeout in the gateway to 180s to gracefully handle warmup of serverless databases

New Features

Handle thinking and unknown content blocks for gcp_vertex_anthropic and gcp_vertex_gemini models
Add endpoint_id field in the configuration for gcp_vertex_anthropic and gcp_vertex_gemini models to support fine-tuned models
Add a dedicated Groq (groq) model provider (thanks @oliverbarnes!)
Support include_original_response during streaming inference

& multiple under-the-hood and UI improvements

View release on GitHub
