Release history

tensorzero releases

No immediate action
2026.5.2 New feature

Stop param flexibility + OpenInference attributes

No immediate action
2026.5.1 Bugfix

SSE decoding errors

2026.5.0 Breaking risk
Breaking changes
  • UI requires authentication when the gateway requires authentication (previously only for gateway usage).
Notable features
  • Improved error handling and logging for complex streaming inferences, including status code propagation and fallbacks.
Full changelog

[!CAUTION]
Breaking Changes

  • The UI will now require authentication when the gateway requires authentication. Previously, the UI only required authentication for gateway usage.

New Features

  • Improve error handling (e.g. status code propagation) and logging for complex streaming inferences (e.g. fallbacks).

& multiple under-the-hood and UI improvements (thanks @arisp)

2026.4.1 Breaking risk
⚠ Upgrade required
  • Deprecation: TensorZero Autopilot "Sessions" page removed from UI; future platform‑agnostic workflows planned.
Breaking changes
  • Gateway defaults to async observability writes; previous synchronous behavior requires `observability.async_writes = false`.
Notable features
  • TypeScript evaluators for inference evaluations
  • Support for vLLM's new `reasoning` field
  • Aggregated variant usage data (tokens, cost) in UI
Full changelog

[!CAUTION]
Breaking Changes

  • The gateway now defaults to async observability writes to reduce tail latency: inferences are sent to the client before they are persisted in the database. To restore the previous behavior, set observability.async_writes = false. [docs]

[!WARNING]
Deprecations

  • Removed the TensorZero Autopilot "Sessions" page from the UI. We recently added a TensorZero MCP that integrates nicely with coding agents, and we'll re-introduce advanced TensorZero Autopilot workflows in a platform-agnostic format soon.

Bug Fixes

  • Return HTTP code 429 for rate limiting errors.
  • Fixed a bug affecting ClickHouse database names with hyphens. (thanks @ianliuy!)

New Features

  • Added TypeScript evaluators (for inference evaluations).
  • Added support for vLLM's new reasoning field.
  • Added aggregated variant usage data (tokens, cost, etc.) to the UI.
  • Added inference cost data to exported OpenTelemetry traces. (thanks @kimsehwan96!)
  • Added export.otlp.traces.include_content (default false) configuration field to include inference content (e.g. prompts, messages) in exported OpenTelemetry GenAI traces.

& multiple under-the-hood and UI improvements

2026.4.0 New feature
Notable features
  • Add MCP server to gateway exposing API at /mcp
  • Report provider prompt caching statistics via API and UI
  • Report usage statistics (tokens, latency, cost) for inference evaluations via CLI, API, and UI
Full changelog

New Features

  • Add an MCP server to the gateway exposing its API at /mcp.
  • Report provider prompt caching statistics via API and UI.
  • Report usage statistics (e.g. tokens, latency, cost) for inference evaluations via CLI tool, API, and UI.
  • Add the Prometheus metrics tensorzero_input_tokens_total and tensorzero_output_tokens_total.
  • Add configuration field content_type_overrides to handle file inputs for long-tail providers.

& multiple under-the-hood and UI improvements

2026.3.4 Breaking risk
⚠ Upgrade required
  • Deprecation: Inference evaluation config must be nested under function names; legacy flat format will be removed in a future release.
  • Deprecation: `launch_optimization` with `GEPAConfig` is deprecated and will be removed; use `t0.optimization.gepa.launch` instead.
Notable features
  • TensorZero Autopilot: automated AI engineer that analyzes LLM data, configures evaluations, optimizes prompts/models, and runs A/B tests
  • Embeddings requests now counted in Prometheus metrics `tensorzero_requests_total` and `tensorzero_inferences_total`
  • Observability configuration field `observability.batch_writes.write_queue_capacity` added for gateway backpressure
Full changelog

[!WARNING]
Planned Deprecations

  • The configuration for inference evaluations should be nested under the relevant functions moving forward [docs]. You can run evaluations by providing a function name and a list of evaluators. The legacy format will be removed in a future release.
    [functions.write_haiku.evaluators.exact_match]
    type = "exact_match"
    
  • The legacy implementation of GEPA (launch_optimization with GEPAConfig) will be removed in a future release. Please use t0.optimization.gepa.launch instead. [docs]

Bug Fixes

  • Fixed a UI bug where a custom gateway base_path was not handled correctly in certain routes. (thanks @wangfenjin!)

New Features

  • Started including embeddings requests in the Prometheus metrics tensorzero_requests_total and tensorzero_inferences_total.
  • Added the configuration field observability.batch_writes.write_queue_capacity to enable backpressure for observability data in the gateway.
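
The backpressure idea behind a bounded write queue can be sketched in a few lines. This is only an illustration using Python's standard `queue` module: `try_enqueue` and `demo` are hypothetical names, and whether the gateway blocks, rejects, or drops writes when the queue is full is not stated in these notes.

```python
import queue


def try_enqueue(write_queue: "queue.Queue[dict]", row: dict) -> bool:
    """Attempt a non-blocking enqueue; return False when the queue is full."""
    try:
        write_queue.put_nowait(row)
        return True
    except queue.Full:
        # A full queue is the backpressure signal: the producer must slow
        # down, retry later, or surface an error to its caller.
        return False


def demo(capacity: int = 2, rows: int = 4) -> list:
    # `capacity` plays the role of write_queue_capacity in this sketch.
    write_queue: "queue.Queue[dict]" = queue.Queue(maxsize=capacity)
    return [try_enqueue(write_queue, {"inference_id": i}) for i in range(rows)]
```

With a capacity of 2 and no consumer draining the queue, the first two rows are accepted and later rows are refused until space frees up.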

& multiple under-the-hood and UI improvements (thanks @majiayu000)!


[!IMPORTANT]

🆕 TensorZero Autopilot

TensorZero Autopilot is an automated AI engineer powered by TensorZero that analyzes LLM observability data, sets up evals, optimizes prompts and models, and runs A/B tests.

It dramatically improves the performance of LLM agents across diverse tasks.

2026.3.3 Breaking risk
Breaking changes
  • Removed assistant message prefill for JSON functions with Anthropic (deprecated by Anthropic).
Notable features
  • GEPA automated prompt engineering via durable workflows
  • Support duplicate tool calls in `all_of` evaluators for parallel execution
  • UI option to set expiration date for API keys
Full changelog

Bug Fixes

  • Fixed two edge cases affecting batch inference.
  • Fixed a UI bug affecting "Try with..." with inputs that include base64 files.
  • Removed assistant message prefill for JSON functions + Anthropic (deprecated by Anthropic).

New Features

  • Added an implementation of GEPA (automated prompt engineering) based on durable workflows.
  • Allow users to specify duplicate tool calls in all_of tool evaluators to evaluate parallel tool calling.
  • Allow users to specify an expiration date for API keys in the UI. (thanks @eibrahim95)
  • Allow users to specify object_storage.endpoint = "env::MY_ENV_VAR" in addition to static values. (thanks @Meredith2328)

& multiple under-the-hood and UI improvements (thanks @majiayu000)!

2026.3.2 Bug fix
Notable features
  • Postgres added as an alternative observability backend to ClickHouse (recommended for low RPS)
  • `openrouter::xxx` shorthand for embedding models
  • Per-session API keys in the browser when auth is enabled
Full changelog

Bug Fixes

  • Fixed a UI issue that prevented certain pages from rendering when depending on historical configuration.

New Features

  • Added Postgres as an alternative observability backend to ClickHouse. Postgres is the simplest way to get started; we recommend ClickHouse if you're handling >100 RPS.
  • Added the openrouter::xxx short-hand for embedding models.
  • Added support for per-session API keys in the browser (instead of a global environment variable) when auth is enabled.

& multiple under-the-hood and UI improvements!

2026.3.1 Breaking risk
⚠ Upgrade required
  • The embedded gateway in the TensorZero Python SDK will be removed in version 2026.6+; migrate to a standalone TensorZero Gateway using `base_url` for OpenAI SDK or `build_http` for TensorZero SDK.
  • The variant configuration field `weight` will be removed in version 2026.6+; transition to the new experimentation configuration semantics documented at https://www.tensorzero.com/docs/experimentation/run-static-ab-tests.
Breaking changes
  • Removed `model_provider_name` filter for `extra_body` and `extra_headers`; use `model_name` and `provider_name` instead.
  • Removed legacy experimental `list_inferences` endpoint; use the new endpoint documented at https://www.tensorzero.com/docs/observability/query-historical-inferences.
  • Removed several long-deprecated types and methods from the TensorZero Python SDK.
Notable features
  • Added support for launching optimization workflows with `dataset_name` in `launch_optimization_workflow`.
Full changelog

[!WARNING]
Completed Deprecations

  • Removed the deprecated model_provider_name filter for extra_body and extra_headers. Please use model_name and provider_name instead.
  • Removed the legacy experimental list_inferences endpoint and method. Please use the new endpoint instead. [docs]
  • Removed several long-deprecated types and methods from the TensorZero Python SDK.

[!WARNING]
Planned Deprecations

  • The embedded gateway in the TensorZero Python SDK will be removed in a future release (2026.6+). patch_openai_client and build_embedded are deprecated. Please deploy a standalone TensorZero Gateway instead (usage: base_url for OpenAI SDK; build_http for TensorZero SDK).
  • The variant configuration field weight will be removed in a future release (2026.6+). Please use the new experimentation configuration semantics. [docs]

Bug Fixes

  • Fixed a compatibility bug with Valkey-based caching that only affected Redis.

New Features

  • Added support for launching optimization workflows with dataset_name (instead of an inference query) in launch_optimization_workflow.

& multiple under-the-hood and UI improvements!

2026.3.0 Breaking risk
⚠ Upgrade required
  • Configuration fields static_weights, track_and_stop will be removed in a future release; see Run adaptive A/B tests and Run static A/B tests docs for updated usage.
  • Evaluator configuration field cutoff will be removed; use CLI flag --cutoffs evaluator=value,... instead.
  • Gateway route /variant_sampling_probabilities will be removed in a future release.
Breaking changes
  • Removed deprecated Prometheus metric tensorzero_inference_latency_overhead_seconds_histogram; use tensorzero_inference_latency_overhead_seconds instead.
Notable features
  • Added regex and tool_use evaluators.
  • Added experimental_launch_optimization_workflow to the TensorZero Python SDK.
Full changelog

[!WARNING]
Completed Deprecations

  • The deprecated Prometheus metric tensorzero_inference_latency_overhead_seconds_histogram was removed. Use tensorzero_inference_latency_overhead_seconds instead.

[!WARNING]
Planned Deprecations

  • The configuration for experimentation (e.g. static_weights, track_and_stop) was simplified. The old notation will be removed in a future release. See Run adaptive A/B tests and Run static A/B tests for more information.
  • The evaluator configuration field cutoff will be removed in a future release. Instead, provide --cutoffs evaluator=value,... in the CLI.
  • The gateway route /variant_sampling_probabilities will be removed in a future release.
  • The configuration field postgres.enabled will be removed in a future release. Instead, the gateway will consider whether the environment variable TENSORZERO_POSTGRES_URL is set.

New Features

  • Add regex and tool_use evaluators. [docs]
  • Add experimental_launch_optimization_workflow to the TensorZero Python SDK.

& multiple under-the-hood and UI improvements!

2026.2.2 Breaking risk
Breaking changes
  • Removed deprecated legacy dataset management endpoints; use new endpoints for that functionality.
  • Changed `--config-file` globbing: single‑level wildcards (`*`) no longer match files across directory boundaries; require recursive wildcard (`**`).
Notable features
  • Cost tracking and cost‑based rate limiting
  • Namespaces for multiple granular A/B experiments on the same TensorZero function
  • Improved reasoning support for Anthropic, Fireworks AI, SGLang, Together AI
Full changelog

[!CAUTION]
Breaking Changes

  • The --config-file globbing behavior has changed: single-level wildcards (*) no longer match files across directory boundaries. To match files across directory boundaries, use recursive wildcards (**). This aligns the behavior with standard glob semantics. For example:
    • --config-file *.toml matches tensorzero.toml, but not subdir/tensorzero.toml.
    • --config-file **/*.toml matches both tensorzero.toml and subdir/tensorzero.toml.
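
The difference between the two patterns can be reproduced with Python's standard `glob` module, which follows the same single-level versus recursive convention. This is a stand-in for illustration only: the gateway uses its own matcher, and `match_config_files` and `demo` are hypothetical helpers.

```python
import glob
import os
import tempfile


def match_config_files(root: str, pattern: str) -> list:
    """Return paths under `root` that match `pattern`, relative to `root`, sorted."""
    hits = glob.glob(os.path.join(root, pattern), recursive=True)
    return sorted(os.path.relpath(h, root).replace(os.sep, "/") for h in hits)


def demo() -> dict:
    # Build the layout from the example: one config at the top level,
    # one inside a subdirectory.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "subdir"))
        for rel in ("tensorzero.toml", os.path.join("subdir", "tensorzero.toml")):
            with open(os.path.join(root, rel), "w") as f:
                f.write("")
        return {
            "*.toml": match_config_files(root, "*.toml"),
            "**/*.toml": match_config_files(root, "**/*.toml"),
        }
```

Here `*.toml` finds only the top-level file, while `**/*.toml` finds both.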

[!WARNING]
Completed Deprecations

  • Removed deprecated legacy endpoints for dataset management. The functionality is fully covered by the new endpoints.

New Features

  • Add cost tracking and cost-based rate limiting.
  • Add namespaces: the ability to set up multiple granular experiments (A/B tests) for the same TensorZero function.
  • Improve reasoning support for Anthropic (including adaptive thinking), Fireworks AI, SGLang, and Together AI.
  • Allow users to whitelist automatic tool approvals for TensorZero Autopilot.
  • Report provider errors when include_raw_response is enabled.
  • Add include_aggregated_response to streaming inferences. When enabled, the final chunk includes an aggregated output aggregated_response that combines previous chunks.
  • Allow users to kill ongoing evaluation runs from UI.
  • Allow custom gateway bind addresses with the environment variable TENSORZERO_GATEWAY_BIND_ADDRESS.

& multiple under-the-hood and UI improvements (thanks @Nfemz @greg80303)!

2026.2.1 Breaking risk
Breaking changes
  • Default value for `cache_options.enabled` changed from `write_only` to `off`.
Notable features
  • Support reasoning models from Groq, Mistral, and vLLM.
  • Support multi-turn reasoning with Gemini and OpenAI‑compatible models.
  • Support embedding models from Together AI.
Full changelog

[!CAUTION]
Breaking Changes

  • The default value for cache_options.enabled changed from write_only to off.

New Features

  • Support reasoning models from Groq, Mistral, and vLLM.
  • Support multi-turn reasoning with Gemini and OpenAI-compatible models.
  • Support embedding models from Together AI.
  • Add configurable total_ms timeout to streaming inferences.
  • Display charts with top-k evaluation results in the TensorZero Autopilot UI.
  • Add "Ask Autopilot" buttons throughout the UI.
  • Allow TensorZero Autopilot to edit your local configuration files.
  • Return thought and unknown content blocks in the OpenAI-compatible endpoint (tensorzero_extra_content).

& multiple under-the-hood and UI improvements!

2026.2.0 Breaking risk
⚠ Upgrade required
  • `beta_structured_outputs` configuration field is deprecated and ignored; will be removed in a future release.
Notable features
  • YOLO Mode for TensorZero Autopilot
  • Interruption feature for TensorZero Autopilot sessions
  • Summary row added to TensorZero Autopilot session table
Full changelog

[!WARNING]
Planned Deprecations

  • Anthropic's structured output feature is out of beta, so the TensorZero configuration field beta_structured_outputs is now ignored and deprecated. It'll be removed in a future release.

Bug Fixes

  • Fix a regression in the aws_bedrock provider that affected long-term bearer API keys.
  • Fix a horizontal overflow issue for tool calls and results in the inference detail UI page.

New Features

  • Add YOLO Mode for TensorZero Autopilot.
  • Add interruption feature for TensorZero Autopilot sessions.
  • Add summary to the TensorZero Autopilot session table in the UI.

& multiple under-the-hood and UI improvements (thanks @pratikbuilds)!

2026.1.8 Bugfix

Fixed a race condition disabling chat input in TensorZero Autopilot UI.

Full changelog

Bug Fixes

  • Fix a race condition in the TensorZero Autopilot UI that could disable the chat input.
  • Increase timeouts for slow tool calls triggered by TensorZero Autopilot (e.g. evaluations).

& multiple under-the-hood and UI improvements!

2026.1.7 New feature
Notable features
  • TensorZero Autopilot (preview) – automated AI engineer for LLM observability, prompt/model optimization, eval setup, and A/B testing
  • Support multi-turn reasoning for xAI via `reasoning_content`
Full changelog

New Features

  • [Preview] TensorZero Autopilot: an automated AI engineer that analyzes LLM observability data, optimizes prompts and models, sets up evals, and runs A/B tests.
  • Support multi-turn reasoning for xAI (reasoning_content only).

& multiple under-the-hood and UI improvements!

2026.1.6 Breaking risk
⚠ Upgrade required
  • When using `unstable_error_json` with the OpenAI‑compatible inference endpoint, replace `error_json` with `tensorzero_error_json`. Both fields are currently emitted with identical data; future releases will remove `error_json`.
Breaking changes
  • OpenAI-compatible endpoints return errors in the standard OpenAI format (`{"error": {"message": "..."}}`) instead of the previous TensorZero format (`{"error": "..."}`).
Notable features
  • Native support for provider tools (e.g., web search) added to Anthropic and GCP Vertex AI Anthropic model providers.
  • Improved streaming handling of reasoning content blocks in OpenAI Responses API.
  • Graceful handling of missing `usage` fields during inference with the OpenAI model provider.
Full changelog

[!CAUTION]
Breaking Changes

  • Moving forward, TensorZero will use the OpenAI API's error format ({"error": {"message": "Bad!"}}) instead of TensorZero's error format ({"error": "Bad!"}) in the OpenAI-compatible endpoints.
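
A client that must work across this change can accept both error shapes. The sketch below relies only on the two formats quoted above; `extract_error_message` is a hypothetical helper, not part of any TensorZero SDK.

```python
import json


def extract_error_message(body: str) -> str:
    """Return the error message from either the OpenAI-style or the legacy shape."""
    payload = json.loads(body)
    err = payload.get("error")
    if isinstance(err, dict):
        # OpenAI-style: {"error": {"message": "..."}}
        return str(err.get("message", ""))
    if isinstance(err, str):
        # Legacy TensorZero-style: {"error": "..."}
        return err
    raise ValueError("response body has no recognizable 'error' field")
```

Checking for the nested object first means a response from an upgraded gateway is never mistaken for the legacy shape.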

[!WARNING]
Planned Deprecations

  • When using unstable_error_json with the OpenAI-compatible inference endpoint, use tensorzero_error_json instead of error_json. For now, TensorZero will emit both fields with identical data. The TensorZero inference endpoint is not affected.

New Features

  • Add native support for provider tools (e.g. web search) to the Anthropic and GCP Vertex AI Anthropic model providers. Previously, clients had to use extra_body to handle these tools.
  • Improve handling of reasoning content blocks when streaming with the OpenAI Responses API.
  • Handle inferences with missing usage fields gracefully in the OpenAI model provider.
  • Improve error handling across the UI.

& multiple under-the-hood and UI improvements!

2026.1.5 Breaking risk
⚠ Upgrade required
  • Migrate `include_original_response` to `include_raw_response` in all SDK configurations.
  • Update AWS model provider settings: replace `allow_auto_detect_region = true` with `region = "sdk"`.
  • When configuring custom Anthropic providers, set `api_base` to the base URL without the trailing endpoint (e.g., remove `/messages`).
Breaking changes
  • Normalized `usage` reporting: `input_tokens` and `output_tokens` now include all provider token variations (caching, reasoning, etc.), while cached tokens remain excluded. Raw provider usage can be accessed via `include_raw_usage`.
  • Deprecation of `include_original_response`; migrate to `include_raw_response` for full model inference metadata.
  • Deprecation of `allow_auto_detect_region = true`; migrate to `region = "sdk"` when configuring AWS model providers.
Notable features
  • Improved error handling across TensorZero UI, JSON deserialization, AWS providers, streaming inferences, and timeouts.
  • Support for Valkey (Redis) to enhance rate‑limiting performance at ≥100 QPS.
  • Added `reasoning_effort` support for Gemini 3 models (mapped to `thinkingLevel`).
Full changelog

[!CAUTION]
Breaking Changes

  • TensorZero will normalize the reported usage from different model providers. Moving forward, input_tokens and output_tokens include all token variations (provider prompt caching, reasoning, etc.), just like OpenAI. Tokens cached by TensorZero remain excluded. You can still access the raw usage reported by providers with include_raw_usage.

[!WARNING]
Planned Deprecations

  • Migrate include_original_response to include_raw_response. For advanced variant types, the former only returned the last model inference, whereas the latter returns every model inference with associated metadata.
  • Migrate allow_auto_detect_region = true to region = "sdk" when configuring AWS model providers. The behavior is identical.
  • Provide the proper API base rather than the full endpoint when configuring custom Anthropic providers. Example:
    • Before: api_base = "https://YOUR-RESOURCE-NAME.services.ai.azure.com/anthropic/v1/messages"
    • Now: api_base = "https://YOUR-RESOURCE-NAME.services.ai.azure.com/anthropic/v1/"
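
The before/after pair above amounts to dropping the trailing endpoint segment. A minimal sketch, assuming `messages` is the only segment that needs stripping; `to_api_base` is a hypothetical helper for migrating existing values, not a TensorZero function.

```python
def to_api_base(url: str) -> str:
    """Strip a trailing 'messages' endpoint segment and keep one trailing slash."""
    suffix = "/messages"
    trimmed = url.rstrip("/")
    if trimmed.endswith(suffix):
        trimmed = trimmed[: -len(suffix)]
    return trimmed + "/"
```

A value that is already a base URL passes through unchanged, so the helper can be applied to every configured provider.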

Bug Fixes

  • Fix a regression that triggered incorrect warnings about usage reporting for streaming inferences with Anthropic models.
  • Fix a bug in the TensorZero Python SDK that discarded some request fields in certain multi-turn inferences with tools.

New Features

  • Improve error handling across many areas: TensorZero UI, JSON deserialization, AWS providers, streaming inferences, timeouts, etc.
  • Support Valkey (Redis) for improving performance of rate limiting checks (recommended at 100+ QPS).
  • Support reasoning_effort for Gemini 3 models (mapped to thinkingLevel).
  • Improve handling of Anthropic reasoning models in TensorZero JSON functions. Moving forward, json_mode = "strict" will use the beta structured outputs feature; json_mode = "on" still uses the legacy assistant message prefill.
  • Improve handling of reasoning content in the OpenRouter and xAI model providers.
  • Add extra_headers support for embedding models. (thanks @jonaylor89!)
  • Support dynamic credentials for AWS Bedrock and AWS SageMaker model providers.

& multiple under-the-hood and UI improvements (thanks @ndoherty-xyz)!

2026.1.2 New feature
Notable features
  • Append to arrays using `/my_array/-` with `extra_body`.
  • Handle cross‑model thought signatures in GCP Vertex AI Gemini and Google AI Studio.
Full changelog

New Features

  • Support appending to arrays with extra_body using the /my_array/- notation.
  • Handle cross-model thought signatures in GCP Vertex AI Gemini and Google AI Studio.
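
The `/my_array/-` notation resembles the JSON Pointer convention in which a final `-` token refers to the position after the last array element. The sketch below assumes that reading. `apply_pointer` is a hypothetical helper that only illustrates the semantics; how the gateway treats missing or non-array targets is not stated here.

```python
import copy


def apply_pointer(doc: dict, pointer: str, value):
    """Return a copy of `doc` with `value` written at `pointer`; '-' appends to an array."""
    if not pointer.startswith("/"):
        raise ValueError("pointer must start with '/'")
    # Unescape JSON Pointer tokens: ~1 means '/', ~0 means '~'.
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]
    result = copy.deepcopy(doc)
    target = result
    for token in tokens[:-1]:
        target = target[int(token)] if isinstance(target, list) else target[token]
    last = tokens[-1]
    if isinstance(target, list):
        if last == "-":
            target.append(value)  # append instead of overwriting an index
        else:
            target[int(last)] = value
    else:
        target[last] = value
    return result
```

For example, applying `/my_array/-` with the value 3 to `{"my_array": [1, 2]}` yields `{"my_array": [1, 2, 3]}` and leaves the input untouched.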

& multiple under-the-hood and UI improvements (thanks @ecalifornica!)

2026.1.1 Bug fix
⚠ Upgrade required
  • Deprecation warning: In a future release, `model` will become required in DICLOptimizationConfig initialization (currently optional with default openai::gpt-5-mini).
Notable features
  • Support stream_options.include_usage for every model under the Azure provider
Full changelog

[!WARNING]
Planned Deprecations

  • In a future release, the parameter model will be required when initializing DICLOptimizationConfig. The parameter remains optional (defaults to openai::gpt-5-mini) in the meantime.

Bug Fixes

  • Stop buffering raw_usage when streaming with the OpenAI-compatible inference endpoint; instead, emit raw_usage as soon as possible, just like in the native endpoint.
  • Stop reporting zero usage in every chunk when streaming a cached inference; instead, report zero usage only in the final chunk, as expected.

New Features

  • Support stream_options.include_usage for every model under the Azure provider.

& multiple under-the-hood and UI improvements!

2026.1.0 Breaking risk
⚠ Upgrade required
  • Update monitoring dashboards and alerts to use the new histogram buckets for `tensorzero_inference_latency_overhead_seconds`.
  • Replace usage of deprecated environment variable `TENSORZERO_CLICKHOUSE_URL` with gateway‑mediated queries.
  • Adjust configuration: migrate from `tensorzero_inference_latency_overhead_seconds_histogram_buckets` to `tensorzero_inference_latency_overhead_seconds_buckets`.
Breaking changes
  • Metric `tensorzero_inference_latency_overhead_seconds` changed from a summary to a histogram (default buckets: 1ms, 10ms, 100ms).
  • Deprecation of environment variable `TENSORZERO_CLICKHOUSE_URL` in the UI.
  • Renamed Prometheus metric `tensorzero_inference_latency_overhead_seconds_histogram` to `tensorzero_inference_latency_overhead_seconds` (both emitted temporarily).
Notable features
  • Optional `include_raw_usage` parameter in inference requests returns raw usage objects alongside normalized usage.
  • Optional `--bind-address` CLI flag added to the gateway.
  • Optional `description` field for metrics in configuration.
Full changelog

[!CAUTION]
Breaking Changes

  • The Prometheus metric tensorzero_inference_latency_overhead_seconds will report a histogram instead of a summary. You can customize the buckets using gateway.metrics.tensorzero_inference_latency_overhead_seconds_buckets in the configuration (default: 1ms, 10ms, 100ms).
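
Prometheus histograms count observations into cumulative buckets, each labeled with an upper bound, plus a final +Inf bucket. The sketch below shows how a handful of latency samples would land in the default bounds quoted above (1ms, 10ms, 100ms). `cumulative_bucket_counts` is a hypothetical name and the sample values are made up for illustration.

```python
DEFAULT_BUCKETS_S = (0.001, 0.01, 0.1)  # 1ms, 10ms, 100ms, expressed in seconds


def cumulative_bucket_counts(observations_s, buckets_s=DEFAULT_BUCKETS_S) -> dict:
    """Count observations per cumulative upper bound, including +Inf."""
    counts = {le: 0 for le in buckets_s}
    counts[float("inf")] = 0
    for value in observations_s:
        for le in counts:
            # Cumulative: an observation increments every bucket whose
            # upper bound is at least as large as the value.
            if value <= le:
                counts[le] += 1
    return counts
```

Because the buckets are cumulative, dashboards built on the old summary quantiles need to switch to bucket-based estimates, and the bounds should bracket the latencies you care about.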

[!WARNING]
Planned Deprecations

  • Deprecate the TENSORZERO_CLICKHOUSE_URL environment variable from the UI. Moving forward, the UI will query data through the gateway and will not communicate directly with ClickHouse.
  • Rename the Prometheus metric tensorzero_inference_latency_overhead_seconds_histogram to tensorzero_inference_latency_overhead_seconds. Both metrics will be emitted for now.
  • Rename the configuration field tensorzero_inference_latency_overhead_seconds_histogram_buckets to tensorzero_inference_latency_overhead_seconds_buckets. Both fields are available for now.

New Features

  • Add optional include_raw_usage parameter to inference requests. If enabled, the gateway returns the raw usage objects from model provider responses in addition to the normalized usage response field.
  • Add optional --bind-address CLI flag to the gateway.
  • Add optional description field to metrics in the configuration.
  • Add option to fine-tune Fireworks models without automatic deployment.

& multiple under-the-hood and UI improvements (thanks @ecalifornica @achaljhawar @rguilmont)!

2025.12.6 Breaking risk
Breaking changes
  • Removed `credential_location` from `DICLOptimizationConfig`.
  • Moved `account_id` to `[provider_types.fireworks.sft]` and removed `api_base` and `credential_location` from `FireworksSFTConfig`.
  • Moved `bucket_name`, `bucket_path_prefix`, `kms_key_name`, `project_id`, `region`, and `service_account` to `[provider_types.gcp_vertex_gemini.sft]` and removed them from `GCPVertexGeminiSFTConfig`.
Notable features
  • Gateway relay support for routing LLM inference requests through multiple TensorZero Gateway deployments.
  • Added "Try with model" button to datapoint page in the UI.
  • Prometheus metric `tensorzero_inference_latency_overhead_seconds_histogram` for meta‑observability.
Full changelog

[!CAUTION]
Breaking Changes

  • Migrated the following optimization fields from the TensorZero Python SDK to the configuration:
    • DICLOptimizationConfig: removed credential_location.
    • FireworksSFTConfig: moved account_id to [provider_types.fireworks.sft]; removed api_base and credential_location.
    • GCPVertexGeminiSFTConfig: moved bucket_name, bucket_path_prefix, kms_key_name, project_id, region, and service_account to [provider_types.gcp_vertex_gemini.sft].
    • OpenAIRFTConfig: removed api_base and credential_location.
    • OpenAISFTConfig: removed api_base and credential_location.
    • TogetherSFTConfig: hf_api_token, wandb_api_key, wandb_base_url, and wandb_project_name moved to [provider_types.together.sft]; removed api_base and credential_location.

New Features

  • Support gateway relay. With gateway relay, an LLM inference request can be routed through multiple independent TensorZero Gateway deployments before reaching a model provider. This enables you to enforce organization-wide controls (e.g. auth, rate limits, credentials) without restricting how teams build their LLM features.
  • Add "Try with model" button to the datapoint page in the UI.
  • Add tensorzero_inference_latency_overhead_seconds_histogram Prometheus metric for meta-observability.
  • Add concurrency parameter to experimental_render_samples (defaults to 100).
  • Add otlp_traces_extra_attributes and otlp_traces_extra_resources to the TensorZero Python SDK. (thanks @jinnovation!)

& multiple under-the-hood and UI improvements (thanks @ecalifornica)

2025.12.5 Breaking risk
⚠ Upgrade required
  • The `experimental_chain_of_thought` variant type will be deprecated in version 2026.2+; migrate to native reasoning capabilities.
  • The `timeout_s` configuration field for best/mixture-of-N variants will be deprecated in version 2026.2+; use the `[timeouts]` block instead.
Notable features
  • UI dataset builder supports complex queries (filter by tags, feedback)
  • Export Prometheus metric tensorzero_inference_latency_overhead_seconds
  • CLI flag --disable-api-key to disable TensorZero API keys
Full changelog

[!WARNING]
Planned Deprecations

  • The variant type experimental_chain_of_thought will be deprecated in 2026.2+. As reasoning models are becoming prevalent, please use their native reasoning capabilities.
  • The timeout_s configuration field for best/mixture-of-N variants will be deprecated in 2026.2+. Please use the [timeouts] block in the configuration for their candidates instead.

New Features

  • Expand the dataset builder in the UI to support complex queries (e.g. filter by tags, feedback).
  • Export tensorzero_inference_latency_overhead_seconds Prometheus metric for meta-observability.
  • Allow users to disable TensorZero API keys using --disable-api-key in the CLI. (thanks @jinnovation!)

& multiple under-the-hood and UI improvements (thanks @ecalifornica)!

2025.12.3 Bug fix
Notable features
  • Performance improvement for inference and datapoint list pages in the UI
  • Support filtering inferences by presence of a demonstration
Full changelog

Bug Fixes

  • Fix a bug where negative tag filters (e.g. user_id != 1) matched inferences and datapoints without that tag.
  • Fix a bug where metric filters covering default values (e.g. exact_match = false) matched inferences without that metric.
  • Fix a regression affecting the logger in the UI.
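
The intended semantics of a negative tag filter after the first fix above can be stated compactly: a row matches `key != value` only if it carries the tag at all. A minimal sketch with hypothetical names and in-memory dictionaries standing in for stored inferences:

```python
def tag_not_equal(tags: dict, key: str, value: str) -> bool:
    """True only when the tag is present and its value differs from `value`."""
    return key in tags and tags[key] != value


def filter_rows(rows: list, key: str, value: str) -> list:
    """Keep rows whose tags satisfy `key != value` under the fixed semantics."""
    return [row for row in rows if tag_not_equal(row, key, value)]
```

Under this rule, a filter such as `user_id != 1` skips rows that have no `user_id` tag instead of matching them.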

New Features

  • Improve the performance of the inference and datapoint list pages in the UI.
  • Support filtering inferences by whether they have a demonstration.

& multiple under-the-hood and UI improvements (thanks @jinnovation @ecalifornica @simeonlee)!

2025.12.2 Bug fix
Notable features
  • Customizable log level via TENSORZERO_UI_LOG_LEVEL
Full changelog

Bug Fixes

  • Fix a performance regression affecting the inference table in the UI.

New Features

  • Allow users to customize the log level in the UI (TENSORZERO_UI_LOG_LEVEL).

& multiple under-the-hood and UI improvements

2025.12.1 Bugfix

Fixed regression that broke the dataset builder in the UI.

Full changelog

Bug Fixes

  • Fixed a regression that broke the dataset builder in the UI.

& multiple under-the-hood and UI improvements

2025.12.0 Breaking risk
⚠ Upgrade required
  • Environment variables `TENSORZERO_UI_CONFIG_PATH` and `TENSORZERO_UI_DEFAULT_CONFIG` are deprecated and ignored.
  • `model_provider_name` is still accepted in the API but will be removed in a future release; migrate to using `model_name` and `provider_name`.
Breaking changes
  • Unknown content blocks now return `model_name` and `provider_name` instead of fully-qualified `model_provider_name`.
Notable features
  • Free‑form search and filtering in inference and datapoint tables
  • Create, edit, clone datapoints directly from the UI
  • Peek at inferences on episode detail pages
Full changelog

[!CAUTION]
Breaking Changes

  • Unknown content blocks now return the scope as model_name and provider_name instead of the fully-qualified model_provider_name.

[!WARNING]
Planned Deprecations

  • The TensorZero UI now reads the configuration from the gateway (instead of reading directly from the filesystem). The environment variables TENSORZERO_UI_CONFIG_PATH and TENSORZERO_UI_DEFAULT_CONFIG are deprecated and ignored. You no longer need to mount the configuration onto the UI container.
  • Use model_name and provider_name to scope provider tools (e.g. OpenAI Responses API web search) instead of model_provider_name. The deprecated name is still accepted in the API.

Bug Fixes

  • Fix a regression in the "Try with..." modal in the UI that disregarded some parameters (e.g. allowed_tools).
  • Fix a regression in allowed_tools when using custom display names for tools.
  • Fix an edge case when using both allowed_tools and tool_choice parameters with GCP Vertex AI Gemini.

New Features

  • Support free-form search and filtering (e.g. by tags, metrics) in the inference and datapoint tables in the UI.
  • Support creating datapoints from scratch in the UI.
  • Support editing TensorZero API key descriptions in the UI (thanks @nicoestrada!).
  • Support editing any kind of datapoint input and output in the UI.
  • Support peeking at inferences in the episode detail page in the UI (thanks @BrianLi23!).
  • Support cloning datapoints in the UI.
  • Optimize the rendering performance of the code editor in the UI.
  • Make mime_type optional for base64 file inputs (now inferred from magic bytes when not provided).
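
To illustrate what "inferred from magic bytes" means for base64 file inputs, here is a minimal Python sketch. It is not the gateway's implementation: the function name `infer_mime_type` and the short signature table are assumptions made for illustration, and real detection covers many more formats.

```python
import base64
from typing import Optional

# A few well-known file signatures (magic bytes). Illustrative only.
MAGIC_BYTES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def infer_mime_type(data_b64: str, mime_type: Optional[str] = None) -> Optional[str]:
    """Return the explicit MIME type if given, otherwise sniff it from the payload."""
    if mime_type is not None:
        return mime_type
    raw = base64.b64decode(data_b64)
    for signature, detected in MAGIC_BYTES:
        if raw.startswith(signature):
            return detected
    # Unknown signature: the caller decides whether to reject the input.
    return None
```

An explicitly provided `mime_type` still wins in this sketch; sniffing only runs when the field is omitted.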

& multiple under-the-hood and UI improvements

2025.11.6 Bug fix
Notable features
  • Programmatic evaluations on specific datapoints via `datapoint_ids`
  • Generation of `values.schema.json` for the Helm chart
Full changelog

Bug Fixes

  • Handle a regression in ClickHouse latest that affected the endpoint for deleting datapoints.

New Features

  • Support running evaluations programmatically on specific datapoints (datapoint_ids).
  • Generate values.schema.json for the Helm chart. (thanks @Erin-Boehmer!)
2025.11.5 Breaking risk
⚠ Upgrade required
  • Rename `json_mode="implicit_tool"` to `json_mode="tool"`.
  • Use `model_name` (and optionally `provider_name`) instead of `model_provider_name` in `extra_body` and `extra_headers` objects supplied at inference time; scope filters are optional.
Breaking changes
  • Explicit `tensorzero::params` take precedence over conflicting native parameters when using the OpenAI-compatible inference endpoint.
Notable features
  • Native support for Anthropic's Beta Structured Outputs (`beta_structured_outputs`) without needing `extra_headers`
  • `json_mode="tool"` now supported in chat inferences even when no tools are included
  • Thought signatures added for GCP Vertex model providers
Full changelog

[!CAUTION]
Breaking Changes

  • Moving forward, explicit tensorzero::params will take precedence over conflicting native parameters when using the OpenAI-compatible inference endpoint.
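
The precedence rule can be pictured as a dictionary merge in which the explicit values are applied last. The Python sketch below is a simplification under stated assumptions: it treats both parameter sets as flat dictionaries, and the helper name `merge_inference_params` is hypothetical. The actual shape of `tensorzero::params` in a request may be nested differently.

```python
from typing import Any, Dict


def merge_inference_params(
    native_params: Dict[str, Any],
    tensorzero_params: Dict[str, Any],
) -> Dict[str, Any]:
    """Combine both parameter sets; explicit tensorzero::params win on conflict."""
    merged = dict(native_params)      # start from the native OpenAI-style parameters
    merged.update(tensorzero_params)  # explicit values overwrite conflicting keys
    return merged
```

For example, a native `temperature` of 0.2 combined with an explicit `temperature` of 0.9 resolves to 0.9, while non-conflicting native parameters pass through unchanged.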

[!WARNING]
Planned Deprecations

  • Rename json_mode="implicit_tool" to json_mode="tool".
  • Set model_name and optionally provider_name instead of model_provider_name in extra_body and extra_headers objects supplied at inference time. Alternatively, don't include a scope filter at all.

New Features

  • Support Anthropic's Beta Structured Outputs feature natively (beta_structured_outputs). extra_headers is no longer necessary.
  • Support json_mode="tool" in chat inferences that don't otherwise include tools.
  • Support extra_body and extra_headers supplied at inference time without scope filters.
  • Support extra_body and extra_headers supplied at inference time with model_name and optional provider_name scope filters.
  • Support thought signatures for the GCP Vertex model providers.
  • Support custom tools for the OpenAI model provider.
  • Add description fields to evaluation and evaluator configuration.

& multiple under-the-hood and UI improvements

2025.11.4 Breaking risk
⚠ Upgrade required
  • Replace `page_size` with `limit` in observability methods.
  • Place fields previously nested in `metadata` or `tool_params` at the root when calling PATCH /v1/datasets/{dataset_name}/datapoints or update_datapoints.
  • Deprecation warning: use `limit` instead of `page_size` for programmatic observability methods (will be removed in a future release).
Breaking changes
  • Require `allowed_tools` to include any dynamically specified tools; previously assumed always allowed.
Notable features
  • Adaptive stopping for evaluations in UI and Python SDK
  • Support explicit `candidate_variants` and `fallback_variants` with uniform sampling
  • Add `input_audio` content block support across multiple model providers
Full changelog

[!CAUTION]
Breaking Changes

  • Moving forward, allowed_tools must include dynamic tools (tools specified at inference time rather than in configuration). This matches the OpenAI API behavior. Previously, TensorZero assumed that dynamic tools were always allowed.
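
A rough Python sketch of the new rule, assuming tools are identified by name. The helper `effective_tools` is hypothetical, and the behavior when no allow-list is supplied (nothing is filtered) is an assumption rather than something stated in these notes.

```python
from typing import Iterable, List, Optional


def effective_tools(
    static_tools: Iterable[str],
    dynamic_tools: Iterable[str],
    allowed_tools: Optional[Iterable[str]],
) -> List[str]:
    """Return the tools a model may call under the stricter allowed_tools rule."""
    available = list(static_tools) + list(dynamic_tools)
    if allowed_tools is None:
        # Assumption: without an allow-list, every available tool is allowed.
        return available
    allowed = set(allowed_tools)
    # Dynamic tools are no longer implicitly allowed: they must be listed too.
    return [name for name in available if name in allowed]
```

With `allowed_tools=["get_weather"]` and a dynamic tool named `search`, the dynamic tool is filtered out unless `search` is added to the list.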

[!WARNING]
Planned Deprecations

  • Use limit instead of page_size with the programmatic observability methods. Previously, the methods mixed these two fields.
  • Don't nest fields in metadata or tool_params when calling PATCH /v1/datasets/{dataset_name}/datapoints or update_datapoints. Moving forward, please place them in the root.

[!WARNING]
Completed Deprecations

  • Require template_filesystem_access.base_path when template_filesystem_access.enabled is true.
  • Removed many deprecated experimental types and methods from the TensorZero Python SDK.

New Features

  • Add adaptive stopping for evaluations in the UI and Python SDK.
  • Support explicit candidate_variants and fallback_variants when using uniform sampling.
  • Support the input_audio content block in the OpenAI-compatible inference endpoint.
  • Support the input_audio content block in the OpenAI, Azure, GCP Vertex Gemini, Google AI Studio, and OpenRouter model providers.
  • Add optional filename field for input files.
  • Move closer to parity between the GCP Vertex Anthropic model provider and the Anthropic model provider.
  • Expose new observability and dataset management endpoints as methods in the TensorZero Python SDK.
  • Add optional postgres.enabled field to the configuration.
  • Handle missing usage information from model providers that don't report it.
  • Add experimental method for searching inferences programmatically (search_query_experimental).
  • Add a native OpenRouter embedding model provider.

& multiple under-the-hood and UI improvements

2025.11.3 Bugfix

Fixed handling of user‑defined tags in batch inference.

Full changelog

Bug Fixes

  • Enable TLS support for Postgres connections.
  • Fix handling of user-defined tags in batch inference.

& multiple under-the-hood and UI improvements

2025.11.2 Breaking risk
Breaking changes
  • Gateway attempts `fallback_variants` in order rather than randomly sampling them.
Notable features
  • Tag inference and feedback with `tensorzero::api_key_public_id` when using auth.
  • Add POST /v1/datasets/{dataset_name}/datapoints endpoint for creating datapoints.
  • Introduce `gateway.global_outbound_http_timeout_ms` configuration setting.
Full changelog

[!CAUTION]
Breaking Changes

  • Moving forward, the gateway will attempt any fallback_variants in order rather than randomly sample them.
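
The ordering change can be sketched in a few lines of Python. This is an illustration only: `try_fallbacks_in_order` and the `call` callback are hypothetical names, and the sketch covers just the fallback phase, not how the primary variant is chosen.

```python
from typing import Callable, Dict, Sequence, Tuple, TypeVar

T = TypeVar("T")


def try_fallbacks_in_order(
    fallback_variants: Sequence[str],
    call: Callable[[str], T],
) -> Tuple[str, T]:
    """Attempt each fallback variant in its listed order; return the first success."""
    errors: Dict[str, Exception] = {}
    for name in fallback_variants:  # deterministic order, no random sampling
        try:
            return name, call(name)
        except Exception as exc:  # record the failure and move to the next variant
            errors[name] = exc
    raise RuntimeError(f"all fallback variants failed: {list(errors)}")
```

Because the order is deterministic, the position of a variant in `fallback_variants` now expresses its priority.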

Bug Fixes

  • Fix a bug that prevented some model inferences from being rendered correctly in the UI.
  • Handle non-image base64 file inputs consistently in the OpenAI-compatible inference endpoint.
  • Handle raw_response correctly for batch inference with GCP Vertex AI Gemini.

New Features

  • Apply the tensorzero::api_key_public_id tag to inference and feedback when using auth.
  • Add updated HTTP endpoint for creating datapoints (POST /v1/datasets/{dataset_name}/datapoints).
  • Add gateway.global_outbound_http_timeout_ms configuration setting.

& multiple under-the-hood and UI improvements (thanks @omarraf!)

2025.11.1 Bug fix
Notable features
  • Rate limiting by API key (`api_key_public_id`)
  • Native `service_tier` parameter for supported providers (Anthropic, Azure, Groq, OpenAI) – removes need for `extra_body`
  • Native `detail` parameter for input images in Azure, OpenAI, xAI – removes need for `extra_body`
Full changelog

Bug Fixes

  • Fix a regression that prevented batch inferences from being rendered in the UI.
  • Handle missing Postgres credentials gracefully in the UI.

New Features

  • Support rate limiting by API key (api_key_public_id).
  • Add native service_tier inference parameter (supported providers: Anthropic, Azure, Groq, OpenAI). extra_body is no longer necessary.
  • Add native detail parameter for input images (supported providers: Azure, OpenAI, xAI). extra_body is no longer necessary.
  • Add updated HTTP endpoint for querying inferences by ID (POST /v1/inferences/get_inferences).
  • Add updated HTTP endpoint for querying inferences with filters (POST /v1/inferences/list_inferences).

& multiple under-the-hood and UI improvements

2025.11.0 Breaking risk
⚠ Upgrade required
  • Update configuration: replace `enable_template_filesystem_access` with `template_filesystem_access.enabled`.
Breaking changes
  • Removed configuration field `enable_template_filesystem_access`; use `template_filesystem_access.enabled` instead.
Notable features
  • Automated experimentation (automated A/B testing)
  • Authentication for TensorZero Gateway with virtual API keys
  • Native inference parameters: reasoning_effort/thinking_budget_tokens and verbosity
Full changelog

[!WARNING]
Completed Deprecations

  • Completed the planned deprecation of the configuration field enable_template_filesystem_access in favor of template_filesystem_access.enabled.

Bug Fixes

  • Handle the global region correctly for GCP Vertex Anthropic.
  • Fix output format for JSON functions in the new endpoint for updating datapoints (PATCH /v1/{dataset_name}/datapoints). The output field now matches the inference endpoint (an object with a raw field; parsed is ignored and recomputed internally).

New Features

  • Add automated experimentation feature (automated A/B testing). Docs
  • Add authentication for the TensorZero Gateway (virtual API keys). Docs
  • Add native inference parameters to enable reasoning for every supported model provider (reasoning_effort or thinking_budget_tokens depending on the provider). extra_body is no longer necessary.
  • Add native verbosity inference parameter. extra_body is no longer necessary.
  • Support token inputs in the embeddings endpoint.
  • Support input thought content blocks for GCP Vertex Anthropic.
  • Improve handling of JSON Schemas for GCP Vertex Gemini and Google AI Studio.

& multiple under-the-hood and UI improvements

2025.10.9 Breaking risk
⚠ Upgrade required
  • Upgrade from the yanked `2025.10.8` release to this version.
  • Migrate any code using legacy `list_datapoints` or `experimental_list_inferences` to handle new content‑block format.
  • Update Helm chart values: remove `createLegacyIngress`; use only `tensorzero-gateway` ingress.
Breaking changes
  • Removed `list_datapoints` and `experimental_list_inferences` API signatures; updated data schema to use structured content blocks (`{"type": "text", ...}`, `{"type": "template", ...}`, `{"type": "file", ...}`).
  • Helm chart variable `createLegacyIngress` removed; legacy gateway ingress no longer supported.
Notable features
  • Added HTTP endpoints for datapoint CRUD operations (`GET`, `POST`, `PATCH`, `DELETE`).
  • UI support to create, update, and delete messages and content blocks in dataset editor.
  • Emit OpenTelemetry spans for rate‑limiting queries.
Full changelog

[!CAUTION]
Notice on 2025.10.8: We ran into a technical issue during the release process for 2025.10.8 that resulted in a broken build for the TensorZero Python SDK on PyPI. We've yanked that release and recommend upgrading to this version.

[!CAUTION]
Breaking Changes

  • This release includes small breaking changes to the programmatic observability/dataset APIs (e.g. list_datapoints, experimental_list_inferences) and the underlying data schema. Moving forward, TensorZero will store and return the new format for text ({"type": "text", "text": "..."}), template ({"type": "template", "name": "...", "arguments": { ... }}), and file ({"type": "file", "file_type": "...", ...}) content blocks. Note: These changes do not affect the inference APIs or the legacy data stored in ClickHouse.

[!WARNING]
Completed Deprecations

  • The TensorZero Helm chart will no longer support the legacy gateway ingress. The createLegacyIngress variable was removed. Moving forward, the only supported gateway ingress is tensorzero-gateway.

Bug Fixes

  • Fix an issue that prevented comments from being rendered in the workflow evaluation UI.

New Features

  • Add HTTP endpoint for querying datapoints by ID (POST /v1/datasets/get_datapoints).
  • Add HTTP endpoint for querying datapoints with filters (POST /v1/datasets/{dataset_name}/list_datapoints).
  • Add HTTP endpoint for creating datapoints from inferences (POST /v1/datasets/{dataset_id}/from_inferences).
  • Add HTTP endpoint for updating datapoints (PATCH /v1/{dataset_name}/datapoints).
  • Add HTTP endpoint for updating datapoint metadata (PATCH /v1/datasets/{dataset_name}/datapoints/metadata).
  • Add HTTP endpoint for deleting datapoints (DELETE /v1/datasets/{dataset_id}/datapoints).
  • Add HTTP endpoint for deleting datasets (DELETE /v1/datasets/{dataset_id}).
  • Enable users to create, update, and delete messages and content blocks in the dataset editor in the UI.
  • Emit OpenTelemetry spans for rate limiting queries.
  • Add support for deployment service accounts in the Helm chart (thanks @jinnovation!).
  • Add support for dynamic extra attributes for OTLP spans (TensorZero-OTLP-Traces-Extra-Attribute-*).

& multiple under-the-hood and UI improvements

2025.10.7 Breaking risk
⚠ Upgrade required
  • Deprecation warning: Untagged enums for file content blocks will be removed after 2026.2+; migrate to tagged enums with `file_type` (`url`, `base64`, or `object_storage`).
  • Deprecation warning: Type `InferenceFilterTreeNode` in TensorZero Python SDK will be renamed to `InferenceFilter`; both aliases available until 2026.2+.
Breaking changes
  • Default value for `fetch_and_encode_input_files_before_inference` changed from true to false, altering when input files are fetched relative to inference.
Notable features
  • Batch datapoint updates via PATCH /v1/{dataset_name}/datapoints
  • Thought summaries exposed in TensorZero Python SDK
  • Additional semantic tags added for OpenInference trace exports
Full changelog

[!CAUTION]
Breaking Changes

  • The default value for fetch_and_encode_input_files_before_inference is changing from true to false. As a result, the gateway will no longer fetch input files before inference, but instead will fetch them in parallel with inference (for observability). In rare cases, this may cause the gateway to receive different input files than those received by model providers.

[!WARNING]
Planned Deprecations

  • Migrate file content blocks from untagged enums to tagged enums. Moving forward, you should provide a field file_type with a value of "url", "base64", or "object_storage". Untagged enums are still accepted for backwards compatibility, but support will be removed in a future release (2026.2+).
  • Rename the TensorZero Python SDK type InferenceFilterTreeNode to InferenceFilter for consistency with related types. Both types will be available as aliases until 2026.2+.
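
A migration helper for file content blocks might look like the Python sketch below. The `file_type` values come from the note above; the keys used to recognize each untagged shape (`url`, `data`, `storage_path`) and the function name `tag_file_block` are assumptions, so check them against your stored data before relying on this.

```python
from typing import Any, Dict


def tag_file_block(block: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a file content block with an explicit file_type."""
    if "file_type" in block:
        return dict(block)  # already tagged, nothing to do
    # Assumed discriminating keys for the untagged shapes.
    if "url" in block:
        file_type = "url"
    elif "data" in block:
        file_type = "base64"
    elif "storage_path" in block:
        file_type = "object_storage"
    else:
        raise ValueError("cannot infer file_type for this block")
    return {**block, "file_type": file_type}
```

The helper leaves already-tagged blocks untouched, so it can be run repeatedly over the same data.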

Bug Fixes

  • Send a user agent when fetching input files to avoid restrictions from websites that require it (e.g. Wikimedia).

New Features

  • Add a new endpoint for batch datapoint updates (PATCH /v1/{dataset_name}/datapoints).
  • Expose thought summaries in the TensorZero Python SDK.
  • Add additional semantic tags when exporting traces using the OpenInference format (thanks @jinnovation!)

& multiple under-the-hood and UI improvements

2025.10.6 Breaking risk
⚠ Upgrade required
  • Update configuration: change `type = "static"` to `type = "inference"` and `type = "dynamic"` to `type = "workflow"`. Both old values remain accepted until tensorzero 2026.2+.
Breaking changes
  • Renaming configuration field `type = "static"` to `type = "inference"` and `type = "dynamic"` to `type = "workflow"`. Both old names will be supported until version 2026.2+.
Notable features
  • Short-hand model names for OpenAI Responses API (e.g., openai::responses::gpt-5)
  • Dynamic provider tools supporting web search via OpenAI Responses API
  • Custom `api_base` support for Anthropic model provider
Full changelog

[!WARNING]
Planned Deprecations

  • We're renaming "static evaluations" to "inference evaluations" and "dynamic evaluations" to "workflow evaluations". The only action needed is to update type = "static" to type = "inference" and type = "dynamic" to type = "workflow" in the configuration. Both versions will be supported until 2026.2+.

Bug Fixes

  • Fix a bug that dropped tool IDs in output tool_call content blocks when updating datapoints.
  • Prefer magic bytes over the Content-Type HTTP response header to infer MIME types of input files.

New Features

  • Support short-hand model names for the OpenAI Responses API (e.g. openai::responses::gpt-5).
  • Support dynamic provider tools (e.g. web search with the OpenAI Responses API).
  • Support custom api_base for the Anthropic model provider.

& multiple under-the-hood and UI improvements

2025.10.5 Feature
Notable features
  • FinishReason.STOP_SEQUENCE enum value added to TensorZero Python SDK
Changelog

Bug Fixes

  • Add FinishReason.STOP_SEQUENCE to the TensorZero Python SDK.
2025.10.4 Bug fix
⚠ Upgrade required
  • Deprecation: `bulk_insert_datapoints` endpoint will be renamed to `create_datapoints`; both available until 2026.2+.
  • Python SDK type renames: `*InferenceDataset` → `*InferenceDatapoint`, `*Node` → `*Filter`.
  • Legacy inference input formats are no longer accepted (were deprecated previously).
Notable features
  • Support OpenAI Responses API
  • Structured generation (strict JSON) on Groq model provider
  • File URLs as inputs for Anthropic model provider
Full changelog

[!WARNING]
Planned Deprecations

  • The bulk_insert_datapoints method (POST /datasets/{dataset_name}/datapoints/bulk) will be renamed to create_datapoints (POST /datasets/{dataset_name}/datapoints). Both methods will be available until 2026.2+. (thanks @BrianLi23!)

[!WARNING]
Completed Deprecations

  • Concluded many small ongoing deprecations:

    • Python SDK: renamed the types *InferenceDataset → *InferenceDatapoint and *Node → *Filter
    • Inference: stop accepting legacy input formats (e.g. inline arguments for templates). These legacy formats have issued deprecation warnings for the last several months.
    • Dynamic Evaluations: renamed the variable datapoint_id → task_id

Bug Fixes

  • Improve the rendering performance of the code editor in the UI.
  • Fixed the X_per_month rate limit to cover a calendar month rather than 30 days.
  • Use max_completion_tokens rather than max_tokens in the Azure OpenAI Service model provider.
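
The difference between a calendar month and a fixed 30-day period is easy to see with plain date arithmetic. The Python sketch below only illustrates that difference; it does not show how the gateway anchors or refills its rate-limit windows, and `add_calendar_month` is a hypothetical helper.

```python
import calendar
from datetime import datetime, timedelta


def add_calendar_month(ts: datetime) -> datetime:
    """Advance a timestamp by one calendar month, clamping to the month's last day."""
    year = ts.year + (1 if ts.month == 12 else 0)
    month = 1 if ts.month == 12 else ts.month + 1
    last_day = calendar.monthrange(year, month)[1]
    return ts.replace(year=year, month=month, day=min(ts.day, last_day))


def add_thirty_days(ts: datetime) -> datetime:
    """Advance a timestamp by a fixed 30-day period."""
    return ts + timedelta(days=30)
```

Starting from March 1, the calendar-month period ends on April 1, whereas the 30-day period ends a day earlier on March 31.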

New Features

  • Support the OpenAI Responses API.
  • Support structured generation (strict JSON mode) on the Groq model provider.
  • Support inputs with file URLs on the Anthropic model provider.
  • Support encrypted reasoning and thought summaries on the OpenAI model provider.
  • Support dynamic OTLP resources when exporting OpenTelemetry traces (tensorzero-otlp-traces-extra-resource-*).
  • Support fallbacks for dynamic credentials (e.g. api_key_location = { default = "dynamic::foo", fallback = "env::bar" }).
  • Improve the handling for stale datapoints in the UI.
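
The credential fallback from the list above (`default = "dynamic::foo"`, `fallback = "env::bar"`) can be modeled as a two-step lookup. This Python sketch is an assumption about the semantics (use the fallback when the default yields no value), not the gateway's code; `resolve_credential` and its arguments are hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_credential(
    location: Mapping[str, str],
    dynamic_credentials: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Resolve an API key from a {default, fallback} location mapping."""
    for key in ("default", "fallback"):
        spec = location.get(key)
        if spec is None:
            continue
        scheme, _, name = spec.partition("::")
        if scheme == "dynamic":
            value = dynamic_credentials.get(name)  # supplied at inference time
        elif scheme == "env":
            value = env.get(name)  # read from the environment
        else:
            raise ValueError(f"unsupported credential location: {spec}")
        if value:
            return value
    return None
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.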

& multiple under-the-hood and UI improvements

2025.10.3 Bugfix

Fixed Playground UI failures for inferences using static tools with custom names.

Full changelog

Bug Fixes

  • Fix a bug in the Playground UI that caused inferences containing static tools with custom names (tools.my_tool.name) to fail.
2025.10.2 Breaking risk
⚠ Upgrade required
  • Explicitly list dynamic tools in the allowed‑tools configuration before the upcoming release.
  • Update any scripts or configs using `datapoint_name` to use `task_name`.
  • If relying on the default `--config-file` flag, add it manually when building/pulling `tensorzero/gateway` images.
Breaking changes
  • Dynamic tools will no longer be automatically included in the allowed list; explicit allowance required.
  • Renamed configuration key `datapoint_name` to `task_name` for dynamic evaluations.
  • Removed default inclusion of `--config-file` flag from `tensorzero/gateway` Dockerfile.
Notable features
  • Custom granular rate limits for users
  • Dynamic and static OTLP header support (Python SDK and config)
  • Optional `max_distance` field for `experimental_dynamic_in_context_learning` variants
Full changelog

[!WARNING]
Planned Deprecations

  • Currently, the gateway automatically includes all dynamic tools in the list of allowed tools. In a near-future release, dynamic tools will no longer be included automatically. If you intend for your dynamic tools to be allowed, please allow them explicitly.

[!WARNING]
Completed Deprecations

  • Finish renaming datapoint_name → task_name for dynamic evaluations.
  • Stop including --config-file in the Dockerfile for tensorzero/gateway by default.
  • Use the TENSORZERO_CLICKHOUSE_URL environment variable instead of CLICKHOUSE_URL.
  • Remove deprecated features from the OpenAI-compatible inference API.

Bug Fixes

  • Handle json_mode correctly in experimental_best_of_n variants.

New Features

  • Allow users to define and enforce custom granular rate limits.
  • Update the UI to handle unlimited named templates and schemas.
  • Support dynamic OTLP headers in the Python SDK.
  • Support static OTLP headers in the configuration.
  • Add optional max_distance configuration field for experimental_dynamic_in_context_learning variants.
  • Improve fallback behavior for experimental_dynamic_in_context_learning variants.
  • Allow Google AI Studio Gemini to accept input files beyond images.
  • Add name to datapoints.
  • Add experimental_run_evaluation to the Python SDK.
  • Allow users to configure default credentials by provider type.
  • Support supervised fine-tuning for Together AI models in the UI.

& multiple under-the-hood and UI improvements (thanks @dangvu0502!)

2025.10.1 New feature
Notable features
  • Increased default body limit to 100 MB for patch_openai_client
Full changelog

New Features

  • Increase default body limit to 100MB for patch_openai_client.

& multiple under-the-hood and UI improvements

2025.10.0 Breaking risk
⚠ Upgrade required
  • Deprecation warning: replace `timeouts.non_streaming.total_ms` with `timeout_ms` for embedding model timeouts; removal planned in 2026.1+.
  • Deprecation warning: use CLI flags `--run-clickhouse-migrations` and `--run-postgres-migrations` instead of `--run-migrations-only`; removal planned in 2026.1+.
  • Deprecation warning: Prometheus metrics `request_count` and `inference_count` will be removed; use `tensorzero_requests_total` and `tensorzero_inferences_total`.
Notable features
  • UI support for adding, editing, and deleting tags for datapoints
  • UI support for adding, editing, and deleting `system` entries for datapoints
  • Configuration flag `gateway.fetch_and_encode_input_files_before_inference` with default true
Full changelog

[!WARNING]
Planned Deprecations

  • Configure timeouts for embedding models and embedding model providers with timeout_ms instead of timeouts.non_streaming.total_ms. The latter will be removed in a future release (2026.1+).
  • Use the gateway CLI flags --run-clickhouse-migrations and --run-postgres-migrations instead of --run-migrations-only. --run-migrations-only requires credentials for both databases, even though Postgres is an optional dependency, so it will be removed in a future release (2026.1+).
  • Scrape the Prometheus metrics tensorzero_requests_total and tensorzero_inferences_total instead of request_count and inference_count. The gateway will double-emit the metrics for now; the deprecated metrics will be removed in a future release (2026.1+).

Bug Fixes

  • Fixed an issue that prevented static evaluations on datapoints with no reference output from being rendered in the UI.
  • Fixed a regression in the gateway's internal HTTP client that triggered unnecessary warnings and degraded performance when handling many concurrent streaming inferences.
  • Fixed an issue that prevented base64-encoded embedding requests from being cached by TensorZero.

New Features

  • Allow users to add, edit, and delete tags for datapoints in the UI.
  • Allow users to add, edit, and delete system for datapoints in the UI.
  • Add the configuration setting gateway.fetch_and_encode_input_files_before_inference. If set to true (default), the gateway will fetch remote input files and send them as a base64-encoded payload in the prompt; this is recommended to ensure that TensorZero and the model providers see identical inputs. If set to false, TensorZero will forward the input file URLs and fetch them for observability in parallel with inference.
  • Improved gateway errors for database issues.

& multiple under-the-hood and UI improvements

2025.9.6 Bug fix
Notable features
  • Multiple small improvements to the evaluations UI for streamlined workflows and simplified debugging
Full changelog

Bug Fixes

  • Implemented a workaround for an upstream bug in opentelemetry-otlp that caused our OTLP exporter to fail to send data to encrypted endpoints.

New Features

  • Added multiple small improvements to the evaluations UI to streamline common workflows and simplify debugging.

& multiple under-the-hood and UI improvements

2025.9.5 New feature
Notable features
  • Model observability page showing throughput and latency analytics in the UI
  • Support for OpenInference format when exporting OpenTelemetry traces
  • Supervised fine‑tuning (SFT) with GCP Vertex AI Gemini added to the UI
Full changelog

New Features

  • Add model observability page to the UI with model throughput and latency analytics.
  • Add support for OpenInference format when exporting OpenTelemetry traces.
  • Expand support of UI features for the default function (e.g. "Try with model").
  • Add support for supervised fine-tuning (SFT) with GCP Vertex AI Gemini in the UI.
  • Improve the performance of the episode table in the UI.
  • Add an example of using the programmatic workflow for dynamic in-context learning.

& multiple under-the-hood and UI improvements (thanks @AnnaVernerovaHID @dangvu0502 @jinnovation!)

2025.9.4 Breaking risk
⚠ Upgrade required
  • Planned deprecation: rename Python SDK types from `Dicl*` to `DICL*`; both versions work now but deprecated ones will be removed in 2025.12+.
Notable features
  • Support unlimited prompt templates per function
  • Add `append_to_existing_variants` to programmatic DICL interface
  • Skip writing inference cache entries on tool call validation failure
Full changelog

[!WARNING]
Planned Deprecations

  • Rename types from Dicl* to DICL* in the Python SDK for consistency. Both versions work for now, and the deprecated types will be removed in a future release (2025.12+).

Bug Fixes

  • Fix a regression in the UI that prevented chat datapoints from being edited.

New Features

  • Expand the prompt templates and schemas functionality to support unlimited templates per function.
  • Support appending to existing DICL variants in the programmatic interface (append_to_existing_variants).
  • Skip writing inference cache entries if tool call validation fails.

& multiple under-the-hood and UI improvements (thanks @BretHudson!)

2025.9.3 New feature
Notable features
  • Dynamic OTLP header support for OpenTelemetry trace export
  • `allowed_tools` field added to OpenAI-compatible inference endpoint
  • Automatic HTTP/2 connection adjustment based on concurrency
Full changelog

New Features

  • Add support for dynamic OTLP headers when exporting OpenTelemetry traces.
  • Add support for allowed_tools field in the OpenAI-compatible inference endpoint.
  • Improve performance by automatically adjusting the number of HTTP/2 connections to model providers based on concurrency.

& multiple under-the-hood and UI improvements (thanks @yuria-loo!)

2025.9.1 Bug fix
Notable features
  • Programmatic API for reinforcement fine-tuning (RFT) with OpenAI
  • Defaults added for individual fields in the `retries` configuration
  • Dynamic specification of Azure provider endpoint
Full changelog

Bug Fixes

  • Fix a regression that prevented rendering of inferences with thought content blocks in the UI.
  • Stop logging HTTP requests and responses twice in debug mode.

New Features

  • Add a programmatic API for reinforcement fine-tuning (RFT) with OpenAI.
  • Provide defaults for individual fields in the retries configuration.
  • Allow users to specify the Azure provider endpoint dynamically. (thanks @Dineshm-coder!)
  • Improve error messages when the gateway is missing credentials.

& multiple under-the-hood and UI improvements (thanks @JoshuaTanaka @HJStaiff!)

2025.9.0 Breaking risk
Breaking changes
  • The `feedback_id` field in the TensorZero Python SDK is no longer incorrectly doubly nested, aligning with type annotations.
Notable features
  • Throughput chart added to function detail page in TensorZero UI
  • Export OpenTelemetry spans for feedback endpoint
  • Recipes for supervised fine-tuning with `torchtune` and `axolotl`
Full changelog

[!CAUTION]

Breaking Changes

  • The bug fix for feedback_id technically introduces a breaking change in the TensorZero Python SDK. The field is no longer incorrectly doubly nested and now matches the SDK's type annotations.

[!WARNING]
Completed Deprecations

  • json_mode is now required for JSON function variants.

Bug Fixes

  • Added workarounds for two ClickHouse regressions (ClickHouse/ClickHouse#86415, ClickHouse/ClickHouse#86557) introduced in ClickHouse 25.8. Replicated self-hosted clusters are still affected by ClickHouse/ClickHouse#86434. Pin to 25.7 or earlier if you run a replicated cluster. Single-node self-hosted deployments and ClickHouse Cloud are not affected.
  • Fixed a bug in the TensorZero Python SDK that caused feedback_id to be doubly nested in feedback responses.
  • Fixed a logging issue where models were incorrectly reported as "not found" in the embedding endpoint even on success.
  • Fixed a bug where pending insertions could be dropped during shutdown when gateway.observability.batch_writes.enabled = true.
  • Fixed a bug in the dynamic in-context learning (DICL) recipe and programmatic API. The gateway automatically detects problematic examples and logs a warning with resolution instructions if necessary.

New Features

  • Added a throughput chart to the function detail page in the TensorZero UI.
  • Support exporting OpenTelemetry spans for the feedback endpoint.
  • Added recipes for supervised fine-tuning with torchtune and axolotl.
  • Added examples for using the embedding endpoint with Azure OpenAI Service and OpenAI-compatible providers like Ollama (thanks @slbotbm!).
  • Updated the DICL recipe to use TensorZero's new embedding API.
  • Added support for caching embeddings (thanks @ishbir!).

& multiple under-the-hood and UI improvements (thanks @contrun @jinnovation!)

2025.8.5 Bug fix
Notable features
  • Programmatic optimization interface for dynamic in-context learning
  • Exposure of more hyperparameters for programmatic supervised fine-tuning with Together AI
Full changelog

Bug Fixes

  • Reduce the ClickHouse memory footprint in large deployments with human feedback for evaluations.

New Features

  • Add a programmatic optimization interface for dynamic in-context learning.
  • Expose more hyperparameters for programmatic supervised fine-tuning with Together AI.

& many under-the-hood and UI improvements (thanks @quangIO!)

2025.8.4 Breaking risk
Breaking changes
  • Removal of support for unprefixed model names in the OpenAI‑compatible embeddings endpoint; future releases (2025.12+) will require prefix `tensorzero::embedding_model_name::`.
Notable features
  • Added `extra_body` field to embedding model configurations for custom API request fields.
  • Updated Azure OpenAI Service provider to use API version `2025-04-01-preview`.
  • Added CrewAI integration example.
Full changelog

[!WARNING]
Planned Deprecations

  • The OpenAI-compatible embeddings endpoint will require the prefix tensorzero::embedding_model_name:: for model names (e.g. tensorzero::embedding_model_name::openai::text-embedding-3-small). Support for unprefixed names will be removed in a future release (2025.12+).
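
Client code can prepare for this change by normalizing model names before calling the embeddings endpoint. A minimal Python sketch follows; the prefix and example name come from the note above, while the helper name `prefixed_embedding_model` is hypothetical.

```python
EMBEDDING_PREFIX = "tensorzero::embedding_model_name::"


def prefixed_embedding_model(model_name: str) -> str:
    """Return the model name with the required prefix, adding it only if missing."""
    if model_name.startswith(EMBEDDING_PREFIX):
        return model_name
    return EMBEDDING_PREFIX + model_name
```

The function is idempotent, so it is safe to apply to names that were already migrated.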

Bug Fixes

  • Fix a ClickHouse warning that occurred when a model inference had input tokens set to null and output tokens non-null, or vice versa. This issue only caused warnings and did not affect TensorZero's user-facing functionality.

New Features

  • Add extra_body support for embedding model configurations to enable custom API request fields for various embedding providers. (thanks @ishbir!)
  • Update the Azure OpenAI Service model provider to use API version 2025-04-01-preview.
  • Add CrewAI integration example.

& multiple under-the-hood and UI improvements (thanks @MengAiDev!)

2025.8.3 Breaking risk
⚠ Upgrade required
  • If you previously enabled batching writes to ClickHouse via the embedded Python gateway, disable that setting or switch to a standalone (HTTP) gateway to avoid deadlocks caused by GIL interactions.
Breaking changes
  • Removed support for batching writes to ClickHouse when using the embedded Python gateway; batching remains available with a standalone (HTTP) gateway.
Notable features
  • Configuration can be split into multiple files using glob patterns
  • Example added for multimodal (vision) fine-tuning
  • More hyperparameters exposed for programmatic supervised fine‑tuning with Fireworks
Full changelog

[!CAUTION]
Breaking Changes

  • Temporarily removing support for batching writes to ClickHouse with the embedded gateway in Python: In the previous release, we added support for batching writes to ClickHouse to boost ingest throughput and reduce insert overhead at scale (default off). Later, we discovered that in rare scenarios, the Python GIL could interfere with this setting in embedded clients and cause a deadlock. While we investigate a solution, we are removing support for batching with the embedded client to prevent technical footguns. Batching remains available when using a standalone (HTTP) gateway.

New Features

  • Add support for splitting configuration into multiple files with glob patterns
  • Add an example for multimodal (vision) fine-tuning
  • Expose more hyperparameters for programmatic supervised fine-tuning with Fireworks
  • Optimize queries in the UI to improve the performance of assorted pages in large-scale deployments
  • Enable setting global labels for all created resources in Helm (thanks @jinnovation!)
  • Support embedding endpoint when using the OpenAI SDK with an embedded gateway (patch_openai_client)

& many under-the-hood and UI improvements (thanks @wliu4040!)

2025.8.2 New feature
Notable features
  • Playground UI for side‑by‑side variant comparison, prompt iteration, and inference replay
  • ClickHouse write batching to increase ingest throughput and lower insert overhead at scale
  • Jupyter notebook recipe for supervised fine‑tuning with Unsloth
Full changelog

New Features

  • Add a Playground to the UI to compare variants side-by-side, iterate on prompts quickly, and replay inference requests.
  • Support batching writes to ClickHouse to boost ingest throughput and reduce insert overhead at scale.
  • Add a Jupyter notebook recipe for supervised fine-tuning with Unsloth.
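
To illustrate why batching reduces insert overhead, here is a generic sketch of a write buffer that groups rows into one insert per batch. It is not the gateway's implementation: the `BatchWriter` class, its parameters, and the size-based flush policy are all assumptions made for illustration.

```python
class BatchWriter:
    """Buffer rows and hand them to `flush_fn` in batches (illustrative only)."""

    def __init__(self, flush_fn, max_rows):
        self._flush_fn = flush_fn  # called once per batch with a list of rows
        self._max_rows = max_rows  # assumed policy: flush when the buffer is full
        self._buffer = []

    def write(self, row):
        self._buffer.append(row)
        if len(self._buffer) >= self._max_rows:
            self.flush()

    def flush(self):
        # Send whatever is buffered as a single insert, then start a new buffer.
        if self._buffer:
            self._flush_fn(self._buffer)
            self._buffer = []


inserts = []  # stands in for the database: one entry per insert statement
writer = BatchWriter(inserts.append, max_rows=2)
for row in (1, 2, 3):
    writer.write(row)
writer.flush()  # flush the remainder, e.g. on shutdown
print(inserts)
```

Three rows result in two inserts instead of three; with larger batches the saving grows accordingly.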

& many under-the-hood and UI improvements (thanks @contrun @lblack00!)

2025.8.1 New feature
Notable features
  • OpenAI‑compatible endpoint for embeddings supporting OpenAI and Azure OpenAI Service providers
  • Self‑hosted replicated ClickHouse database support
  • Parse `reasoning_content` from Fireworks and vLLM model providers
Full changelog

New Features

  • Add an OpenAI-compatible endpoint for embeddings, with support for OpenAI (& OpenAI-compatible) and Azure OpenAI Service model providers.
  • Add support for self-hosted replicated ClickHouse databases.
  • Parse reasoning_content from Fireworks and vLLM model providers.
  • Improve error messages for AWS Bedrock and AWS SageMaker model providers.

Bug Fixes

  • Allow configuration to specify description for JSON functions.
  • Fix a regression where function descriptions were no longer rendered in the UI.

& many under-the-hood and UI improvements (thanks @yuvraj-kumar-dev)

2025.8.0 New feature
Notable features
  • gateway.observability.skip_completed_migrations config to skip ClickHouse migration workflow on startup
  • Support for raw_text content blocks in OpenAI-compatible inference endpoint
  • Ability to collect outputs from "Try with variant" UI as demonstrations
Full changelog

New Features

  • Add gateway.observability.skip_completed_migrations configuration option to reduce gateway startup time and database load. When enabled, the gateway will skip running the ClickHouse migration workflow (i.e. verifying and potentially applying every migration) on startup for migrations that are already present in a database table that tracks migration history.
  • Support raw_text content blocks in the OpenAI-compatible inference endpoint. (Thanks @hongantran3804 @pykm05 @pycoder49!)
  • Allow users to collect outputs from "Try with variant" in the UI as demonstrations.
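
The effect of `gateway.observability.skip_completed_migrations` can be pictured with the following sketch. It assumes migrations are identified by name and that the history table can be read as a set of completed names; the function and the migration names are hypothetical and do not reflect the gateway's internals.

```python
def migrations_to_run(all_migrations, completed, skip_completed):
    """Pick the migrations whose workflow should run on startup.

    all_migrations: ordered list of migration names (hypothetical)
    completed: names recorded in the migration history table
    skip_completed: value of the skip_completed_migrations option
    """
    if not skip_completed:
        # Default behavior: verify (and potentially apply) every migration.
        return list(all_migrations)
    # Option enabled: skip migrations already recorded as completed.
    return [m for m in all_migrations if m not in completed]


known = ["m0001", "m0002", "m0003"]
print(migrations_to_run(known, completed={"m0001", "m0002"}, skip_completed=True))
```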

Bug Fixes

  • Fix handling of reasoning content blocks for DeepSeek-R1 on AWS Bedrock.
  • Set proper default value for max_tokens for the Anthropic and GCP Vertex AI Anthropic model providers. The gateway will now error if no value is provided in the configuration or request and the model is unknown.
  • Skip caching model inferences that generated invalid tool call arguments.

& many under-the-hood and UI improvements (thanks @michaldorsett @K-coder05 @dcaputo-harmoni @masonblier @Nicolasgarbarino!)

2025.7.5 New feature
Notable features
  • Added `gateway.unstable_disable_feedback_target_validation` flag for large-scale deployments
Full changelog

Experimental

  • Add gateway.unstable_disable_feedback_target_validation configuration option to improve the performance of the feedback endpoint in large-scale deployments (not recommended unless you know what you're doing).

& multiple under-the-hood and UI improvements (thanks @michaldorsett @HJStaiff @liamjdavis!)

2025.7.4 Bug fix
Notable features
  • Soft deletion of datasets via UI
  • Filtering by time and tags in experimental_list_inferences
  • Ordering by metric value and time in experimental_list_inferences
Full changelog

Bug Fixes

  • Fixed an issue with inference caching where inference requests that were identical except for their inline (base64-encoded) file data incorrectly shared the same cache key, resulting in false cache hits. The cache key now includes a hash of the inline file data, ensuring that such requests are properly distinguished.
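
The idea behind the fix can be sketched as follows. This is an illustration rather than the gateway's actual key derivation: the request shape (a dict with a `files` list of base64 strings), the use of SHA-256, and the function name `cache_key` are assumptions.

```python
import base64
import hashlib
import json


def cache_key(request: dict) -> str:
    """Derive a cache key that also covers inline file data (illustrative)."""
    hasher = hashlib.sha256()
    # Hash everything except the inline file payloads in a stable order.
    without_files = {k: v for k, v in request.items() if k != "files"}
    hasher.update(json.dumps(without_files, sort_keys=True).encode("utf-8"))
    # Fold in a hash of each decoded file so different files yield different keys.
    for file_b64 in request.get("files", []):
        hasher.update(hashlib.sha256(base64.b64decode(file_b64)).digest())
    return hasher.hexdigest()


cat = base64.b64encode(b"cat image bytes").decode("ascii")
dog = base64.b64encode(b"dog image bytes").decode("ascii")
a = cache_key({"prompt": "Describe the image", "files": [cat]})
b = cache_key({"prompt": "Describe the image", "files": [dog]})
print(a != b)
```

Two requests that differ only in their inline file data now map to different keys, while identical requests still share one.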

New Features

  • Added functionality for deleting datasets in the UI (soft deletion).

Experimental

  • Added support for filtering by time and tags to the experimental_list_inferences method.
  • Added support for ordering by metric value and time to the experimental_list_inferences method.

& multiple under-the-hood and UI improvements (thanks @NamNgHH!)

2025.7.3 Bug fix
⚠ Upgrade required
  • Migrate `gateway.enable_template_filesystem_access = true` to `gateway.template_filesystem_access.enabled = true`
Full changelog

[!WARNING]
Planned Deprecations

  • Migrate gateway.enable_template_filesystem_access = true to gateway.template_filesystem_access.enabled = true. We're about to add more fields to template_filesystem_access to support multi-file configuration.
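
For configurations that are generated or checked by scripts, the migration could be expressed as in the sketch below. It operates on the `gateway` section after it has been parsed into a dict; the function name is hypothetical, and only the two setting names come from the note above.

```python
def migrate_gateway_config(gateway: dict) -> dict:
    """Move the deprecated boolean into the new nested section (illustrative)."""
    migrated = dict(gateway)
    if "enable_template_filesystem_access" in migrated:
        enabled = migrated.pop("enable_template_filesystem_access")
        section = dict(migrated.get("template_filesystem_access", {}))
        # Keep an explicit new-style value if one is already present.
        section.setdefault("enabled", enabled)
        migrated["template_filesystem_access"] = section
    return migrated


print(migrate_gateway_config({"enable_template_filesystem_access": True}))
```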

Bug Fixes

  • Remove a third-party dependency that was causing a memory leak in the UI.
  • Fix a regression that prevented the UI from running offline.

& multiple under-the-hood and UI improvements

2025.7.2 Bugfix

Fixed occasional connection errors with ClickHouse Cloud by updating the client implementation.

Full changelog

Bug Fixes

  • Update TensorZero's ClickHouse client to match the parameter recommendations by ClickHouse. (This change aims to resolve occasional connection errors with ClickHouse Cloud.)

2025.7.1 New feature
⚠ Upgrade required
  • The new experimental flag `gateway.unstable_error_json` returns internal error details in the response body.
Notable features
  • Improve UI components for rendering text, JSON, Markdown, and MiniJinja templates (syntax highlighting, line numbers, wrapping, etc.)
  • Improve performance of the UI's episode list page
  • Launch SFT jobs for Together AI and GCP Vertex AI Gemini programmatically
Full changelog

New Features

  • Improve UI components for rendering text, JSON, Markdown, and MiniJinja templates (syntax highlighting, line numbers, wrapping, etc.)
  • Improve the performance of the UI's episode list page
  • Add pseudonymous usage analytics to the gateway (see docs for details and instructions to opt out)

Experimental

  • Launch SFT jobs for Together AI and GCP Vertex AI Gemini programmatically
  • Return internal error details in the response body (gateway.unstable_error_json) (thanks @panesher)

& many under-the-hood and UI improvements (thanks @michaldorsett @itsrajatrai @caarlos0)

2025.7.0 Breaking risk
Notable features
  • Supervised fine‑tuning workflow now fully supports multimodal data (vision, documents) with multi‑turn tool use and TensorZero inference capabilities
  • Streaming inference support added for best‑of‑n and mixture‑of‑n variant types
  • Experimental Python client methods: `experimental_launch_optimization`, `experimental_poll_optimization`, `experimental_get_config` and extended `experimental_render_inferences`
Full changelog

New Features

  • Revamped the UI's supervised fine-tuning workflow to fully support TensorZero's inference capabilities, including multimodal data (vision, documents, etc.), multi-turn tool use, and more.
  • Added streaming inference support for best-of-n and mixture-of-n variant types.
  • Optimized the performance of some database queries in the UI.

Experimental

Experimental features don't have a stable API. They may change or be removed in future releases.

  • Added methods to the Python client for programmatically launching (experimental_launch_optimization) and polling for (experimental_poll_optimization) optimization jobs. For now, these methods support supervised fine-tuning with OpenAI and Fireworks AI.
  • Added a method to the Python client for retrieving the configuration (experimental_get_config).
  • Updated experimental_render_inferences to accept outputs from both experimental_list_inferences and list_datapoints.

& many under-the-hood and UI improvements (thanks @jeevikasirwani!)

2025.6.3 New feature
Notable features
  • Added `delete = true` option to `extra_body` and `extra_headers` to remove built-in fields
  • Introduced `gateway.base_path` configuration field to prefix all endpoints
  • Added `discard_unknown_chunks` in model provider config to ignore unsupported chunk types
Full changelog

New Features

  • Add delete = true option to extra_body and extra_headers configuration fields to instruct the gateway to delete built-in fields from the request body or headers.
  • Add gateway.base_path field to configuration to instruct the gateway to prefix all endpoints with this path.
  • Add discard_unknown_chunks field to model provider configuration to instruct the gateway to discard chunks with unknown or unsupported types instead of throwing an error.
  • Add optional name field to tool configuration; if provided, the tool name will be sent to the LLMs instead of the tool ID, allowing for multiple tools with the same name.
  • Add functionality to filter list_datapoints by function name.
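
The behavior that `discard_unknown_chunks` describes can be sketched as a filter over a stream of chunks. The chunk shape, the set of known types, and the function name below are assumptions for illustration; the gateway's own chunk handling is not shown in these notes.

```python
KNOWN_TYPES = {"text", "tool_call"}  # assumed set of supported chunk types


def filter_chunks(chunks, discard_unknown):
    """Yield supported chunks; drop or reject the rest (illustrative)."""
    for chunk in chunks:
        if chunk["type"] in KNOWN_TYPES:
            yield chunk
        elif discard_unknown:
            continue  # option enabled: silently drop the unknown chunk
        else:
            raise ValueError(f"unknown chunk type: {chunk['type']}")


stream = [{"type": "text", "text": "hi"}, {"type": "mystery"}]
print(list(filter_chunks(stream, discard_unknown=True)))
```

With `discard_unknown=False`, the same stream raises an error at the second chunk, which corresponds to the behavior without the option.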

& multiple under-the-hood and UI improvements

2025.6.2 New feature
Notable features
  • Granular timeouts via `[timeouts]` in variant and model configuration blocks
  • Shorthand model names for Groq (`groq::...`) and OpenRouter (`openrouter::...`) providers
  • Explicit `stop_sequences` inference parameter
Full changelog

New Features

  • Add recipe for supervised fine-tuning with Google Vertex AI Gemini
  • Add granular timeouts ([timeouts]) to variant and model configuration blocks
  • Support short-hand model names for Groq (groq::...) and OpenRouter (openrouter::...) model providers
  • Support tool use with vLLM (thanks @CHRV @chaet1t!)
  • Add explicit stop_sequences inference parameter
  • Support dynamic credentials in OpenAI-compatible inference endpoint (tensorzero::credentials) (thanks @zmij!)
  • Support multimodal inference and file inputs on AWS Bedrock

& multiple under-the-hood and UI improvements

2025.6.1 Breaking risk
⚠ Upgrade required
  • Return null instead of an empty string when `service_tier` is missing in the OpenAI‑compatible inference endpoint.
Breaking changes
  • During streaming inference, `raw_name` in a tool call chunk is now a delta to accumulate, and it is an empty string once the tool name has finished streaming. Previously, every subsequent chunk repeated the same `raw_name`.
Notable features
  • Allow inference containing files with arbitrary MIME types
  • [timeouts] section added to model provider configuration for granular timeout settings
  • Support templates without schemas; built‑in variables `system_text`, `assistant_text`, and `user_text` are now available
Full changelog

[!CAUTION]
Breaking Changes

  • Streaming Inference + Tool Use: During streaming inferences, raw_name in a tool call chunk represents a delta that should be accumulated. If the tool name has finished streaming, this field will contain an empty string. Previously, TensorZero returned the same raw_name in every subsequent chunk for that tool call. The new behavior matches the OpenAI API's behavior.
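
Client code that reads tool names from a stream should therefore concatenate the deltas. The sketch below shows one way to do that; the chunk shape (dicts with `id` and `raw_name` keys) and the function name are assumptions for illustration, not the exact wire format.

```python
from collections import defaultdict


def accumulate_tool_names(chunks):
    """Concatenate raw_name deltas per tool call ID (illustrative)."""
    names = defaultdict(str)
    for chunk in chunks:
        # An empty string means the name has already finished streaming,
        # so appending it leaves the accumulated name unchanged.
        names[chunk["id"]] += chunk.get("raw_name", "")
    return dict(names)


# Hypothetical stream: the name arrives in two deltas, then empty strings.
chunks = [
    {"id": "call_1", "raw_name": "get_"},
    {"id": "call_1", "raw_name": "weather"},
    {"id": "call_1", "raw_name": ""},
]
print(accumulate_tool_names(chunks))
```

Code written for the previous behavior, which took the name from any single chunk, would see only a fragment or an empty string under the new behavior.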

Bug Fixes

  • Return null instead of an empty string when missing service_tier in the OpenAI-compatible inference endpoint

New Features

  • Allow inference containing files with arbitrary MIME types
  • Add [timeouts] to model provider configuration for granular timeout functionality
  • Support templates without schemas; add built-in system_text, assistant_text, and user_text template variables
  • Support tags in OpenAI-compatible inference endpoint (tensorzero::tags)
  • Add experimental_list_inferences method to the client for retrieving historical inferences

& multiple under-the-hood and UI improvements (thanks @vr-varad!)

2025.6.0 Bug fix
Notable features
  • Handle thinking and unknown content blocks for GCP Vertex Anthropic and Gemini models
  • Added `endpoint_id` field in configuration for fine‑tuned GCP Vertex Anthropic and Gemini models
  • Introduced Groq (`groq`) model provider
Full changelog

Bug Fixes

  • Increase database health check timeout in the gateway to 180s to gracefully handle warmup of serverless databases

New Features

  • Handle thinking and unknown content blocks for gcp_vertex_anthropic and gcp_vertex_gemini models
  • Add endpoint_id field in the configuration for gcp_vertex_anthropic and gcp_vertex_gemini models to support fine-tuned models
  • Add a dedicated Groq (groq) model provider (thanks @oliverbarnes!)
  • Support include_original_response during streaming inference

& multiple under-the-hood and UI improvements
