Release history
tensorzero releases
- UI requires authentication when the gateway requires authentication (previously only for gateway usage).
- Improved error handling and logging for complex streaming inferences, including status code propagation and fallbacks.
Full changelog
[!CAUTION]
Breaking Changes
- The UI will now require authentication when the gateway requires authentication. Previously, the UI only required authentication for gateway usage.
New Features
- Improve error handling (e.g. status code propagation) and logging for complex streaming inferences (e.g. fallbacks).
& multiple under-the-hood and UI improvements (thanks @arisp)
- Deprecation: TensorZero Autopilot "Sessions" page removed from UI; future platform‑agnostic workflows planned.
- Gateway defaults to async observability writes; previous synchronous behavior requires `observability.async_writes = false`.
- TypeScript evaluators for inference evaluations
- Support for vLLM's new `reasoning` field
- Aggregated variant usage data (tokens, cost) in UI
Full changelog
[!CAUTION]
Breaking Changes
- The gateway now defaults to async observability writes to reduce tail latency: inferences are sent to the client before they are persisted in the database. To restore the previous behavior, set `observability.async_writes = false`. [docs]
[!WARNING]
Deprecations
- Removed the TensorZero Autopilot "Sessions" page from the UI. We recently added a TensorZero MCP that integrates nicely with coding agents, and we'll re-introduce advanced TensorZero Autopilot workflows in a platform-agnostic format soon.
Bug Fixes
- Return HTTP code 429 for rate limiting errors.
- Fixed a bug affecting ClickHouse database names with hyphens. (thanks @ianliuy!)
New Features
- Added TypeScript evaluators (for inference evaluations).
- Added support for vLLM's new `reasoning` field.
- Added aggregated variant usage data (tokens, cost, etc.) to the UI.
- Added inference cost data to exported OpenTelemetry traces. (thanks @kimsehwan96!)
- Added the `export.otlp.traces.include_content` configuration field (default: `false`) to include inference content (e.g. prompts, messages) in exported OpenTelemetry GenAI traces.
& multiple under-the-hood and UI improvements
- Add MCP server to gateway exposing API at /mcp
- Report provider prompt caching statistics via API and UI
- Report usage statistics (tokens, latency, cost) for inference evaluations via CLI, API, and UI
Full changelog
New Features
- Add an MCP server to the gateway exposing its API at `/mcp`.
- Report provider prompt caching statistics via API and UI.
- Report usage statistics (e.g. tokens, latency, cost) for inference evaluations via CLI tool, API, and UI.
- Add the Prometheus metrics `tensorzero_input_tokens_total` and `tensorzero_output_tokens_total`.
- Add configuration field `content_type_overrides` to handle file inputs for long-tail providers.
& multiple under-the-hood and UI improvements
- Deprecation: Inference evaluation config must be nested under function names; legacy flat format will be removed in a future release.
- Deprecation: `launch_optimization` with `GEPAConfig` is deprecated and will be removed; use `t0.optimization.gepa.launch` instead.
- TensorZero Autopilot: automated AI engineer that analyzes LLM data, configures evaluations, optimizes prompts/models, and runs A/B tests
- Embeddings requests now counted in Prometheus metrics `tensorzero_requests_total` and `tensorzero_inferences_total`
- Observability configuration field `observability.batch_writes.write_queue_capacity` added for gateway backpressure
Full changelog
[!WARNING]
Planned Deprecations
- The configuration for inference evaluations should be nested under the relevant functions moving forward [docs]. You can run evaluations by providing a function name and a list of evaluators. The legacy format will be removed in a future release. Example: `[functions.write_haiku.evaluators.exact_match]` with `type = "exact_match"`.
- The legacy implementation of GEPA (`launch_optimization` with `GEPAConfig`) will be removed in a future release. Please use `t0.optimization.gepa.launch` instead. [docs]
Bug Fixes
- Fixed a UI bug where a custom gateway `base_path` was not handled correctly in certain routes. (thanks @wangfenjin!)
New Features
- Started including embeddings requests in the Prometheus metrics `tensorzero_requests_total` and `tensorzero_inferences_total`.
- Added the configuration field `observability.batch_writes.write_queue_capacity` to enable backpressure for observability data in the gateway.
& multiple under-the-hood and UI improvements (thanks @majiayu000)!
[!IMPORTANT]
🆕 TensorZero Autopilot
TensorZero Autopilot is an automated AI engineer powered by TensorZero that analyzes LLM observability data, sets up evals, optimizes prompts and models, and runs A/B tests.
It dramatically improves the performance of LLM agents across diverse tasks.
- Removed assistant message prefill for JSON functions with Anthropic (deprecated by Anthropic).
- GEPA automated prompt engineering via durable workflows
- Support duplicate tool calls in `all_of` evaluators for parallel execution
- UI option to set expiration date for API keys
Full changelog
Bug Fixes
- Fixed two edge cases affecting batch inference.
- Fixed a UI bug affecting "Try with..." with inputs that include base64 files.
- Removed assistant message prefill for JSON functions + Anthropic (deprecated by Anthropic).
New Features
- Added an implementation of GEPA (automated prompt engineering) based on durable workflows.
- Allow users to specify duplicate tool calls in `all_of` tool evaluators to evaluate parallel tool calling.
- Allow users to specify an expiration date for API keys in the UI. (thanks @eibrahim95)
- Allow users to specify `object_storage.endpoint = "env::MY_ENV_VAR"` in addition to static values. (thanks @Meredith2328)
& multiple under-the-hood and UI improvements (thanks @majiayu000)!
- Postgres added as an alternative observability backend to ClickHouse (recommended for low RPS)
- `openrouter::xxx` shorthand for embedding models
- Per-session API keys in the browser when auth is enabled
Full changelog
Bug Fixes
- Fixed a UI issue that prevented certain pages from rendering when they depended on historical configuration.
New Features
- Added Postgres as an alternative observability backend to ClickHouse. Postgres is the simplest way to get started; we recommend ClickHouse if you're handling >100 RPS.
- Added the `openrouter::xxx` shorthand for embedding models.
- Added support for per-session API keys in the browser (instead of a global environment variable) when auth is enabled.
& multiple under-the-hood and UI improvements!
- The embedded gateway in the TensorZero Python SDK will be removed in version 2026.6+; migrate to a standalone TensorZero Gateway using `base_url` for OpenAI SDK or `build_http` for TensorZero SDK.
- The variant configuration field `weight` will be removed in version 2026.6+; transition to the new experimentation configuration semantics documented at https://www.tensorzero.com/docs/experimentation/run-static-ab-tests.
- Removed `model_provider_name` filter for `extra_body` and `extra_headers`; use `model_name` and `provider_name` instead.
- Removed legacy experimental `list_inferences` endpoint; use the new endpoint documented at https://www.tensorzero.com/docs/observability/query-historical-inferences.
- Removed several long-deprecated types and methods from the TensorZero Python SDK.
- Added support for launching optimization workflows with `dataset_name` in `launch_optimization_workflow`.
Full changelog
[!WARNING]
Completed Deprecations
- Removed the deprecated `model_provider_name` filter for `extra_body` and `extra_headers`. Please use `model_name` and `provider_name` instead.
- Removed the legacy experimental `list_inferences` endpoint and method. Please use the new endpoint instead. [docs]
- Removed several long-deprecated types and methods from the TensorZero Python SDK.
[!WARNING]
Planned Deprecations
- The embedded gateway in the TensorZero Python SDK will be removed in a future release (2026.6+). `patch_openai_client` and `build_embedded` are deprecated. Please deploy a standalone TensorZero Gateway instead (usage: `base_url` for OpenAI SDK; `build_http` for TensorZero SDK).
- The variant configuration field `weight` will be removed in a future release (2026.6+). Please use the new experimentation configuration semantics. [docs]
Bug Fixes
- Fixed a compatibility bug with Valkey-based caching that only affected Redis.
New Features
- Added support for launching optimization workflows with `dataset_name` (instead of an inference query) in `launch_optimization_workflow`.
& multiple under-the-hood and UI improvements!
- Configuration fields static_weights, track_and_stop will be removed in a future release; see Run adaptive A/B tests and Run static A/B tests docs for updated usage.
- Evaluator configuration field cutoff will be removed; use CLI flag --cutoffs evaluator=value,... instead.
- Gateway route /variant_sampling_probabilities will be removed in a future release.
- Removed deprecated Prometheus metric tensorzero_inference_latency_overhead_seconds_histogram; use tensorzero_inference_latency_overhead_seconds instead.
- Added regex and tool_use evaluators.
- Added experimental_launch_optimization_workflow to the TensorZero Python SDK.
Full changelog
[!WARNING]
Completed Deprecations
- The deprecated Prometheus metric `tensorzero_inference_latency_overhead_seconds_histogram` was removed. Use `tensorzero_inference_latency_overhead_seconds` instead.
[!WARNING]
Planned Deprecations
- The configuration for experimentation (e.g. `static_weights`, `track_and_stop`) was simplified. The old notation will be removed in a future release. See Run adaptive A/B tests and Run static A/B tests for more information.
- The evaluator configuration field `cutoff` will be removed in a future release. Instead, provide `--cutoffs evaluator=value,...` in the CLI.
- The gateway route `/variant_sampling_probabilities` will be removed in a future release.
- The configuration field `postgres.enabled` will be removed in a future release. Instead, the gateway will consider whether the environment variable `TENSORZERO_POSTGRES_URL` is set.
New Features
- Add `regex` and `tool_use` evaluators. [docs]
- Add `experimental_launch_optimization_workflow` to the TensorZero Python SDK.
& multiple under-the-hood and UI improvements!
- Removed deprecated legacy dataset management endpoints; use new endpoints for that functionality.
- Changed `--config-file` globbing: single‑level wildcards (`*`) no longer match files across directory boundaries; require recursive wildcard (`**`).
- Cost tracking and cost‑based rate limiting
- Namespaces for multiple granular A/B experiments on the same TensorZero function
- Improved reasoning support for Anthropic, Fireworks AI, SGLang, Together AI
Full changelog
[!CAUTION]
Breaking Changes
- The `--config-file` globbing behavior has changed: single-level wildcards (`*`) no longer match files across directory boundaries. To match files across directory boundaries, use recursive wildcards (`**`). This aligns the behavior with standard glob semantics. For example:
  - `--config-file *.toml` matches `tensorzero.toml`, but not `subdir/tensorzero.toml`.
  - `--config-file **/*.toml` matches both `tensorzero.toml` and `subdir/tensorzero.toml`.
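The difference between the two wildcards can be reproduced with any standard glob implementation. The sketch below uses Python's `pathlib` purely as an illustration of those semantics; it is not how the gateway resolves `--config-file`, and the helper names `match_config_files` and `demo` are hypothetical.

```python
import tempfile
from pathlib import Path


def match_config_files(root: Path, pattern: str) -> list:
    """Return matches for `pattern` under `root`, as sorted POSIX-style relative paths."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))


def demo() -> dict:
    """Build the two-file layout from the release note and glob it both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "subdir").mkdir()
        (root / "tensorzero.toml").write_text("")
        (root / "subdir" / "tensorzero.toml").write_text("")
        return {
            # Single-level wildcard: stays inside the top-level directory.
            "*.toml": match_config_files(root, "*.toml"),
            # Recursive wildcard: also descends into subdirectories.
            "**/*.toml": match_config_files(root, "**/*.toml"),
        }


if __name__ == "__main__":
    for pattern, matches in demo().items():
        print(pattern, "->", matches)
```

If a configuration is split across nested directories, the recursive form is the one that picks up every file.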
[!WARNING]
Completed Deprecations
- Removed deprecated legacy endpoints for dataset management. The functionality is fully covered by the new endpoints.
New Features
- Add cost tracking and cost-based rate limiting.
- Add namespaces: the ability to set up multiple granular experiments (A/B tests) for the same TensorZero function.
- Improve reasoning support for Anthropic (including adaptive thinking), Fireworks AI, SGLang, and Together AI.
- Allow users to whitelist automatic tool approvals for TensorZero Autopilot.
- Report provider errors when `include_raw_response` is enabled.
- Add `include_aggregated_response` to streaming inferences. When enabled, the final chunk includes an aggregated output `aggregated_response` that combines previous chunks.
- Allow users to kill ongoing evaluation runs from the UI.
- Allow custom gateway bind addresses with the environment variable `TENSORZERO_GATEWAY_BIND_ADDRESS`.
& multiple under-the-hood and UI improvements (thanks @Nfemz @greg80303)!
- Default value for `cache_options.enabled` changed from `write_only` to `off`.
- Support reasoning models from Groq, Mistral, and vLLM.
- Support multi-turn reasoning with Gemini and OpenAI‑compatible models.
- Support embedding models from Together AI.
Full changelog
[!CAUTION]
Breaking Changes
- The default value for `cache_options.enabled` changed from `write_only` to `off`.
New Features
- Support reasoning models from Groq, Mistral, and vLLM.
- Support multi-turn reasoning with Gemini and OpenAI-compatible models.
- Support embedding models from Together AI.
- Add configurable `total_ms` timeout to streaming inferences.
- Display charts with top-k evaluation results in the TensorZero Autopilot UI.
- Add "Ask Autopilot" buttons throughout the UI.
- Allow TensorZero Autopilot to edit your local configuration files.
- Return `thought` and `unknown` content blocks in the OpenAI-compatible endpoint (`tensorzero_extra_content`).
& multiple under-the-hood and UI improvements!
- `beta_structured_outputs` configuration field is deprecated and ignored; will be removed in a future release.
- YOLO Mode for TensorZero Autopilot
- Interruption feature for TensorZero Autopilot sessions
- Summary row added to TensorZero Autopilot session table
Full changelog
[!WARNING]
Planned Deprecations
- Anthropic's structured output feature is out of beta, so the TensorZero configuration field `beta_structured_outputs` is now ignored and deprecated. It'll be removed in a future release.
Bug Fixes
- Fix a regression in the `aws_bedrock` provider that affected long-term bearer API keys.
- Fix a horizontal overflow issue for tool calls and results in the inference detail UI page.
New Features
- Add YOLO Mode for TensorZero Autopilot.
- Add interruption feature for TensorZero Autopilot sessions.
- Add summary to the TensorZero Autopilot session table in the UI.
& multiple under-the-hood and UI improvements (thanks @pratikbuilds)!
Fixed a race condition disabling chat input in TensorZero Autopilot UI.
Full changelog
Bug Fixes
- Fix a race condition in the TensorZero Autopilot UI that could disable the chat input.
- Increase timeouts for slow tool calls triggered by TensorZero Autopilot (e.g. evaluations).
& multiple under-the-hood and UI improvements!
- TensorZero Autopilot (preview) – automated AI engineer for LLM observability, prompt/model optimization, eval setup, and A/B testing
- Support multi-turn reasoning for xAI via `reasoning_content`
Full changelog
New Features
- [Preview] TensorZero Autopilot — an automated AI engineer that analyzes LLM observability data, optimizes prompts and models, sets up evals, and runs A/B tests. Learn more → Join the waitlist →
- Support multi-turn reasoning for xAI (`reasoning_content` only).
& multiple under-the-hood and UI improvements!
- When using `unstable_error_json` with the OpenAI‑compatible inference endpoint, replace `error_json` with `tensorzero_error_json`. Both fields are currently emitted with identical data; future releases will remove `error_json`.
- OpenAI-compatible endpoints return errors in the standard OpenAI format (`{"error": {"message": "..."}}`) instead of the previous TensorZero format (`{"error": "..."}`).
- Native support for provider tools (e.g., web search) added to Anthropic and GCP Vertex AI Anthropic model providers.
- Improved streaming handling of reasoning content blocks in OpenAI Responses API.
- Graceful handling of missing `usage` fields during inference with the OpenAI model provider.
Full changelog
[!CAUTION]
Breaking Changes
- Moving forward, TensorZero will use the OpenAI API's error format (`{"error": {"message": "Bad!"}}`) instead of TensorZero's error format (`{"error": "Bad!"}`) in the OpenAI-compatible endpoints.
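Clients that talk to gateways on either side of this change may want to accept both error shapes. The following is a minimal sketch of such a parser, assuming only the two JSON shapes quoted above; the function name `extract_error_message` is hypothetical and not part of any TensorZero SDK.

```python
import json
from typing import Optional


def extract_error_message(body: str) -> Optional[str]:
    """Return the error message from either error shape, or None if absent."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        return None
    error = payload.get("error")
    if isinstance(error, dict):
        # OpenAI-style shape: {"error": {"message": "..."}}
        message = error.get("message")
        return message if isinstance(message, str) else None
    if isinstance(error, str):
        # Previous TensorZero shape: {"error": "..."}
        return error
    return None


if __name__ == "__main__":
    print(extract_error_message('{"error": {"message": "Bad!"}}'))
    print(extract_error_message('{"error": "Bad!"}'))
```

Checking for the nested object first keeps the new format as the primary path while older responses still parse.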
[!WARNING]
Planned Deprecations
- When using `unstable_error_json` with the OpenAI-compatible inference endpoint, use `tensorzero_error_json` instead of `error_json`. For now, TensorZero will emit both fields with identical data. The TensorZero inference endpoint is not affected.
New Features
- Add native support for provider tools (e.g. web search) to the Anthropic and GCP Vertex AI Anthropic model providers. Previously, clients had to use `extra_body` to handle these tools.
- Improve handling of reasoning content blocks when streaming with the OpenAI Responses API.
- Handle inferences with missing `usage` fields gracefully in the OpenAI model provider.
- Improve error handling across the UI.
& multiple under-the-hood and UI improvements!
- Migrate `include_original_response` to `include_raw_response` in all SDK configurations.
- Update AWS model provider settings: replace `allow_auto_detect_region = true` with `region = "sdk"`.
- When configuring custom Anthropic providers, set `api_base` to the base URL without the trailing endpoint (e.g., remove `/messages`).
- Normalized `usage` reporting: `input_tokens` and `output_tokens` now include all provider token variations (caching, reasoning, etc.), while cached tokens remain excluded. Raw provider usage can be accessed via `include_raw_usage`.
- Deprecation of `include_original_response`; migrate to `include_raw_response` for full model inference metadata.
- Deprecation of `allow_auto_detect_region = true`; migrate to `region = "sdk"` when configuring AWS model providers.
- Improved error handling across TensorZero UI, JSON deserialization, AWS providers, streaming inferences, and timeouts.
- Support for Valkey (Redis) to enhance rate‑limiting performance at ≥100 QPS.
- Added `reasoning_effort` support for Gemini 3 models (mapped to `thinkingLevel`).
Full changelog
[!CAUTION]
Breaking Changes
- TensorZero will normalize the reported `usage` from different model providers. Moving forward, `input_tokens` and `output_tokens` include all token variations (provider prompt caching, reasoning, etc.), just like OpenAI. Tokens cached by TensorZero remain excluded. You can still access the raw usage reported by providers with `include_raw_usage`.
[!WARNING]
Planned Deprecations
- Migrate `include_original_response` to `include_raw_response`. For advanced variant types, the former only returned the last model inference, whereas the latter returns every model inference with associated metadata.
- Migrate `allow_auto_detect_region = true` to `region = "sdk"` when configuring AWS model providers. The behavior is identical.
- Provide the proper API base rather than the full endpoint when configuring custom Anthropic providers. Example:
  - Before: `api_base = "https://YOUR-RESOURCE-NAME.services.ai.azure.com/anthropic/v1/messages"`
  - Now: `api_base = "https://YOUR-RESOURCE-NAME.services.ai.azure.com/anthropic/v1/"`
Bug Fixes
- Fix a regression that triggered incorrect warnings about usage reporting for streaming inferences with Anthropic models.
- Fix a bug in the TensorZero Python SDK that discarded some request fields in certain multi-turn inferences with tools.
New Features
- Improve error handling across many areas: TensorZero UI, JSON deserialization, AWS providers, streaming inferences, timeouts, etc.
- Support Valkey (Redis) for improving performance of rate limiting checks (recommended at 100+ QPS).
- Support `reasoning_effort` for Gemini 3 models (mapped to `thinkingLevel`).
- Improve handling of Anthropic reasoning models in TensorZero JSON functions. Moving forward, `json_mode = "strict"` will use the beta structured outputs feature; `json_mode = "on"` still uses the legacy assistant message prefill.
- Improve handling of reasoning content in the OpenRouter and xAI model providers.
- Add `extra_headers` support for embedding models. (thanks @jonaylor89!)
- Support dynamic credentials for AWS Bedrock and AWS SageMaker model providers.
& multiple under-the-hood and UI improvements (thanks @ndoherty-xyz)!
- Append to arrays using `/my_array/-` with `extra_body`.
- Handle cross‑model thought signatures in GCP Vertex AI Gemini and Google AI Studio.
Full changelog
New Features
- Support appending to arrays with `extra_body` using the `/my_array/-` notation.
- Handle cross-model thought signatures in GCP Vertex AI Gemini and Google AI Studio.
& multiple under-the-hood and UI improvements (thanks @ecalifornica!)
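The `/my_array/-` notation mirrors the JSON Pointer convention (RFC 6901) in which a trailing `-` refers to the position after the last array element. The sketch below illustrates that append semantics on a plain dictionary; it is an assumption-laden illustration rather than the gateway's implementation, the helper name `apply_pointer` is hypothetical, and it simply raises if an intermediate key is missing.

```python
import copy
from typing import Any


def apply_pointer(body: dict, pointer: str, value: Any) -> dict:
    """Return a copy of `body` with `value` set at `pointer`; a final "-" appends to an array."""
    if not pointer.startswith("/"):
        raise ValueError("pointer must start with '/'")
    result = copy.deepcopy(body)
    # Unescape JSON Pointer tokens (~1 is '/', ~0 is '~').
    parts = [p.replace("~1", "/").replace("~0", "~") for p in pointer[1:].split("/")]
    target: Any = result
    for part in parts[:-1]:
        target = target[int(part)] if isinstance(target, list) else target[part]
    last = parts[-1]
    if isinstance(target, list):
        if last == "-":
            target.append(value)  # "-" means "after the last element"
        else:
            target[int(last)] = value
    else:
        target[last] = value
    return result


if __name__ == "__main__":
    request_body = {"my_array": [1, 2]}
    print(apply_pointer(request_body, "/my_array/-", 3))
```

Working on a deep copy keeps the original request body untouched, which makes the effect of each pointer easy to inspect.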
- Deprecation warning: In a future release, `model` will become required in DICLOptimizationConfig initialization (currently optional with default openai::gpt-5-mini).
- Support stream_options.include_usage for every model under the Azure provider
Full changelog
[!WARNING]
Planned Deprecations
- In a future release, the parameter `model` will be required when initializing `DICLOptimizationConfig`. The parameter remains optional (defaults to `openai::gpt-5-mini`) in the meantime.
Bug Fixes
- Stop buffering `raw_usage` when streaming with the OpenAI-compatible inference endpoint; instead, emit `raw_usage` as soon as possible, just like in the native endpoint.
- Stop reporting zero usage in every chunk when streaming a cached inference; instead, report zero usage only in the final chunk, as expected.
New Features
- Support `stream_options.include_usage` for every model under the Azure provider.
& multiple under-the-hood and UI improvements!
- Update monitoring dashboards and alerts to use the new histogram buckets for `tensorzero_inference_latency_overhead_seconds`.
- Replace usage of deprecated environment variable `TENSORZERO_CLICKHOUSE_URL` with gateway‑mediated queries.
- Adjust configuration: migrate from `tensorzero_inference_latency_overhead_seconds_histogram_buckets` to `tensorzero_inference_latency_overhead_seconds_buckets`.
- Metric `tensorzero_inference_latency_overhead_seconds` changed from a summary to a histogram (default buckets: 1ms, 10ms, 100ms).
- Deprecation of environment variable `TENSORZERO_CLICKHOUSE_URL` in the UI.
- Renamed Prometheus metric `tensorzero_inference_latency_overhead_seconds_histogram` to `tensorzero_inference_latency_overhead_seconds` (both emitted temporarily).
- Optional `include_raw_usage` parameter in inference requests returns raw usage objects alongside normalized usage.
- Optional `--bind-address` CLI flag added to the gateway.
- Optional `description` field for metrics in configuration.
Full changelog
[!CAUTION]
Breaking Changes
- The Prometheus metric `tensorzero_inference_latency_overhead_seconds` will report a histogram instead of a summary. You can customize the buckets using `gateway.metrics.tensorzero_inference_latency_overhead_seconds_buckets` in the configuration (default: 1ms, 10ms, 100ms).
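For readers updating dashboards, it may help to see how a Prometheus-style histogram groups observations: each bucket counts every observation less than or equal to its upper bound, plus a final `+Inf` bucket. The sketch below applies that rule to the default bounds; expressing them in seconds (0.001, 0.01, 0.1) is an assumption based on the metric name, and `histogram_counts` is a hypothetical helper, not gateway code.

```python
from bisect import bisect_left

# Default bounds from the release note (1ms, 10ms, 100ms), assumed to be in seconds.
DEFAULT_BUCKETS_SECONDS = [0.001, 0.01, 0.1]


def histogram_counts(observations, buckets=DEFAULT_BUCKETS_SECONDS):
    """Return cumulative (upper_bound, count) pairs, ending with a +Inf bucket."""
    bounds = sorted(buckets)
    per_bucket = [0] * (len(bounds) + 1)
    for value in observations:
        # Index of the first bound >= value; values above every bound land in +Inf.
        per_bucket[bisect_left(bounds, value)] += 1
    cumulative = []
    running = 0
    for bound, count in zip(bounds + [float("inf")], per_bucket):
        running += count
        cumulative.append((bound, running))
    return cumulative


if __name__ == "__main__":
    for bound, count in histogram_counts([0.0005, 0.001, 0.005, 0.05, 0.5]):
        print(f"le={bound}: {count}")
```

Unlike a summary, quantiles are not precomputed here; they are estimated from these cumulative counts at query time, so bucket bounds should bracket the latencies you care about.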
[!WARNING]
Planned Deprecations
- Deprecate the `TENSORZERO_CLICKHOUSE_URL` environment variable from the UI. Moving forward, the UI will query data through the gateway and does not communicate directly with ClickHouse.
- Rename the Prometheus metric `tensorzero_inference_latency_overhead_seconds_histogram` to `tensorzero_inference_latency_overhead_seconds`. Both metrics will be emitted for now.
- Rename the configuration field `tensorzero_inference_latency_overhead_seconds_histogram_buckets` to `tensorzero_inference_latency_overhead_seconds_buckets`. Both fields are available for now.
New Features
- Add optional `include_raw_usage` parameter to inference requests. If enabled, the gateway returns the raw usage objects from model provider responses in addition to the normalized `usage` response field.
- Add optional `--bind-address` CLI flag to the gateway.
- Add optional `description` field to metrics in the configuration.
- Add option to fine-tune Fireworks models without automatic deployment.
& multiple under-the-hood and UI improvements (thanks @ecalifornica @achaljhawar @rguilmont)!
- Removed `credential_location` from `DICLOptimizationConfig`.
- Moved `account_id` to `[provider_types.fireworks.sft]` and removed `api_base` and `credential_location` from `FireworksSFTConfig`.
- Moved `bucket_name`, `bucket_path_prefix`, `kms_key_name`, `project_id`, `region`, and `service_account` to `[provider_types.gcp_vertex_gemini.sft]` and removed them from `GCPVertexGeminiSFTConfig`.
- Gateway relay support for routing LLM inference requests through multiple TensorZero Gateway deployments.
- Added "Try with model" button to datapoint page in the UI.
- Prometheus metric `tensorzero_inference_latency_overhead_seconds_histogram` for meta‑observability.
Full changelog
[!CAUTION]
Breaking Changes
- Migrated the following optimization fields from the TensorZero Python SDK to the configuration:
  - `DICLOptimizationConfig`: removed `credential_location`.
  - `FireworksSFTConfig`: moved `account_id` to `[provider_types.fireworks.sft]`; removed `api_base` and `credential_location`.
  - `GCPVertexGeminiSFTConfig`: moved `bucket_name`, `bucket_path_prefix`, `kms_key_name`, `project_id`, `region`, and `service_account` to `[provider_types.gcp_vertex_gemini.sft]`.
  - `OpenAIRFTConfig`: removed `api_base` and `credential_location`.
  - `OpenAISFTConfig`: removed `api_base` and `credential_location`.
  - `TogetherSFTConfig`: moved `hf_api_token`, `wandb_api_key`, `wandb_base_url`, and `wandb_project_name` to `[provider_types.together.sft]`; removed `api_base` and `credential_location`.
New Features
- Support gateway relay. With gateway relay, an LLM inference request can be routed through multiple independent TensorZero Gateway deployments before reaching a model provider. This enables you to enforce organization-wide controls (e.g. auth, rate limits, credentials) without restricting how teams build their LLM features.
- Add "Try with model" button to the datapoint page in the UI.
- Add `tensorzero_inference_latency_overhead_seconds_histogram` Prometheus metric for meta-observability.
- Add `concurrency` parameter to `experimental_render_samples` (defaults to 100).
- Add `otlp_traces_extra_attributes` and `otlp_traces_extra_resources` to the TensorZero Python SDK. (thanks @jinnovation!)
& multiple under-the-hood and UI improvements (thanks @ecalifornica)
- The `experimental_chain_of_thought` variant type will be deprecated in version 2026.2+; migrate to native reasoning capabilities.
- The `timeout_s` configuration field for best/mixture-of-N variants will be deprecated in version 2026.2+; use the `[timeouts]` block instead.
- UI dataset builder supports complex queries (filter by tags, feedback)
- Export Prometheus metric tensorzero_inference_latency_overhead_seconds
- CLI flag --disable-api-key to disable TensorZero API keys
Full changelog
[!WARNING]
Planned Deprecations
- The variant type `experimental_chain_of_thought` will be deprecated in `2026.2+`. As reasoning models are becoming prevalent, please use their native reasoning capabilities.
- The `timeout_s` configuration field for best/mixture-of-N variants will be deprecated in `2026.2+`. Please use the `[timeouts]` block in the configuration for their candidates instead.
New Features
- Expand the dataset builder in the UI to support complex queries (e.g. filter by tags, feedback).
- Export `tensorzero_inference_latency_overhead_seconds` Prometheus metric for meta-observability.
- Allow users to disable TensorZero API keys using `--disable-api-key` in the CLI. (thanks @jinnovation!)
& multiple under-the-hood and UI improvements (thanks @ecalifornica)!
- Performance improvement for inference and datapoint list pages in the UI
- Support filtering inferences by presence of a demonstration
Full changelog
Bug Fixes
- Fix a bug where negative tag filters (e.g. `user_id != 1`) matched inferences and datapoints without that tag.
- Fix a bug where metric filters covering default values (e.g. `exact_match = false`) matched inferences without that metric.
- Fix a regression affecting the logger in the UI.
New Features
- Improve the performance of the inference and datapoint list pages in the UI.
- Support filtering inferences by whether they have a demonstration.
& multiple under-the-hood and UI improvements (thanks @jinnovation @ecalifornica @simeonlee)!
- Customizable log level via TENSORZERO_UI_LOG_LEVEL
Full changelog
Bug Fixes
- Fix a performance regression affecting the inference table in the UI.
New Features
- Allow users to customize the log level in the UI (`TENSORZERO_UI_LOG_LEVEL`).
& multiple under-the-hood and UI improvements
Fixed regression that broke the dataset builder in the UI.
Full changelog
Bug Fixes
- Fixed a regression that broke the dataset builder in the UI.
& multiple under-the-hood and UI improvements
- Environment variables `TENSORZERO_UI_CONFIG_PATH` and `TENSORZERO_UI_DEFAULT_CONFIG` are deprecated and ignored.
- `model_provider_name` is still accepted in the API but will be removed in a future release; migrate to using `model_name` and `provider_name`.
- Unknown content blocks now return `model_name` and `provider_name` instead of fully-qualified `model_provider_name`.
- Free‑form search and filtering in inference and datapoint tables
- Create, edit, clone datapoints directly from the UI
- Peek at inferences on episode detail pages
Full changelog
[!CAUTION]
Breaking Changes
- Unknown content blocks now return the scope as `model_name` and `provider_name` instead of the fully-qualified `model_provider_name`.
[!WARNING]
Planned Deprecations
- The TensorZero UI now reads the configuration from the gateway (instead of reading directly from the filesystem). The environment variables `TENSORZERO_UI_CONFIG_PATH` and `TENSORZERO_UI_DEFAULT_CONFIG` are deprecated and ignored. You no longer need to mount the configuration onto the UI container.
- Use `model_name` and `provider_name` to scope provider tools (e.g. OpenAI Responses API web search) instead of `model_provider_name`. The deprecated name is still accepted in the API.
Bug Fixes
- Fix a regression in the "Try with..." modal in the UI that disregarded some parameters (e.g. `allowed_tools`).
- Fix a regression in `allowed_tools` when using custom display names for tools.
- Fix an edge case when using both `allowed_tools` and `tool_choice` parameters with GCP Vertex AI Gemini.
New Features
- Support free-form search and filtering (e.g. by tags, metrics) in the inference and datapoint tables in the UI.
- Support creating datapoints from scratch in the UI.
- Support editing TensorZero API key descriptions in the UI (thanks @nicoestrada!).
- Support editing any kind of datapoint input and output in the UI.
- Support peeking at inferences in the episode detail page in the UI (thanks @BrianLi23!).
- Support cloning datapoints in the UI.
- Optimize the rendering performance of the code editor in the UI.
- Make `mime_type` optional for base64 file inputs (now inferred from magic bytes when not provided).
& multiple under-the-hood and UI improvements
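Inferring a MIME type from magic bytes means decoding the start of the file and comparing it against known format signatures. The sketch below shows the idea for a handful of well-known signatures; which formats the gateway actually recognizes is not stated in these notes, so the signature table and the function name `infer_mime_type` are illustrative assumptions.

```python
import base64
from typing import Optional

# A small, illustrative subset of well-known file signatures.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def infer_mime_type(base64_data: str) -> Optional[str]:
    """Guess a MIME type from the leading bytes of base64-encoded data."""
    header = base64.b64decode(base64_data)[:16]
    for signature, mime_type in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return mime_type
    return None  # Unknown signature: the caller should supply mime_type explicitly.


if __name__ == "__main__":
    png = base64.b64encode(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8).decode()
    print(infer_mime_type(png))
```

When detection fails, passing `mime_type` explicitly remains the safe fallback.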
- Programmatic evaluations on specific datapoints via `datapoint_ids`
- Generation of `values.schema.json` for the Helm chart
Full changelog
Bug Fixes
- Handle a regression in ClickHouse `latest` that affected the endpoint for deleting datapoints.
New Features
- Support running evaluations programmatically on specific datapoints (`datapoint_ids`).
- Generate `values.schema.json` for the Helm chart. (thanks @Erin-Boehmer!)
- Rename `json_mode="implicit_tool"` to `json_mode="tool"`.
- Use `model_name` (and optionally `provider_name`) instead of `model_provider_name` in `extra_body` and `extra_headers` objects supplied at inference time; scope filters are optional.
- Explicit `tensorzero::params` take precedence over conflicting native parameters when using the OpenAI-compatible inference endpoint.
- Native support for Anthropic's Beta Structured Outputs (`beta_structured_outputs`) without needing `extra_headers`
- `json_mode="tool"` now supported in chat inferences even when no tools are included
- Thought signatures added for GCP Vertex model providers
Full changelog
[!CAUTION]
Breaking Changes
- Moving forward, explicit `tensorzero::params` will take precedence over conflicting native parameters when using the OpenAI-compatible inference endpoint.
[!WARNING]
Planned Deprecations
- Rename `json_mode="implicit_tool"` to `json_mode="tool"`.
- Set `model_name` and optionally `provider_name` instead of `model_provider_name` in `extra_body` and `extra_headers` objects supplied at inference time. Alternatively, don't include a scope filter at all.
New Features
- Support Anthropic's Beta Structured Outputs feature natively (`beta_structured_outputs`). `extra_headers` is no longer necessary.
- Support `json_mode="tool"` in chat inferences that don't otherwise include tools.
- Support `extra_body` and `extra_headers` supplied at inference time without scope filters.
- Support `extra_body` and `extra_headers` supplied at inference time with `model_name` and optional `provider_name` scope filters.
- Support thought signatures for the GCP Vertex model providers.
- Support custom tools for the OpenAI model provider.
- Add `description` fields to evaluation and evaluator configuration.
& multiple under-the-hood and UI improvements
- Replace `page_size` with `limit` in observability methods.
- Place fields previously nested in `metadata` or `tool_params` at the root when calling PATCH /v1/datasets/{dataset_name}/datapoints or update_datapoints.
- Deprecation warning: use `limit` instead of `page_size` for programmatic observability methods (will be removed in a future release).
- Require `allowed_tools` to include any dynamically specified tools; previously assumed always allowed.
- Adaptive stopping for evaluations in UI and Python SDK
- Support explicit `candidate_variants` and `fallback_variants` with uniform sampling
- Add `input_audio` content block support across multiple model providers
Full changelog
[!CAUTION]
Breaking Changes
- Moving forward, `allowed_tools` must include dynamic tools (tools specified at inference time rather than in configuration). This matches the OpenAI API behavior. Previously, TensorZero assumed that dynamic tools were always allowed.
[!WARNING]
Planned Deprecations
- Use `limit` instead of `page_size` with the programmatic observability methods. Previously, the methods mixed these two fields.
- Don't nest fields in `metadata` or `tool_params` when calling `PATCH /v1/datasets/{dataset_name}/datapoints` or `update_datapoints`. Moving forward, please place them in the root.
[!WARNING]
Completed Deprecations
- Require `template_filesystem_access.base_path` when `template_filesystem_access.enabled` is true.
- Removed many deprecated experimental types and methods from the TensorZero Python SDK.
New Features
- Add adaptive stopping for evaluations in the UI and Python SDK.
- Support explicit `candidate_variants` and `fallback_variants` when using uniform sampling.
- Support the `input_audio` content block in the OpenAI-compatible inference endpoint.
- Support the `input_audio` content block in the OpenAI, Azure, GCP Vertex Gemini, Google AI Studio, and OpenRouter model providers.
- Add optional `filename` field for input files.
- Move closer to parity between the GCP Vertex Anthropic model provider and the Anthropic model provider.
- Expose new observability and dataset management endpoints as methods in the TensorZero Python SDK.
- Add optional `postgres.enabled` field to the configuration.
- Handle missing usage information from model providers that don't report it.
- Add experimental method for searching inferences programmatically (`search_query_experimental`).
- Add a native OpenRouter embedding model provider.
& multiple under-the-hood and UI improvements
- Fixed handling of user-defined tags in batch inference.
Full changelog
Bug Fixes
- Enable TLS support for Postgres connections.
- Fix handling of user-defined tags in batch inference.
& multiple under-the-hood and UI improvements
- Gateway attempts `fallback_variants` in order rather than randomly sampling them.
- Tag inference and feedback with `tensorzero::api_key_public_id` when using auth.
- Add POST /v1/datasets/{dataset_name}/datapoints endpoint for creating datapoints.
- Introduce `gateway.global_outbound_http_timeout_ms` configuration setting.
Full changelog
[!CAUTION]
Breaking Changes
- Moving forward, the gateway will attempt any `fallback_variants` in order rather than randomly sample them.
Bug Fixes
- Fix a bug that prevented some model inferences from being rendered correctly in the UI.
- Handle non-image base64 file inputs consistently in the OpenAI-compatible inference endpoint.
- Handle `raw_response` correctly for batch inference with GCP Vertex AI Gemini.
New Features
- Apply the `tensorzero::api_key_public_id` tag to inference and feedback when using auth.
- Add updated HTTP endpoint for creating datapoints (`POST /v1/datasets/{dataset_name}/datapoints`).
- Add `gateway.global_outbound_http_timeout_ms` configuration setting.
& multiple under-the-hood and UI improvements (thanks @omarraf!)
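To make the ordered `fallback_variants` behavior concrete, here is a minimal sketch of trying variants strictly in the listed order and returning the first success. The function `run_with_fallbacks` and the `infer` callback are hypothetical names used only for illustration; the gateway's internals may differ.

```python
from typing import Callable, Dict, Sequence, Tuple


def run_with_fallbacks(
    fallback_variants: Sequence[str],
    infer: Callable[[str], str],
) -> Tuple[str, str]:
    """Try each variant in the listed order; return the first success."""
    errors: Dict[str, Exception] = {}
    for variant_name in fallback_variants:
        try:
            return variant_name, infer(variant_name)
        except Exception as exc:  # record the failure and move to the next variant
            errors[variant_name] = exc
    raise RuntimeError(f"all fallback variants failed: {errors}")
```

With ordered fallbacks, the position of a variant in the list expresses its priority, so the cheapest or most reliable option can be placed first.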
- Rate limiting by API key (`api_key_public_id`)
- Native `service_tier` parameter for supported providers (Anthropic, Azure, Groq, OpenAI) – removes need for `extra_body`
- Native `detail` parameter for input images in Azure, OpenAI, xAI – removes need for `extra_body`
Full changelog
Bug Fixes
- Fix a regression that prevented batch inferences from being rendered in the UI.
- Handle missing Postgres credentials gracefully in the UI.
New Features
- Support rate limiting by API key (`api_key_public_id`).
- Add native `service_tier` inference parameter (supported providers: Anthropic, Azure, Groq, OpenAI). `extra_body` is no longer necessary.
- Add native `detail` parameter for input images (supported providers: Azure, OpenAI, xAI). `extra_body` is no longer necessary.
- Add updated HTTP endpoint for querying inferences by ID (`POST /v1/inferences/get_inferences`).
- Add updated HTTP endpoint for querying inferences with filters (`POST /v1/inferences/list_inferences`).
& multiple under-the-hood and UI improvements
- Update configuration: replace `enable_template_filesystem_access` with `template_filesystem_access.enabled`.
- Removed configuration field `enable_template_filesystem_access`; use `template_filesystem_access.enabled` instead.
- Automated experimentation (automated A/B testing)
- Authentication for TensorZero Gateway with virtual API keys
- Native inference parameters: reasoning_effort/thinking_budget_tokens and verbosity
Full changelog
[!WARNING]
Completed Deprecations
- Completed the planned deprecation of the configuration field `enable_template_filesystem_access` in favor of `template_filesystem_access.enabled`.
Bug Fixes
- Handle the `global` region correctly for GCP Vertex Anthropic.
- Fix `output` format for JSON functions in the new endpoint for updating datapoints (`PATCH /v1/{dataset_name}/datapoints`). The `output` field now matches the inference endpoint (an object with a `raw` field; `parsed` is ignored and recomputed internally).
New Features
- Add automated experimentation feature (automated A/B testing). Docs
- Add authentication for the TensorZero Gateway (virtual API keys). Docs
- Add native inference parameters to enable reasoning for every supported model provider (`reasoning_effort` or `thinking_budget_tokens` depending on the provider). `extra_body` is no longer necessary.
- Add native `verbosity` inference parameter. `extra_body` is no longer necessary.
- Support token inputs in the embeddings endpoint.
- Support input thought content blocks for GCP Vertex Anthropic.
- Improve handling of JSON Schemas for GCP Vertex Gemini and Google AI Studio.
& multiple under-the-hood and UI improvements
- Upgrade from the yanked `2025.10.8` release to this version.
- Migrate any code using legacy `list_datapoints` or `experimental_list_inferences` to handle new content‑block format.
- Update Helm chart values: remove `createLegacyIngress`; use only `tensorzero-gateway` ingress.
- Removed `list_datapoints` and `experimental_list_inferences` API signatures; updated data schema to use structured content blocks (`{"type": "text", ...}`, `{"type": "template", ...}`, `{"type": "file", ...}`).
- Helm chart variable `createLegacyIngress` removed; legacy gateway ingress no longer supported.
- Added HTTP endpoints for datapoint CRUD operations (`GET`, `POST`, `PATCH`, `DELETE`).
- UI support to create, update, and delete messages and content blocks in dataset editor.
- Emit OpenTelemetry spans for rate‑limiting queries.
Full changelog
[!CAUTION]
Notice on `2025.10.8`: We ran into a technical issue during the release process for `2025.10.8` that resulted in a broken build for the TensorZero Python SDK on PyPI. We've yanked that release and recommend upgrading to this version.
[!CAUTION]
Breaking Changes
- This release includes small breaking changes to the programmatic observability/dataset APIs (e.g. `list_datapoints`, `experimental_list_inferences`) and the underlying data schema. Moving forward, TensorZero will store and return the new format for text (`{"type": "text", "text": "..."}`), template (`{"type": "template", "name": "...", "arguments": { ... }}`), and file (`{"type": "file", "file_type": "...", ...}`) content blocks. Note: These changes do not affect the inference APIs or the legacy data stored in ClickHouse.
[!WARNING]
Completed Deprecations
- The TensorZero Helm chart will no longer support the legacy gateway ingress. The `createLegacyIngress` variable was removed. Moving forward, the only supported gateway ingress is `tensorzero-gateway`.
Bug Fixes
- Fix an issue that prevented comments from being rendered in the workflow evaluation UI.
New Features
- Add HTTP endpoint for querying datapoints by ID (`POST /v1/datasets/get_datapoints`).
- Add HTTP endpoint for querying datapoints with filters (`POST /v1/datasets/{dataset_name}/list_datapoints`).
- Add HTTP endpoint for creating datapoints from inferences (`POST /v1/datasets/{dataset_id}/from_inferences`).
- Add HTTP endpoint for updating datapoints (`PATCH /v1/{dataset_name}/datapoints`).
- Add HTTP endpoint for updating datapoint metadata (`PATCH /v1/datasets/{dataset_name}/datapoints/metadata`).
- Add HTTP endpoint for deleting datapoints (`DELETE /v1/datasets/{dataset_id}/datapoints`).
- Add HTTP endpoint for deleting datasets (`DELETE /v1/datasets/{dataset_id}`).
- Enable users to create, update, and delete messages and content blocks in the dataset editor in the UI.
- Emit OpenTelemetry spans for rate limiting queries.
- Add support for deployment service accounts in the Helm chart (thanks @jinnovation!).
- Add support for dynamic extra attributes for OTLP spans (`TensorZero-OTLP-Traces-Extra-Attribute-*`).
& multiple under-the-hood and UI improvements
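The new content block shapes described in this release can be sketched as plain dictionaries. The helper names below are hypothetical; the file block's remaining fields are elided in the notes above, so the sketch passes any extra fields through unchanged rather than guessing them.

```python
from typing import Any, Dict


def text_block(text: str) -> Dict[str, Any]:
    """Build a text content block in the new tagged format."""
    return {"type": "text", "text": text}


def template_block(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build a template content block in the new tagged format."""
    return {"type": "template", "name": name, "arguments": arguments}


def file_block(file_type: str, **other_fields: Any) -> Dict[str, Any]:
    """Build a file content block; fields beyond file_type are caller-supplied."""
    return {"type": "file", "file_type": file_type, **other_fields}


# Example message content mixing a text block and a template block.
content = [
    text_block("Summarize the following ticket."),
    template_block("ticket", {"id": 42}),  # "ticket" is a made-up template name
]
```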
- Deprecation warning: Untagged enums for file content blocks will be removed after 2026.2+; migrate to tagged enums with `file_type` (`url`, `base64`, or `object_storage`).
- Deprecation warning: Type `InferenceFilterTreeNode` in TensorZero Python SDK will be renamed to `InferenceFilter`; both aliases available until 2026.2+.
- Default value for `fetch_and_encode_input_files_before_inference` changed from true to false, altering when input files are fetched relative to inference.
- Batch datapoint updates via PATCH /v1/{dataset_name}/datapoints
- Thought summaries exposed in TensorZero Python SDK
- Additional semantic tags added for OpenInference trace exports
Full changelog
[!CAUTION]
Breaking Changes
- The default value for `fetch_and_encode_input_files_before_inference` is changing from `true` to `false`. As a result, the gateway will no longer fetch input files before inference, but instead will fetch them in parallel with inference (for observability). In rare cases, this may cause the gateway to receive different input files than those received by model providers.
[!WARNING]
Planned Deprecations
- Migrate file content blocks from untagged enums to tagged enums. Moving forward, you should provide a field `file_type` with a value of `"url"`, `"base64"`, or `"object_storage"`. Untagged enums are still accepted for backwards compatibility but will be deprecated in `2026.2+`.
- Rename the TensorZero Python SDK type `InferenceFilterTreeNode` to `InferenceFilter` for consistency with related types. Both types will be available as aliases until `2026.2+`.
Bug Fixes
- Send a user agent when fetching input files to avoid restrictions from websites that require it (e.g. Wikimedia).
New Features
- Add a new endpoint for batch datapoint updates (`PATCH /v1/{dataset_name}/datapoints`).
- Expose thought summaries in the TensorZero Python SDK.
- Add additional semantic tags when exporting traces using the OpenInference format (thanks @jinnovation!)
& multiple under-the-hood and UI improvements
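The difference between the two values of `fetch_and_encode_input_files_before_inference` can be illustrated with a small `asyncio` sketch. Both helpers are stand-ins that simulate work without network access, and all names here are hypothetical; this only shows sequential versus concurrent ordering, not the gateway's code.

```python
import asyncio
from typing import Tuple


async def fetch_input_file(url: str) -> bytes:
    """Stand-in for an HTTP download (hypothetical helper, no network access)."""
    await asyncio.sleep(0.01)
    return f"bytes of {url}".encode()


async def call_model_provider(file_reference: str) -> str:
    """Stand-in for a model provider call (hypothetical helper)."""
    await asyncio.sleep(0.01)
    return f"response for {file_reference}"


async def infer(url: str, fetch_before_inference: bool) -> Tuple[str, bytes]:
    if fetch_before_inference:
        # true: download first, then send the downloaded bytes to the provider.
        stored = await fetch_input_file(url)
        response = await call_model_provider(f"<{len(stored)} inline bytes>")
    else:
        # false: forward the URL and download concurrently for observability.
        response, stored = await asyncio.gather(
            call_model_provider(url), fetch_input_file(url)
        )
    return response, stored
```

In the concurrent branch the provider and the gateway each resolve the URL independently, which is why the two may occasionally see different file contents.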
- Update configuration: change `type = "static"` to `type = "inference"` and `type = "dynamic"` to `type = "workflow"`. Both old values remain accepted until tensorzero 2026.2+.
- Renaming configuration field `type = "static"` to `type = "inference"` and `type = "dynamic"` to `type = "workflow"`. Both old names will be supported until version 2026.2+.
- Short-hand model names for OpenAI Responses API (e.g., openai::responses::gpt-5)
- Dynamic provider tools supporting web search via OpenAI Responses API
- Custom `api_base` support for Anthropic model provider
Full changelog
[!WARNING]
Planned Deprecations
- We're renaming "static evaluations" to "inference evaluations" and "dynamic evaluations" to "workflow evaluations". The only action needed is to update `type = "static"` in the configuration to `type = "inference"`. Both versions will be supported until `2026.2+`.
Bug Fixes
- Fix a bug that dropped tool IDs in output `tool_call` content blocks when updating datapoints.
- Prefer magic bytes over the `Content-Type` HTTP response header to infer MIME types of input files.
New Features
- Support short-hand model names for the OpenAI Responses API (e.g. `openai::responses::gpt-5`).
- Support dynamic provider tools (e.g. web search with the OpenAI Responses API).
- Support custom `api_base` for the Anthropic model provider.
& multiple under-the-hood and UI improvements
- FinishReason.STOP_SEQUENCE enum value added to TensorZero Python SDK
Changelog
Bug Fixes
- Add `FinishReason.STOP_SEQUENCE` to the TensorZero Python SDK.
- Deprecation: `bulk_insert_datapoints` endpoint will be renamed to `create_datapoints`; both available until 2026.2+.
- Python SDK type renames: `*InferenceDataset` → `*InferenceDatapoint`, `*Node` → `*Filter`.
- Legacy inference input formats are no longer accepted (were deprecated previously).
- Support OpenAI Responses API
- Structured generation (strict JSON) on Groq model provider
- File URLs as inputs for Anthropic model provider
Full changelog
[!WARNING]
Planned Deprecations
- The `bulk_insert_datapoints` method (`POST /datasets/{dataset_name}/datapoints/bulk`) will be renamed to `create_datapoints` (`POST /datasets/{dataset_name}/datapoints`). Both methods will be available until `2026.2+`. (thanks @BrianLi23!)
[!WARNING]
Completed Deprecations
Concluded many small ongoing deprecations:
- Python SDK: renamed the types `*InferenceDataset` → `*InferenceDatapoint` and `*Node` → `*Filter`
- Inference: stop accepting legacy input formats (e.g. inline arguments for templates). These legacy formats have issued deprecation warnings for the last several months.
- Dynamic Evaluations: renamed the variable `datapoint_id` → `task_id`
Bug Fixes
- Improve the rendering performance of the code editor in the UI.
- Fixed the `X_per_month` rate limit to cover a calendar month rather than 30 days.
- Use `max_completion_tokens` rather than `max_tokens` in the Azure OpenAI Service model provider.
New Features
- Support the OpenAI Responses API.
- Support structured generation (strict JSON mode) on the Groq model provider.
- Support inputs with file URLs on the Anthropic model provider.
- Support encrypted reasoning and thought summaries on the OpenAI model provider.
- Support dynamic OTLP resources when exporting OpenTelemetry traces (`tensorzero-otlp-traces-extra-resource-*`).
- Support fallbacks for dynamic credentials (e.g. `api_key_location = { default = "dynamic::foo", fallback = "env::bar" }`).
- Improve the handling for stale datapoints in the UI.
& multiple under-the-hood and UI improvements
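The `X_per_month` fix distinguishes a calendar month from a rolling 30-day window. The sketch below contrasts the two; the function names are hypothetical, and the use of the timestamp's own timezone for month boundaries is an assumption, not something these notes specify.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def calendar_month_window(now: datetime) -> Tuple[datetime, datetime]:
    """Return [start, end) of the calendar month containing `now`."""
    start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if start.month == 12:
        end = start.replace(year=start.year + 1, month=1)
    else:
        end = start.replace(month=start.month + 1)
    return start, end


def rolling_30_day_window(now: datetime) -> Tuple[datetime, datetime]:
    """Return the trailing 30-day window ending at `now`."""
    return now - timedelta(days=30), now


now = datetime(2025, 12, 15, 9, 30, tzinfo=timezone.utc)
month_start, month_end = calendar_month_window(now)
rolling_start, _ = rolling_30_day_window(now)
```

For the sample timestamp, the calendar window starts on December 1, while the rolling window reaches back into November, so the two count different requests toward the same limit.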
- Fixed Playground UI failures for inferences using static tools with custom names.
Full changelog
Bug Fixes
- Fix bug in the Playground UI that caused inferences containing static tools with custom names (`tools.my_tool.name`) to fail.
- Explicitly list dynamic tools in the allowed‑tools configuration before the upcoming release.
- Update any scripts or configs using `datapoint_name` to use `task_name`.
- If relying on the default `--config-file` flag, add it manually when building/pulling `tensorzero/gateway` images.
- Dynamic tools will no longer be automatically included in the allowed list; explicit allowance required.
- Renamed configuration key `datapoint_name` to `task_name` for dynamic evaluations.
- Removed default inclusion of `--config-file` flag from `tensorzero/gateway` Dockerfile.
- Custom granular rate limits for users
- Dynamic and static OTLP header support (Python SDK and config)
- Optional `max_distance` field for `experimental_dynamic_in_context_learning` variants
Full changelog
[!WARNING]
Planned Deprecations
- Currently, the gateway automatically includes all dynamic tools in the list of allowed tools. In a near-future release, dynamic tools will no longer be included automatically. If you intend for your dynamic tools to be allowed, please allow them explicitly.
[!WARNING]
Completed Deprecations
- Finish renaming `datapoint_name` → `task_name` for dynamic evaluations.
- Stop including `--config-file` in the `Dockerfile` for `tensorzero/gateway` by default.
- Use the `TENSORZERO_CLICKHOUSE_URL` environment variable instead of `CLICKHOUSE_URL`.
- Remove deprecated features from the OpenAI-compatible inference API.
Bug Fixes
- Handle `json_mode` correctly in `experimental_best_of_n` variants.
New Features
- Allow users to define and enforce custom granular rate limits.
- Update the UI to handle unlimited named templates and schemas.
- Support dynamic OTLP headers in the Python SDK.
- Support static OTLP headers in the configuration.
- Add optional `max_distance` configuration field for `experimental_dynamic_in_context_learning` variants.
- Improve fallback behavior for `experimental_dynamic_in_context_learning` variants.
- Allow Google AI Studio Gemini to accept input files beyond images.
- Add `name` to datapoints.
- Add `experimental_run_evaluation` to the Python SDK.
- Allow users to configure default credentials by provider type.
- Support supervised fine-tuning for Together AI models in the UI.
& multiple under-the-hood and UI improvements (thanks @dangvu0502!)
- Increased default body limit to 100 MB for patch_openai_client
Full changelog
New Features
- Increase default body limit to 100MB for `patch_openai_client`.
& multiple under-the-hood and UI improvements
- Deprecation warning: replace `timeouts.non_streaming.total_ms` with `timeout_ms` for embedding model timeouts; removal planned in 2026.1+.
- Deprecation warning: use CLI flags `--run-clickhouse-migrations` and `--run-postgres-migrations` instead of `--run-migrations-only`; removal planned in 2026.1+.
- Deprecation warning: Prometheus metrics `request_count` and `inference_count` will be removed; use `tensorzero_requests_total` and `tensorzero_inferences_total`.
- UI support for adding, editing, and deleting tags for datapoints
- UI support for adding, editing, and deleting `system` entries for datapoints
- Configuration flag `gateway.fetch_and_encode_input_files_before_inference` with default true
Full changelog
[!WARNING]
Planned Deprecations
- Configure timeouts for embedding models and embedding model providers with `timeout_ms` instead of `timeouts.non_streaming.total_ms`. The latter will be removed in a future release (`2026.1+`).
- Use the gateway CLI flags `--run-clickhouse-migrations` and `--run-postgres-migrations` instead of `--run-migrations-only`. `--run-migrations-only` requires credentials for both databases, even though Postgres is an optional dependency, so it will be removed in a future release (`2026.1+`).
- Scrape the Prometheus metrics `tensorzero_requests_total` and `tensorzero_inferences_total` instead of `request_count` and `inference_count`. The gateway will double-emit the metrics for now; the deprecated metrics will be removed in a future release (`2026.1+`).
Bug Fixes
- Fixed an issue that prevented static evaluations on datapoints with no reference output from being rendered in the UI.
- Fixed a regression in the gateway's internal HTTP client that triggered unnecessary warnings and deteriorated performance when handling many concurrent streaming inferences.
- Fixed an issue that prevented base64-encoded embedding requests from being cached by TensorZero.
New Features
- Allow users to add, edit, and delete tags for datapoints in the UI.
- Allow users to add, edit, and delete `system` for datapoints in the UI.
- Add the configuration setting `gateway.fetch_and_encode_input_files_before_inference`. If set to true (default), the gateway will fetch remote input files and send them as a base64-encoded payload in the prompt; this is recommended to ensure that TensorZero and the model providers see identical inputs. If set to false, TensorZero will forward the input file URLs and fetch them for observability in parallel with inference.
- Improved gateway errors for database issues.
& multiple under-the-hood and UI improvements
- Multiple small improvements to the evaluations UI for streamlined workflows and simplified debugging
Full changelog
Bug Fixes
- Implemented a workaround for an upstream bug in `opentelemetry-otlp` that caused our OTLP exporter to fail to send data to encrypted endpoints.
New Features
- Added multiple small improvements to the evaluations UI to streamline common workflows and simplify debugging.
& multiple under-the-hood and UI improvements
- Model observability page showing throughput and latency analytics in the UI
- Support for OpenInference format when exporting OpenTelemetry traces
- Supervised fine‑tuning (SFT) with GCP Vertex AI Gemini added to the UI
Full changelog
New Features
- Add model observability page to the UI with model throughput and latency analytics.
- Add support for OpenInference format when exporting OpenTelemetry traces.
- Expand support of UI features for the default function (e.g. "Try with model").
- Add support for supervised fine-tuning (SFT) with GCP Vertex AI Gemini in the UI.
- Improve the performance of episode table in the UI.
- Add an example of using the programmatic workflow for dynamic in-context learning.
& multiple under-the-hood and UI improvements (thanks @AnnaVernerovaHID @dangvu0502 @jinnovation!)
- Planned deprecation: rename Python SDK types from `Dicl*` to `DICL*`; both versions work now but deprecated ones will be removed in 2025.12+.
- Support unlimited prompt templates per function
- Add `append_to_existing_variants` to programmatic DICL interface
- Skip writing inference cache entries on tool call validation failure
Full changelog
[!WARNING]
Planned Deprecations
- Rename types from `Dicl*` to `DICL*` in the Python SDK for consistency. Both versions work for now, and the deprecated types will be removed in a future release (`2025.12+`).
Bug Fixes
- Fix a regression in the UI that prevented `chat` datapoints from being edited.
New Features
- Expand the prompt templates and schemas functionality to support unlimited templates per function.
- Support appending to existing DICL variants in the programmatic interface (`append_to_existing_variants`).
- Skip writing inference cache entries if tool call validation fails.
& multiple under-the-hood and UI improvements (thanks @BretHudson!)
- Dynamic OTLP header support for OpenTelemetry trace export
- `allowed_tools` field added to OpenAI-compatible inference endpoint
- Automatic HTTP/2 connection adjustment based on concurrency
Full changelog
New Features
- Add support for dynamic OTLP headers when exporting OpenTelemetry traces.
- Add support for `allowed_tools` field in the OpenAI-compatible inference endpoint.
- Improve performance by automatically adjusting the number of HTTP2 connections to model providers based on concurrency.
& multiple under-the-hood and UI improvements (thanks @yuria-loo!)
- Programmatic API for reinforcement fine-tuning (RFT) with OpenAI
- Defaults added for individual fields in the `retries` configuration
- Dynamic specification of Azure provider endpoint
Full changelog
Bug Fixes
- Fix a regression that prevented rendering of inferences with `thought` content blocks in the UI.
- Stop logging HTTP requests and responses twice in debug mode.
New Features
- Add a programmatic API for reinforcement fine-tuning (RFT) with OpenAI.
- Provide defaults for individual fields in the `retries` configuration.
- Allow users to specify the Azure provider endpoint dynamically. (thanks @Dineshm-coder!)
- Improve error messages when the gateway is missing credentials.
& multiple under-the-hood and UI improvements (thanks @JoshuaTanaka @HJStaiff!)
- The `feedback_id` field in the TensorZero Python SDK is no longer incorrectly doubly nested, aligning with type annotations.
- Throughput chart added to function detail page in TensorZero UI
- Export OpenTelemetry spans for feedback endpoint
- Recipes for supervised fine-tuning with `torchtune` and `axolotl`
Full changelog
[!CAUTION]
Breaking Changes
- The bug fix for `feedback_id` technically introduces a breaking change in the TensorZero Python SDK. The field is no longer incorrectly doubly nested and now matches the SDK's type annotations.
[!WARNING]
Completed Deprecations
- `json_mode` is now required for JSON function variants.
Bug Fixes
- Added workarounds for two ClickHouse regressions (ClickHouse/ClickHouse#86415, ClickHouse/ClickHouse#86557) introduced in ClickHouse `25.8`. Replicated self-hosted clusters are still affected by ClickHouse/ClickHouse#86434. Pin to `25.7` or earlier if you run a replicated cluster. Single-node self-hosted deployments and ClickHouse Cloud are not affected.
- Fixed a bug in the TensorZero Python SDK that caused `feedback_id` to be doubly nested in feedback responses.
- Fixed a logging issue where models were incorrectly reported as "not found" in the embedding endpoint even on success.
- Fixed a bug where pending insertions could be dropped during shutdown when `gateway.observability.batch_writes.enabled = true`.
- Fixed a bug in the dynamic in-context learning (DICL) recipe and programmatic API. The gateway automatically detects problematic examples and logs a warning with resolution instructions if necessary.
New Features
- Added a throughput chart to the function detail page in the TensorZero UI.
- Support exporting OpenTelemetry spans for the feedback endpoint.
- Added recipes for supervised fine-tuning with `torchtune` and `axolotl`.
- Added examples for using the embedding endpoint with Azure OpenAI Service and OpenAI-compatible providers like Ollama (thanks @slbotbm!).
- Updated the DICL recipe to use TensorZero's new embedding API.
- Added support for caching embeddings (thanks @ishbir!).
& multiple under-the-hood and UI improvements (thanks @contrun @jinnovation!)
- Programmatic optimization interface for dynamic in-context learning
- Exposure of more hyperparameters for programmatic supervised fine-tuning with Together AI
Full changelog
Bug Fixes
- Reduce the ClickHouse memory footprint in large deployments with human feedback for evaluations.
New Features
- Add a programmatic optimization interface for dynamic in-context learning.
- Expose more hyperparameters for programmatic supervised fine-tuning with Together AI.
& many under-the-hood and UI improvements (thanks @quangIO!)
- Removal of support for unprefixed model names in the OpenAI‑compatible embeddings endpoint; future releases (2025.12+) will require prefix `tensorzero::embedding_model_name::`.
- Added `extra_body` field to embedding model configurations for custom API request fields.
- Updated Azure OpenAI Service provider to use API version `2025-04-01-preview`.
- Added CrewAI integration example.
Full changelog
[!WARNING]
Planned Deprecations
- The OpenAI-compatible embeddings endpoint will require the prefix `tensorzero::embedding_model_name::` for model names (e.g. `tensorzero::embedding_model_name::openai::text-embedding-3-small`). Support for unprefixed names will be removed in a future release (`2025.12+`).
Bug Fixes
- Fix a ClickHouse warning that occurred when a model inference had input tokens set to null and output tokens non-null, or vice versa. This issue only caused warnings and did not affect TensorZero's user-facing functionality.
New Features
- Add `extra_body` support for embedding model configurations to enable custom API request fields for various embedding providers. (thanks @ishbir!)
- Update the Azure OpenAI Service model provider to use API version `2025-04-01-preview`.
- Add CrewAI integration example.
& multiple under-the-hood and UI improvements (thanks @MengAiDev!)
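For callers migrating ahead of the embedding model name deprecation, a small helper can normalize names to the prefixed form. The helper name `with_embedding_prefix` is hypothetical; only the prefix string and the example model name come from the notes above.

```python
EMBEDDING_PREFIX = "tensorzero::embedding_model_name::"


def with_embedding_prefix(model_name: str) -> str:
    """Add the required prefix unless the name already carries it."""
    if model_name.startswith(EMBEDDING_PREFIX):
        return model_name
    return EMBEDDING_PREFIX + model_name


prefixed = with_embedding_prefix("openai::text-embedding-3-small")
```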
- If you previously enabled batching writes to ClickHouse via the embedded Python gateway, disable that setting or switch to a standalone (HTTP) gateway to avoid deadlocks caused by GIL interactions.
- Removed support for batching writes to ClickHouse when using the embedded Python gateway; batching remains available with a standalone (HTTP) gateway.
- Configuration can be split into multiple files using glob patterns
- Example added for multimodal (vision) fine-tuning
- More hyperparameters exposed for programmatic supervised fine‑tuning with Fireworks
Full changelog
[!CAUTION]
Breaking Changes
- Temporarily removing support for batching writes to ClickHouse with the embedded gateway in Python: In the previous release, we added support for batching writes to ClickHouse to boost ingest throughput and reduce insert overhead at scale (default off). Later, we discovered that in rare scenarios, the Python GIL could interfere with this setting in embedded clients and cause a deadlock. While we investigate a solution, we are removing support for batching with the embedded client to prevent technical footguns. Batching remains available when using a standalone (HTTP) gateway.
New Features
- Add support for splitting configuration into multiple files with glob patterns
- Add an example for multimodal (vision) fine-tuning
- Expose more hyperparameters for programmatic supervised fine-tuning with Fireworks
- Optimize queries in the UI to improve the performance of assorted pages in large-scale deployments
- Enable setting global labels for all created resources in Helm (thanks @jinnovation!)
- Support embedding endpoint when using the OpenAI SDK with an embedded gateway (`patch_openai_client`)
& many under-the-hood and UI improvements (thanks @wliu4040!)
- Playground UI for side‑by‑side variant comparison, prompt iteration, and inference replay
- ClickHouse write batching to increase ingest throughput and lower insert overhead at scale
- Jupyter notebook recipe for supervised fine‑tuning with Unsloth
Full changelog
New Features
- Add a Playground to the UI to compare variants side-by-side, iterate on prompts quickly, and replay inference requests.
- Support batching writes to ClickHouse to boost ingest throughput and reduce insert overhead at scale.
- Add a Jupyter notebook recipe for supervised fine-tuning with Unsloth.
& many under-the-hood and UI improvements (thanks @contrun @lblack00!)
- OpenAI‑compatible endpoint for embeddings supporting OpenAI and Azure OpenAI Service providers
- Self‑hosted replicated ClickHouse database support
- Parse `reasoning_content` from Fireworks and vLLM model providers
Full changelog
New Features
- Add an OpenAI-compatible endpoint for embeddings, with support for OpenAI (& OpenAI-compatible) and Azure OpenAI Service model providers.
- Add support for self-hosted replicated ClickHouse databases.
- Parse `reasoning_content` from Fireworks and vLLM model providers.
- Improve error messages for AWS Bedrock and AWS SageMaker model providers.
Bug Fixes
- Allow configuration to specify `description` for JSON functions.
- Fix a regression where function descriptions were no longer rendered in the UI.
& many under-the-hood and UI improvements (thanks @yuvraj-kumar-dev)
- gateway.observability.skip_completed_migrations config to skip ClickHouse migration workflow on startup
- Support for raw_text content blocks in OpenAI-compatible inference endpoint
- Ability to collect outputs from "Try with variant" UI as demonstrations
Full changelog
New Features
- Add `gateway.observability.skip_completed_migrations` configuration option to reduce gateway startup time and database load. When enabled, the gateway will skip running the ClickHouse migration workflow (i.e. verifying and potentially applying every migration) on startup for migrations that are already present in a database table that tracks migration history.
- Support `raw_text` content blocks in the OpenAI-compatible inference endpoint. (Thanks @hongantran3804 @pykm05 @pycoder49!)
- Allow users to collect outputs from "Try with variant" in the UI as demonstrations.
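The startup behavior described for `gateway.observability.skip_completed_migrations` can be pictured roughly as follows. This is only an illustrative sketch, not the gateway's implementation: the function name, the migration names, and the idea of passing the recorded history in as a plain list are all assumptions made for the example.

```python
def migrations_to_run(all_migrations, completed, skip_completed):
    """Return the migrations the startup workflow still has to process.

    all_migrations: ordered list of migration names (hypothetical).
    completed: names already recorded in the migration-history table.
    skip_completed: mirrors the skip_completed_migrations option.
    """
    if not skip_completed:
        # Default behavior: verify (and potentially apply) every migration.
        return list(all_migrations)
    done = set(completed)
    # With the option enabled, recorded migrations are skipped entirely.
    return [name for name in all_migrations if name not in done]


# Hypothetical migration names, for illustration only.
ALL = ["m0001", "m0002", "m0003"]
RECORDED = ["m0001", "m0002"]

print(migrations_to_run(ALL, RECORDED, skip_completed=False))
print(migrations_to_run(ALL, RECORDED, skip_completed=True))
```

With the option off, all three migrations are processed; with it on, only the one missing from the history remains, which is where the startup time and database load savings would come from.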
Bug Fixes
- Fix handling of reasoning content blocks for DeepSeek-R1 on AWS Bedrock.
- Set proper default value for `max_tokens` for the Anthropic and GCP Vertex AI Anthropic model providers. The gateway will now error if no value is provided in the configuration or request and the model is unknown.
- Skip caching model inferences that generated invalid tool call arguments.
& many under-the-hood and UI improvements (thanks @michaldorsett @K-coder05 @dcaputo-harmoni @masonblier @Nicolasgarbarino!)
- Added `gateway.unstable_disable_feedback_target_validation` flag for large-scale deployments
Full changelog
Experimental
- Add `gateway.unstable_disable_feedback_target_validation` configuration option to improve the performance of the feedback endpoint in large-scale deployments (not recommended unless you know what you're doing).
& multiple under-the-hood and UI improvements (thanks @michaldorsett @HJStaiff @liamjdavis!)
- Soft deletion of datasets via UI
- Filtering by time and tags in `experimental_list_inferences`
- Ordering by metric value and time in `experimental_list_inferences`
Full changelog
Bug Fixes
- Fixed an issue with inference caching where inference requests that were identical except for their inline (base64-encoded) file data incorrectly shared the same cache key, resulting in false cache hits. The cache key now includes a hash of the inline file data, ensuring that such requests are properly distinguished.
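To make the cache-key fix above concrete, here is a minimal sketch of a key that also covers inline file data. The gateway's actual hashing scheme is not described in these notes, so the use of SHA-256, the JSON serialization, and the `cache_key` function itself are assumptions chosen for illustration.

```python
import hashlib
import json


def cache_key(request: dict, inline_files: list) -> str:
    """Derive a cache key from the request fields plus each file's bytes."""
    hasher = hashlib.sha256()
    # Canonical serialization of the non-file parts of the request.
    hasher.update(json.dumps(request, sort_keys=True).encode("utf-8"))
    for data in inline_files:
        # Mix in a digest of every inline file so that requests differing
        # only in file content no longer share a key.
        hasher.update(hashlib.sha256(data).digest())
    return hasher.hexdigest()


request = {"function_name": "describe_image", "input": "What is in this picture?"}
key_a = cache_key(request, [b"bytes of image A"])
key_b = cache_key(request, [b"bytes of image B"])
print(key_a != key_b)
```

The two requests are identical apart from the file bytes, and the comparison shows that they now map to different keys.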
New Features
- Added functionality for deleting datasets in the UI (soft deletion).
Experimental
- Added support for filtering by time and tags to the `experimental_list_inferences` method.
- Added support for ordering by metric value and time to the `experimental_list_inferences` method.
& multiple under-the-hood and UI improvements (thanks @NamNgHH!)
- Migrate `gateway.enable_template_filesystem_access = true` to `gateway.template_filesystem_access.enabled = true`
Full changelog
[!WARNING]
Planned Deprecations
- Migrate `gateway.enable_template_filesystem_access = true` to `gateway.template_filesystem_access.enabled = true`. We're about to add more fields to `enable_template_filesystem_access` to support multi-file configuration.
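For configurations that are generated or patched by scripts, the rename above amounts to moving one boolean into a nested table. The sketch below works on an already-parsed configuration held as a plain dictionary; the helper name `migrate_template_fs_access` is hypothetical and not part of any TensorZero tooling.

```python
import copy


def migrate_template_fs_access(config: dict) -> dict:
    """Return a copy of `config` that uses the new nested key."""
    migrated = copy.deepcopy(config)
    gateway = migrated.get("gateway", {})
    if "enable_template_filesystem_access" in gateway:
        enabled = gateway.pop("enable_template_filesystem_access")
        # Keep any value that already uses the new layout.
        table = gateway.setdefault("template_filesystem_access", {})
        table.setdefault("enabled", enabled)
    return migrated


# Old layout: gateway.enable_template_filesystem_access = true
old_config = {"gateway": {"enable_template_filesystem_access": True}}
new_config = migrate_template_fs_access(old_config)
print(new_config)
```

The result corresponds to `gateway.template_filesystem_access.enabled = true`, and the input dictionary is left untouched.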
Bug Fixes
- Remove a third-party dependency that was causing a memory leak in the UI.
- Fix a regression that prevented the UI from running offline.
& multiple under-the-hood and UI improvements
- Fixed occasional connection errors with ClickHouse Cloud by updating the client implementation.
Full changelog
Bug Fixes
- Update TensorZero's ClickHouse client to match the parameter recommendations by ClickHouse. (This change aims to resolve occasional connection errors with ClickHouse Cloud.)
- Experimental flag `gateway.unstable_error_json` now returns internal error details in response body.
- Improve UI components for rendering text, JSON, Markdown, and MiniJinja templates (syntax highlighting, line numbers, wrapping, etc.)
- Improve performance of the UI's episode list page
- Launch SFT jobs for Together AI and GCP Vertex AI Gemini programmatically
Full changelog
New Features
- Improve UI components for rendering text, JSON, Markdown, and MiniJinja templates (syntax highlighting, line numbers, wrapping, etc.)
- Improve the performance of the UI's episode list page
- Add pseudonymous usage analytics to the gateway (see docs for details and instructions to opt out)
Experimental
- Launch SFT jobs for Together AI and GCP Vertex AI Gemini programmatically
- Return internal error details in the response body (`gateway.unstable_error_json`) (thanks @panesher)
& many under-the-hood and UI improvements (thanks @michaldorsett @itsrajatrai @caarlos0)
- Supervised fine‑tuning workflow now fully supports multimodal data (vision, documents) with multi‑turn tool use and TensorZero inference capabilities
- Streaming inference support added for best‑of‑n and mixture‑of‑n variant types
- Experimental Python client methods: `experimental_launch_optimization`, `experimental_poll_optimization`, `experimental_get_config` and extended `experimental_render_inferences`
Full changelog
New Features
- Revamped the UI's supervised fine-tuning workflow to fully support TensorZero's inference capabilities, including multimodal data (vision, documents, etc.), multi-turn tool use, and more.
- Added streaming inference support for best-of-n and mixture-of-n variant types.
- Optimized the performance of some database queries in the UI.
Experimental
Experimental features don't have a stable API. They may change or be removed in future releases.
- Added methods to the Python client for programmatically launching (`experimental_launch_optimization`) and polling for (`experimental_poll_optimization`) optimization jobs. For now, these methods support supervised fine-tuning with OpenAI and Fireworks AI.
- Added a method to the Python client for retrieving the configuration (`experimental_get_config`).
- Updated `experimental_render_inferences` to accept outputs from both `experimental_list_inferences` and `list_datapoints`.
& many under-the-hood and UI improvements (thanks @jeevikasirwani!)
- Added `delete = true` option to `extra_body` and `extra_headers` to remove built-in fields
- Introduced `gateway.base_path` configuration field to prefix all endpoints
- Added `discard_unknown_chunks` in model provider config to ignore unsupported chunk types
Full changelog
New Features
- Add `delete = true` option to `extra_body` and `extra_headers` configuration fields to instruct the gateway to delete built-in fields from the request body or headers.
- Add `gateway.base_path` field to configuration to instruct the gateway to prefix all endpoints with this path.
- Add `discard_unknown_chunks` field to model provider configuration to instruct the gateway to discard chunks with unknown or unsupported types instead of throwing an error.
- Add optional `name` field to tool configuration; if provided, the tool name will be sent to the LLMs instead of the tool ID, allowing for multiple tools with the same name.
- Add functionality to filter `list_datapoints` by function name.
& multiple under-the-hood and UI improvements
- Granular timeouts via `[timeouts]` in variant and model configuration blocks
- Shorthand model names for Groq (`groq::...`) and OpenRouter (`openrouter::...`) providers
- Explicit `stop_sequences` inference parameter
Full changelog
New Features
- Add recipe for supervised fine-tuning with Google Vertex AI Gemini
- Add granular timeouts (`[timeouts]`) to variant and model configuration blocks
- Support short-hand model names for Groq (`groq::...`) and OpenRouter (`openrouter::...`) model providers
- Support tool use with vLLM (thanks @CHRV @chaet1t!)
- Add explicit `stop_sequences` inference parameter
- Support dynamic credentials in OpenAI-compatible inference endpoint (`tensorzero::credentials`) (thanks @zmij!)
- Support multimodal inference and file inputs on AWS Bedrock
& multiple under-the-hood and UI improvements
- Return null instead of an empty string when `service_tier` is missing in the OpenAI‑compatible inference endpoint.
- During streaming inference, `raw_name` in a tool call chunk is now a delta to accumulate: once the tool name has finished streaming it is an empty string, whereas previously every chunk repeated the same value.
- Allow inference containing files with arbitrary MIME types
- `[timeouts]` section added to model provider configuration for granular timeout settings
- Support templates without schemas; built‑in variables `system_text`, `assistant_text`, and `user_text` are now available
Full changelog
[!CAUTION]
Breaking Changes
- Streaming Inference + Tool Use: During streaming inferences, `raw_name` in a tool call chunk represents a delta that should be accumulated. If the tool name has finished streaming, this field will contain an empty string. Previously, TensorZero returned the same `raw_name` in every subsequent chunk for that tool call. The new behavior matches the OpenAI API's behavior.
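Client code that reads the tool name from streamed chunks needs to concatenate the `raw_name` deltas rather than take the latest value. The sketch below shows one way to do that. The chunk dictionaries are simplified stand-ins: only `raw_name` comes from the note above, while the `id` and `raw_arguments` keys and the sample values are assumptions for the example.

```python
from collections import defaultdict


def accumulate_tool_names(chunks):
    """Concatenate `raw_name` deltas per tool call."""
    names = defaultdict(str)
    for chunk in chunks:
        # An empty string means the name has already finished streaming,
        # so appending it leaves the accumulated name unchanged.
        names[chunk["id"]] += chunk.get("raw_name", "")
    return dict(names)


# Simplified, hypothetical stream for a single tool call.
stream = [
    {"id": "call_1", "raw_name": "get_", "raw_arguments": ""},
    {"id": "call_1", "raw_name": "weather", "raw_arguments": ""},
    {"id": "call_1", "raw_name": "", "raw_arguments": '{"city": '},
    {"id": "call_1", "raw_name": "", "raw_arguments": '"Paris"}'},
]

tool_names = accumulate_tool_names(stream)
print(tool_names)
```

Code written for the previous behavior, which overwrote the name with each chunk's `raw_name`, would end up with an empty name on this stream, which is why the change is listed as breaking.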
Bug Fixes
- Return `null` instead of an empty string when `service_tier` is missing in the OpenAI-compatible inference endpoint
New Features
- Allow inference containing files with arbitrary MIME types
- Add `[timeouts]` to model provider configuration for granular timeout functionality
- Support templates without schemas; add built-in `system_text`, `assistant_text`, and `user_text` template variables
- Support tags in OpenAI-compatible inference endpoint (`tensorzero::tags`)
- Add `experimental_list_inferences` method to the client for retrieving historical inferences
& multiple under-the-hood and UI improvements (thanks @vr-varad!)
- Handle thinking and unknown content blocks for GCP Vertex Anthropic and Gemini models
- Added `endpoint_id` field in configuration for fine‑tuned GCP Vertex Anthropic and Gemini models
- Introduced Groq (`groq`) model provider
Full changelog
Bug Fixes
- Increase database health check timeout in the gateway to 180s to gracefully handle warmup of serverless databases
New Features
- Handle thinking and unknown content blocks for `gcp_vertex_anthropic` and `gcp_vertex_gemini` models
- Add `endpoint_id` field in the configuration for `gcp_vertex_anthropic` and `gcp_vertex_gemini` models to support fine-tuned models
- Add a dedicated Groq (`groq`) model provider (thanks @oliverbarnes!)
- Support `include_original_response` during streaming inference
& multiple under-the-hood and UI improvements