phoenix

arize-phoenix-v16.0.0 · scope: arize-phoenix · Breaking

This release includes one breaking change that platform teams should account for when planning an upgrade.

Published 2mo ago · Tracing


✓ No known CVEs patched


Topics

agents ai-monitoring ai-observability aiengineering anthropic datasets evals langchain llamaindex llm-eval llm-evaluation llmops llms openai prompt-engineering smolagents

ReleasePort's take

Light signal

editorial:auto 2mo

Phoenix v16.0.0 ships Sandboxing and Code Evaluators. This is a breaking change, and it also lets you compose evaluation strategies in code.

Why it matters: If you run Phoenix agents, read MIGRATION.md before upgrading, because Sandboxing and Code Evaluators break existing agent configurations. Try Code Evaluators in a development environment first to avoid runtime failures in production.

Summary

AI summary

A mixed release: one breaking change, two features, and one bug fix.

Changes in this release

Type	Severity	Summary	Source	Confidence	CVE
Breaking	Medium	Sandboxing and Code Evaluators introduce a breaking change in Phoenix v16.0.0.	llm_adapter@2026-05-22	low	none
Feature	Medium	Phoenix now supports composing evaluation strategies via Code Evaluators.	llm_adapter@2026-05-22	high	none
Feature	Medium	Agents now enable provider-native web search / fetch when available.	llm_adapter@2026-05-22	high	none
Bugfix	Medium	Prevents broken tool groups in agents.	llm_adapter@2026-05-22	high	none

Full changelog

16.0.0 (2026-05-21)

MIGRATION.md

⚠ BREAKING CHANGES

Sandboxing and Code Evaluators (#13290)

Features

Phoenix now lets you compose evaluation strategies in code.

Most eval tooling hands you a fixed menu of judge templates. Real evaluation is rarely that tidy.

Code Evaluators enable you to build evaluation criteria the way you want. You write a Python or TypeScript evaluate() function in the Phoenix UI — no SDK, no local runtime, no deploy step — and Phoenix runs it server-side, recording labels and scores as annotations on every experiment run.
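As an illustration only, here is what a simple evaluate() function might look like in Python. The signature (taking `output` and `expected` strings) and the returned dict with `label` and `score` keys are assumptions made for this sketch. They are not Phoenix's documented contract, so check MIGRATION.md and the Phoenix docs for the exact interface the server-side runner expects.

```python
# Hypothetical Code Evaluator sketch: the parameter names and the returned
# {"label", "score"} shape are assumptions, not Phoenix's documented API.


def evaluate(output: str, expected: str) -> dict:
    """Score a run by normalized exact match against a reference answer."""
    # Collapse whitespace and lowercase so trivial formatting differences don't fail the run
    normalized_output = " ".join(output.split()).lower()
    normalized_expected = " ".join(expected.split()).lower()
    matched = normalized_output == normalized_expected
    # Phoenix records labels and scores as annotations; this sketch returns both
    return {"label": "correct" if matched else "incorrect", "score": 1.0 if matched else 0.0}


if __name__ == "__main__":
    print(evaluate("  Paris ", "paris"))  # {'label': 'correct', 'score': 1.0}
```

Returning both a categorical label and a numeric score keeps the result useful for filtering (by label) and for aggregation across an experiment (by score).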

Because it's just code, you control the whole strategy:

• Composite scoring: blend sub-scores (LLM judgment + deterministic rules) into one weighted metric
• Embedding-based evaluation: cosine similarity over embeddings instead of brittle string matching
• LLM juries: poll multiple models and combine verdicts into a weighted consensus
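Once the model calls are factored out, the three strategies listed above reduce to ordinary arithmetic. The sketch below is an assumption-laden illustration, not Phoenix code. The helper names `cosine_similarity`, `jury_consensus`, and `composite_score` are hypothetical. The embeddings and juror verdicts are passed in as plain values, where a real evaluator would obtain them from an embedding model and several LLM judges.

```python
import math

# Hypothetical building blocks for a composite Code Evaluator. In a real
# evaluator the embeddings and juror verdicts would come from model calls;
# here they are passed in directly so the scoring logic stays self-contained.


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as "no similarity" rather than dividing by zero
    return dot / (norm_a * norm_b)


def jury_consensus(verdicts: list[tuple[float, bool]]) -> float:
    """Weighted share of jurors voting 'pass'; each verdict is (weight, passed)."""
    total = sum(weight for weight, _ in verdicts)
    if total == 0:
        return 0.0
    return sum(weight for weight, passed in verdicts if passed) / total


def composite_score(parts: dict[str, float], weights: dict[str, float]) -> float:
    """Blend named sub-scores in [0, 1] into one weighted metric."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(parts[name] * w for name, w in weights.items()) / total


if __name__ == "__main__":
    similarity = cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0])
    jury = jury_consensus([(2.0, True), (1.0, True), (1.0, False)])
    length_ok = 1.0  # deterministic rule, e.g. "answer is under N tokens"
    score = composite_score(
        {"similarity": similarity, "jury": jury, "length": length_ok},
        {"similarity": 0.4, "jury": 0.4, "length": 0.2},
    )
    print(round(score, 2))
```

Normalizing by the total weight means the weights do not have to sum to 1. A jury verdict can therefore count more or less than the embedding signal without rescaling the other terms.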

Sandboxed Code Evaluators also open the door to using agents as judges. We're excited about where this is heading.

agents: Enable provider native web search / fetch when available (#13333) (41eb4fc)
Sandboxing and Code Evaluators (#13290) (e294d93)

Bug Fixes

agents: Prevent broken tool groups (#13387) (78d1e96)

Breaking Changes

Sandboxing and Code Evaluators break existing agent configurations.


About phoenix

AI Observability & Evaluation


Other breaking releases

arize-phoenix-v19.0.0 GraphQL createUserApiKey and createSystemApiKey reject API-key callers.
arize-phoenix-v18.0.0 Changes session time-range filters to use interval-overlap semantics.
arize-phoenix-v17.0.0 Adds system settings for admin-managed assistant enablement and trace recording policy.
arize-phoenix-v15.7.0 Removes v1 /chat route and associated code.