
phoenix

arize-phoenix-v16.0.0 · scope: arize-phoenix · Breaking

This release includes one breaking change, so platform teams should plan the upgrade carefully.

Published 13d ago · Tracing
✓ No known CVEs patched

Topics

agents ai-monitoring ai-observability aiengineering anthropic datasets evals langchain llamaindex llm-eval llm-evaluation llmops llms openai prompt-engineering smolagents

ReleasePort's take

Light signal

Phoenix v16.0.0 ships Sandboxing and Code Evaluators (#13290), a breaking change, and adds the ability to compose evaluation strategies in code.

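Why it matters: This is not a security release, so there is nothing to patch urgently. Because the breaking change can affect existing agent configurations, read MIGRATION.md and test the upgrade, including any Code Evaluators you plan to use, in a development environment before rolling it out.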

Summary

AI summary

A mixed release with one breaking change, new features, and a bug fix.

Changes in this release

Breaking Medium

Sandboxing and Code Evaluators introduce breaking changes in Phoenix v16.0.0.

Source: llm_adapter@2026-05-22

Confidence: low

Feature Medium

Phoenix now supports composing evaluation strategies via Code Evaluators.

Source: llm_adapter@2026-05-22

Confidence: high

Feature Medium

Agents now enable provider-native web search / fetch when available.

Source: llm_adapter@2026-05-22

Confidence: high

Bugfix Medium

Prevents broken tool groups in agents.

Source: llm_adapter@2026-05-22

Confidence: high

Full changelog

16.0.0 (2026-05-21)

MIGRATION.md

⚠ BREAKING CHANGES

  • Sandboxing and Code Evaluators (#13290)

Features

Phoenix now lets you compose evaluation strategies in code.

Most eval tooling hands you a fixed menu of judge templates. Real evaluation is rarely that tidy.

Code Evaluators enable you to build evaluation criteria the way you want. You write a Python or TypeScript evaluate() function in the Phoenix UI — no SDK, no local runtime, no deploy step — and Phoenix runs it server-side, recording labels and scores as annotations on every experiment run.
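As a rough illustration, an evaluate() body could look like the sketch below. The signature evaluate(output, expected) and the returned "label" and "score" keys are assumptions made for this example, not the documented Phoenix contract; check the Code Evaluator editor in the Phoenix UI for the arguments it actually passes and the return shape it expects.

```python
from typing import Any


def evaluate(output: str, expected: str) -> dict[str, Any]:
    """Hypothetical Code Evaluator: normalized exact-match check.

    The argument names and the returned keys ("label", "score") are
    assumptions for illustration only.
    """
    # Collapse whitespace and lowercase so trivial formatting differences don't fail the check.
    normalized_output = " ".join(output.split()).lower()
    normalized_expected = " ".join(expected.split()).lower()
    matched = normalized_output == normalized_expected
    return {
        "label": "correct" if matched else "incorrect",
        "score": 1.0 if matched else 0.0,
    }


print(evaluate("Paris ", "paris"))  # {'label': 'correct', 'score': 1.0}
```

Returning both a categorical label and a numeric score mirrors the release notes' description of recording labels and scores as annotations on each run.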

Because it's just code, you control the whole strategy:

• Composite scoring: blend sub-scores (LLM judgment + deterministic rules) into one weighted metric
• Embedding-based evaluation: cosine similarity over embeddings instead of brittle string matching
• LLM juries: poll multiple models and combine verdicts into a weighted consensus
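The three strategies above can be sketched in plain Python. This is a hypothetical illustration, not Phoenix code. The helper names, the weights, and the hard-coded embeddings and juror verdicts are all assumptions; a real evaluator would get the embeddings and verdicts from model calls, which are omitted here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Embedding-based evaluation: cosine similarity of two vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as "no similarity"
    return dot / (norm_a * norm_b)


def jury_consensus(verdicts: dict[str, bool], weights: dict[str, float]) -> float:
    """LLM jury: weighted fraction of jurors that voted 'pass' (0.0 to 1.0)."""
    total = sum(weights[juror] for juror in verdicts)
    if total <= 0.0:
        raise ValueError("juror weights must sum to a positive value")
    passed = sum(weights[juror] for juror, ok in verdicts.items() if ok)
    return passed / total


def composite_score(sub_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Composite scoring: weighted average of named sub-scores."""
    total = sum(weights[name] for name in sub_scores)
    if total <= 0.0:
        raise ValueError("sub-score weights must sum to a positive value")
    return sum(sub_scores[name] * weights[name] for name in sub_scores) / total


# Hypothetical inputs standing in for real embedding and judge-model calls.
similarity = cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 0.0])
jury = jury_consensus(
    {"judge_a": True, "judge_b": True, "judge_c": False},
    {"judge_a": 0.5, "judge_b": 0.25, "judge_c": 0.25},
)
rule = 1.0  # e.g. a deterministic check such as a length limit that passed

score = composite_score(
    {"similarity": similarity, "jury": jury, "rule": rule},
    {"similarity": 0.4, "jury": 0.4, "rule": 0.2},
)
print(round(score, 3))  # 0.783
```

A weighted average keeps the blended result on the same 0 to 1 scale as its inputs, so it fits naturally into a single score. A real evaluator might also derive a pass/fail label from a threshold on that score.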

Sandboxed Code Evaluators also open the door to using agents as judges. We're excited about where this is heading.

  • agents: Enable provider native web search / fetch when available (#13333) (41eb4fc)
  • Sandboxing and Code Evaluators (#13290) (e294d93)

Bug Fixes

Breaking Changes

  • Sandboxing and Code Evaluators break existing agent configurations.


About phoenix

AI Observability & Evaluation
