✓ No known CVEs patched in this version
ReleasePort's take
Light signal: Phoenix v16.0.0 introduces breaking changes for sandboxing and code evaluators while adding new composition features for evaluation strategies.
Why it matters: If your application uses sandboxing or code evaluators, review the breaking changes and test the new strategy-composition API in development before upgrading to avoid runtime failures.
Summary
AI summary: Version 16.0.0 is a mixed release with breaking changes, new features, and bug fixes.
Changes in this release
| Type | Severity | Summary | CVE |
|---|---|---|---|
| Breaking | Medium | Sandboxing and Code Evaluators introduce breaking changes in Phoenix v16.0.0. Source: llm_adapter@2026-05-22. Confidence: low. | — |
| Feature | Medium | Phoenix now supports composing evaluation strategies via Code Evaluators. Source: llm_adapter@2026-05-22. Confidence: high. | — |
| Feature | Medium | Agents now enable provider-native web search / fetch when available. Source: llm_adapter@2026-05-22. Confidence: high. | — |
| Bugfix | Medium | Prevents broken tool groups in agents. Source: llm_adapter@2026-05-22. Confidence: high. | — |
Full changelog
16.0.0 (2026-05-21)
⚠ BREAKING CHANGES
- Sandboxing and Code Evaluators (#13290)
Features
Phoenix now lets you compose evaluation strategies in code.
Most eval tooling hands you a fixed menu of judge templates. Real evaluation is rarely that tidy.
Code Evaluators enable you to build evaluation criteria the way you want. You write a Python or TypeScript evaluate() function in the Phoenix UI — no SDK, no local runtime, no deploy step — and Phoenix runs it server-side, recording labels and scores as annotations on every experiment run.
Because it's just code, you control the whole strategy:
• Composite scoring: blend sub-scores (LLM judgment + deterministic rules) into one weighted metric
• Embedding-based evaluation: cosine similarity over embeddings instead of brittle string matching
• LLM juries: poll multiple models and combine verdicts into a weighted consensus
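As a rough illustration of the three strategies listed above, here is a minimal sketch in plain Python. The changelog does not spell out the signature or return shape Phoenix expects from `evaluate()`, so the parameters (`output_vec`, `expected_vec`, `jury_votes`, `jury_weights`), the `label`/`score` dictionary, the weights, and the pass threshold are all assumptions made for this example. Embeddings and juror verdicts are passed in precomputed; in practice they would come from model calls.

```python
import math
from typing import Mapping, Sequence

# Hypothetical weights for blending sub-scores; not taken from Phoenix.
SCORE_WEIGHTS = {"rule": 1.0, "embedding": 2.0, "jury": 2.0}
PASS_THRESHOLD = 0.7  # hypothetical cut-off for the "pass" label


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_mean(values: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average of `values`, using the weight stored under each key."""
    total = sum(weights[name] for name in values)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(values[name] * weights[name] for name in values) / total


def jury_consensus(votes: Mapping[str, bool], weights: Mapping[str, float]) -> float:
    """Weighted share of jurors (models) that voted 'pass'."""
    return weighted_mean({j: 1.0 if v else 0.0 for j, v in votes.items()}, weights)


def evaluate(
    output: str,
    expected: str,
    output_vec: Sequence[float],
    expected_vec: Sequence[float],
    jury_votes: Mapping[str, bool],
    jury_weights: Mapping[str, float],
) -> dict:
    """Hypothetical evaluator: blends a rule, an embedding score, and a jury."""
    sub_scores = {
        # Deterministic rule: exact match after trimming whitespace.
        "rule": 1.0 if output.strip() == expected.strip() else 0.0,
        # Embedding-based score instead of brittle string matching.
        "embedding": cosine_similarity(output_vec, expected_vec),
        # Weighted consensus across several judge models.
        "jury": jury_consensus(jury_votes, jury_weights),
    }
    score = weighted_mean(sub_scores, SCORE_WEIGHTS)
    label = "pass" if score >= PASS_THRESHOLD else "fail"
    return {"label": label, "score": score}
```

Keeping each strategy in its own small function makes the weights easy to adjust and lets any sub-score be swapped out (for example, replacing the exact-match rule with a regex check) without touching the blending logic.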
Sandboxed Code Evaluators also open the door to using agents as judges. We're excited about where this is heading.
- agents: Enable provider native web search / fetch when available (#13333) (41eb4fc)
- Sandboxing and Code Evaluators (#13290) (e294d93)
Bug Fixes
Breaking Changes
- Sandboxing and Code Evaluators break existing agent configurations.
Related context
Earlier breaking changes
- varize-phoenix-v17.0.0 Adds system settings for admin-managed assistant enablement and trace recording policy
- varize-phoenix-v15.7.0 Removes v1 /chat route and associated code