hidai25/eval-view

v0.6.1 Feature

This release adds three notable features: full MCP parity with the CLI flags, two new MCP tools (compare_agents and replay), and 33 MCP regression tests.

Published 4mo ago | Developer Productivity

✓ No known CVEs patched

Topics

agent-benchmark agent-evaluation agentic-ai ai-agents anthropic autogen cli crewai evaluation langchain-agent langgraph llm mcp openai-assistants pytest python regression-testing testing

Summary

AI summary

All CLI flags are now exposed via MCP tools, and two new MCP tools, compare_agents and replay, have been added.

Full changelog

What's new

Full MCP feature parity — all CLI flags now exposed via MCP tools (heal, strict, statistical, budget, tags, variants, and more)
New MCP tools: compare_agents (A/B test two endpoints) and replay (trajectory diff viewer)
33 MCP regression tests — protocol, schema contracts, flag wiring, routing, timeouts
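
As a rough illustration of what calling one of the new tools could look like, the sketch below builds a standard MCP `tools/call` request (a JSON-RPC 2.0 message). The tool name compare_agents comes from the notes above; the argument names `endpoint_a` and `endpoint_b` and the URLs are hypothetical placeholders, not the tool's documented schema, so check the schema the server actually advertises before relying on them.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical argument names and URLs, used only for illustration.
request = build_tool_call(
    1,
    "compare_agents",
    {
        "endpoint_a": "http://localhost:8000/agent-a",
        "endpoint_b": "http://localhost:8000/agent-b",
    },
)

print(json.dumps(request, indent=2))
```

An MCP client library would normally construct this message for you; the point here is only the shape of the call.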

Fixes

Stable JSON response contract on run_check regardless of flags
--report no longer opens browser from MCP server
Replay timeout increased to 120s
Subprocess calls use stdin=DEVNULL to prevent hangs
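
The last two fixes follow a common pattern for running child processes from a server: close the child's stdin so it can never wait on input, and bound the call with a timeout. The sketch below shows that pattern with the standard library only. The helper name `run_cli` is hypothetical and this is not EvalView's actual code; the 120-second default simply mirrors the replay timeout mentioned above.

```python
import subprocess
import sys


def run_cli(args, timeout=120):
    """Run a child process with stdin closed and a hard timeout.

    stdin=DEVNULL means any read from stdin in the child sees EOF
    immediately instead of blocking on the parent's input stream.
    """
    return subprocess.run(
        args,
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
    )


# Demo: a child that reads all of stdin returns at once with an empty string.
result = run_cli(
    [sys.executable, "-c", "import sys; print(repr(sys.stdin.read()))"]
)
print(result.stdout.strip())  # prints ''
```

Without `stdin=subprocess.DEVNULL`, a child that inherits the server's stdin can block indefinitely, which is especially likely when that stream is already being used as a protocol transport.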

Install / Upgrade

pip install --upgrade evalview

Full changelog: https://github.com/hidai25/eval-view/blob/main/CHANGELOG.md

About hidai25/eval-view

Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.
