hidai25/eval-view

v0.3.0 Breaking

This release is flagged as breaking; platform teams should review the changes below before upgrading.

✓ No known CVEs patched

Topics

agent-benchmark agent-evaluation agentic-ai ai-agents anthropic autogen cli crewai evaluation langchain-agent langgraph llm mcp openai-assistants pytest python regression-testing testing

Summary

AI summary

EvalView now integrates as an MCP server inside Claude Code with seven new tools.

Full changelog

What's New in 0.3

🤖 Claude Code MCP Integration

EvalView now runs as an MCP server inside Claude Code — test your agent without leaving the conversation.

claude mcp add --transport stdio evalview -- evalview mcp serve
cp CLAUDE.md.example CLAUDE.md

7 MCP tools available:

| Tool | What it does |
|------|-------------|
| create_test | Generate test cases from natural language |
| run_snapshot | Capture golden baseline |
| run_check | Detect regressions inline |
| list_tests | Show all baselines |
| validate_skill | Validate SKILL.md structure |
| generate_skill_tests | Auto-generate skill test suite |
| run_skill_test | Run Phase 1 (deterministic) + Phase 2 (rubric) |

📊 Telemetry Improvements

  • Users now show as EvalView-3f8a2b instead of raw UUIDs in PostHog
  • Session duration tracking (session_duration_ms)
  • Set EVALVIEW_DEV=1 to tag your own events for filtering
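
As a rough illustration of the first bullet, a label such as EvalView-3f8a2b could be derived from a raw UUID as sketched below. The release notes do not say how EvalView builds this label, so `display_id`, its parameters, and the choice of hashing rather than truncating are all assumptions made for the example.

```python
import hashlib
import uuid


def display_id(raw_id: str, prefix: str = "EvalView", length: int = 6) -> str:
    """Hypothetical helper: map a raw ID to a short, stable display label.

    Assumption: the label is a prefix plus the first few hex characters of a
    hash of the raw ID. EvalView's real scheme is not documented here.
    """
    # Hash the raw ID so the label does not expose part of the UUID itself.
    digest = hashlib.sha256(raw_id.encode("utf-8")).hexdigest()
    return f"{prefix}-{digest[:length]}"


raw = str(uuid.uuid4())
label = display_id(raw)
```

The same raw ID always yields the same label, so events from one user still group together in an analytics dashboard.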

🐕 Dogfood Regression Testing

EvalView now tests itself using its own evaluation logic on every CI run.

Bug Fixes

  • Fixed PIPESTATUS CI bug (regression checks now correctly fail CI)
  • Replaced deprecated asyncio.get_event_loop() with get_running_loop()
  • Fixed silent failures in --json mode
  • Improved ANSI escape stripping in MCP output
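
For readers making the same asyncio migration in their own code, the sketch below shows the general pattern using only the standard library. It is not taken from EvalView's source; `fetch_value` and `outside_coroutine` are hypothetical names for illustration.

```python
import asyncio


async def fetch_value() -> int:
    # Inside a coroutine a loop is always running, so get_running_loop()
    # returns it directly and never creates one implicitly.
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    # Resolve the future on the next loop iteration.
    loop.call_soon(future.set_result, 42)
    return await future


def outside_coroutine() -> bool:
    # With no running loop, get_running_loop() raises RuntimeError rather
    # than handing back a loop that nothing is driving.
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


result = asyncio.run(fetch_value())
has_loop = outside_coroutine()
```

Calling get_running_loop() only from coroutines and callbacks, and leaving loop creation to asyncio.run(), avoids the deprecation warning that recent Python versions emit when get_event_loop() is called without a running loop.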

Upgrade

pip install --upgrade evalview

About hidai25/eval-view

Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.
