This release includes breaking changes for platform teams planning a safe upgrade.
✓ No known CVEs patched in this version
Summary
AI summary: EvalView now integrates as an MCP server inside Claude Code with seven new tools.
Full changelog
What's New in 0.3
🤖 Claude Code MCP Integration
EvalView now runs as an MCP server inside Claude Code — test your agent without leaving the conversation.
claude mcp add --transport stdio evalview -- evalview mcp serve
cp CLAUDE.md.example CLAUDE.md
7 MCP tools available:
| Tool | What it does |
|------|-------------|
| create_test | Generate test cases from natural language |
| run_snapshot | Capture golden baseline |
| run_check | Detect regressions inline |
| list_tests | Show all baselines |
| validate_skill | Validate SKILL.md structure |
| generate_skill_tests | Auto-generate skill test suite |
| run_skill_test | Run Phase 1 (deterministic) + Phase 2 (rubric) |
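For orientation, MCP clients invoke tools like these through JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request by hand for `run_check`. The helper `build_tool_call` and the argument name `test` are hypothetical, since these notes do not document each tool's input schema, and Claude Code normally constructs these messages for you.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # each JSON-RPC request in a session needs its own id

def build_tool_call(tool_name: str, arguments: Optional[dict] = None) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments or {}},
    }
    return json.dumps(request)

# Hypothetical argument: the real input schema for run_check is not shown here.
print(build_tool_call("run_check", {"test": "checkout-flow"}))
```

Over the stdio transport registered above, a client would write messages of this shape to the server's standard input, one per line.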
📊 Telemetry Improvements
- Users now show as `EvalView-3f8a2b` instead of raw UUIDs in PostHog
- Session duration tracking (`session_duration_ms`)
- Set `EVALVIEW_DEV=1` to tag your own events for filtering
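One way to produce a short display ID of this shape is to hash the raw UUID and keep a few hex characters. The sketch below assumes that approach (SHA-256, six characters), a simplified event payload, and a hypothetical `is_dev` property; EvalView's actual derivation and property names may differ.

```python
import hashlib
import os
import time
import uuid

def display_id(raw_id: uuid.UUID, length: int = 6) -> str:
    """Derive a short, stable label from a raw UUID (assumed scheme)."""
    digest = hashlib.sha256(raw_id.bytes).hexdigest()
    return f"EvalView-{digest[:length]}"

def build_event(name: str, raw_id: uuid.UUID, started_at: float) -> dict:
    """Assemble an event payload; started_at is a time.monotonic() reading."""
    return {
        "event": name,
        "distinct_id": display_id(raw_id),
        "properties": {
            "session_duration_ms": int((time.monotonic() - started_at) * 1000),
            # Hypothetical property name: EVALVIEW_DEV=1 marks events as your own.
            "is_dev": os.environ.get("EVALVIEW_DEV") == "1",
        },
    }

session_start = time.monotonic()
event = build_event("run_check", uuid.uuid4(), session_start)
print(event["distinct_id"], event["properties"])
```

Hashing keeps the label stable for a given user across sessions while keeping the raw UUID out of dashboards.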
🐕 Dogfood Regression Testing
EvalView now tests itself using its own evaluation logic on every CI run.
Bug Fixes
- Fixed PIPESTATUS CI bug (regression checks now correctly fail CI)
- Fixed deprecated `asyncio.get_event_loop()` → `get_running_loop()`
- Fixed silent failures in `--json` mode
- ANSI escape stripping improved in MCP output
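Two of these fixes follow common Python patterns. The sketch below shows `asyncio.get_running_loop()` used inside a coroutine and a regex-based ANSI stripper. The function names and the exact pattern are illustrative assumptions, not EvalView's code.

```python
import asyncio
import re

# Matches CSI sequences such as colors ("\x1b[31m") and line clears ("\x1b[2K").
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")

def strip_ansi(text: str) -> str:
    """Remove ANSI CSI escape sequences so the output is plain text."""
    return ANSI_CSI.sub("", text)

async def run_blocking(func, *args):
    """Run a blocking callable in the default executor from a coroutine."""
    # get_running_loop() raises RuntimeError when no loop is running instead
    # of implicitly creating one, which is why it replaces get_event_loop().
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, func, *args)

colored = "\x1b[32mPASSED\x1b[0m 3 tests"
print(asyncio.run(run_blocking(strip_ansi, colored)))  # prints "PASSED 3 tests"
```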
Upgrade
pip install --upgrade evalview
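After upgrading, you may want to confirm which version is active in the current environment. The sketch below reads the installed version with the standard library and compares it against `0.3`. It assumes the distribution name `evalview` from the pip command and that the release's version string starts with `0.3`; the helper names are hypothetical, and the comparison is deliberately simple.

```python
from importlib.metadata import PackageNotFoundError, version

def parse_release(version_string: str) -> tuple:
    """Turn '0.3.1' into (0, 3, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in version_string.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'post1' or 'dev0'
        parts.append(int(piece))
    return tuple(parts)

def is_at_least(installed: str, minimum: str) -> bool:
    """Compare two dotted version strings by their numeric components."""
    return parse_release(installed) >= parse_release(minimum)

try:
    installed = version("evalview")  # distribution name assumed from the pip command
    status = "ok" if is_at_least(installed, "0.3") else "older than 0.3"
    print(installed, status)
except PackageNotFoundError:
    print("evalview is not installed in this environment")
```

For pre-release or otherwise unusual version strings, a dedicated version parser is a safer choice than this tuple comparison.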
About hidai25/eval-view
Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.