hidai25/eval-view

v0.2.6 Feature

This release adds two notable features: Claude Code MCP integration and a set of new MCP tools.

✓ No known CVEs patched

Topics

agent-benchmark agent-evaluation agentic-ai ai-agents anthropic autogen cli crewai evaluation langchain-agent langgraph llm mcp openai-assistants pytest python regression-testing testing

Summary

EvalView now integrates with Claude Code as an MCP server, adding regression‑checking tools.

Full changelog

What's New

Claude Code MCP Integration

EvalView now runs as an MCP server inside Claude Code — zero context switching.

Setup (one-time):
```bash
claude mcp add --transport stdio evalview -- evalview mcp serve
cp CLAUDE.md.example CLAUDE.md
```

Then just ask Claude: "Did my refactor break the golden baseline?"
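
With `--transport stdio`, Claude Code starts the server as a local subprocess, so both `claude` and `evalview` need to be on `PATH` before the setup command will work. A minimal Python sketch of that pre-flight check follows; `missing_executables` is a hypothetical helper written for illustration, and only the command line itself comes from the setup step above.

```python
import shutil

# Command line copied from the setup step above.
REGISTER_CMD = [
    "claude", "mcp", "add", "--transport", "stdio",
    "evalview", "--", "evalview", "mcp", "serve",
]


def missing_executables(names=("claude", "evalview"), which=shutil.which):
    """Return the names that cannot be resolved on PATH.

    `which` is injectable so the check can be exercised without
    depending on what is installed on the current machine.
    """
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_executables()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Ready to run:", " ".join(REGISTER_CMD))
```

If the check reports `evalview` as missing, installing the package into the active environment (see Upgrade) is the likely fix.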

New MCP Tools

| Tool | Description |
|------|-------------|
| run_check | Check for regressions against the golden baseline |
| run_snapshot | Save current behavior as the new baseline |
| list_tests | List available golden baselines |
| create_test | Generate a test case YAML from natural language |
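
For readers new to MCP: a client invokes tools like these with a JSON-RPC 2.0 request whose method is `tools/call`. The sketch below only builds such a request in Python. `build_tool_call` is a hypothetical helper, not part of EvalView, and the empty `arguments` object is an assumption, because these notes do not document each tool's input schema.

```python
import json

# Tool names taken from the table above.
EVALVIEW_TOOLS = ("run_check", "run_snapshot", "list_tests", "create_test")


def build_tool_call(name, arguments=None, request_id=1):
    """Build an MCP `tools/call` request (JSON-RPC 2.0) as a JSON string.

    Hypothetical helper for illustration. The real argument schema of
    each EvalView tool is not given here, so `arguments` defaults to {}.
    """
    if name not in EVALVIEW_TOOLS:
        raise ValueError(f"unknown tool: {name}")
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }
    return json.dumps(message)


if __name__ == "__main__":
    print(build_tool_call("run_check"))
```

In normal use Claude Code constructs and sends these requests itself; the sketch is only meant to show what travels over the stdio transport when you ask a question such as the one above.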

Bug Fixes

  • Fixed stale import evalview.evaluators.main → evalview.evaluators.evaluator
  • Fixed _create_adapter not passing allow_private_urls to HTTPAdapter
  • Fixed adapter.run() → adapter.execute() in snapshot/check code paths
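
The allow_private_urls fix is an instance of a common factory bug: an option is accepted by the factory but never forwarded to the object it builds. The sketch below reproduces the pattern with stand-in code. The `HTTPAdapter` class and both `create_adapter_*` functions are simplified, hypothetical versions written for illustration, not EvalView's actual implementation.

```python
class HTTPAdapter:
    """Stand-in adapter; only the option relevant to the bug is modeled."""

    def __init__(self, endpoint, allow_private_urls=False):
        self.endpoint = endpoint
        self.allow_private_urls = allow_private_urls


def create_adapter_buggy(endpoint, allow_private_urls=False):
    # Bug pattern: the option is accepted but silently dropped.
    return HTTPAdapter(endpoint)


def create_adapter_fixed(endpoint, allow_private_urls=False):
    # Fix pattern: forward the option to the adapter constructor.
    return HTTPAdapter(endpoint, allow_private_urls=allow_private_urls)


if __name__ == "__main__":
    url = "http://localhost:8000/agent"
    buggy = create_adapter_buggy(url, allow_private_urls=True)
    fixed = create_adapter_fixed(url, allow_private_urls=True)
    print("buggy factory forwards option:", buggy.allow_private_urls)
    print("fixed factory forwards option:", fixed.allow_private_urls)
```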

Files

  • evalview/mcp_server.py: new MCP server
  • evalview/cli.py: evalview mcp serve command
  • CLAUDE.md.example: proactive workflow instructions for Claude Code

Upgrade

```bash
pip install --upgrade evalview
```

About hidai25/eval-view

Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.
