This release adds 2 notable features for engineering teams evaluating rollout.
Published 2mo
Developer Productivity
✓ No known CVEs patched in this version
Topics
agent-benchmark
agent-evaluation
agentic-ai
ai-agents
anthropic
autogen
cli
crewai
evaluation
langchain-agent
langgraph
llm
mcp
openai-assistants
pytest
python
regression-testing
testing
Summary
AI summary: Updated default LLM model from gpt-4o-mini to gpt-5.4-mini and added OpenClaw integration commands.
Full changelog
What's New
Python API
- gate() and gate_async(): programmatic regression checks, no CLI needed
- gate(quick=True): skip LLM judge for free, sub-second checks
- from evalview import gate, DiffStatus: clean top-level imports
- Typed results: GateResult, TestDiff, GateSummary
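A minimal sketch of the programmatic check listed above. Only gate, DiffStatus, and the quick=True flag come from this changelog. The wrapper name run_quick_gate is hypothetical, and the fields of the returned object are not documented here, so the sketch only prints the result as-is.

```python
# Guarded import so the sketch also loads where evalview is not installed.
try:
    from evalview import gate, DiffStatus  # top-level imports named in the changelog
except ImportError:
    gate = None
    DiffStatus = None


def run_quick_gate():
    """Run a quick regression check (hypothetical wrapper, not part of evalview).

    Returns whatever gate() returns, or None when evalview is unavailable.
    """
    if gate is None:
        return None
    # quick=True skips the LLM judge, per the changelog.
    return gate(quick=True)


if __name__ == "__main__":
    result = run_quick_gate()
    print("evalview not installed" if result is None else result)
```

Consult the package's own documentation for the attributes of GateResult, TestDiff, and GateSummary before branching on them.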
Terminal Dashboard
- Scorecard panel with colored health bar, streak tracker, and gauge
- Unicode sparkline trends from drift history
- Confidence scoring on each verdict (z-score based signal vs noise)
- Smart accept suggestions when changes look intentional
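To illustrate the sparkline and z-score ideas in the list above, here is a self-contained sketch. It is not evalview's implementation: the function names, the eight-level bar scale, and the use of the population standard deviation are all assumptions chosen for the example.

```python
import math
from statistics import mean, pstdev

BARS = "▁▂▃▄▅▆▇█"  # eight Unicode block characters, lowest to highest


def sparkline(values):
    """Map each value to a block character, scaled between min and max."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series has no range to scale, so draw the lowest bar throughout.
        return BARS[0] * len(values)
    top = len(BARS) - 1
    return "".join(BARS[round((v - lo) / span * top)] for v in values)


def z_score(history, latest):
    """Number of standard deviations between `latest` and the history mean.

    A large absolute value suggests signal; a small one suggests noise.
    """
    if len(history) < 2:
        return 0.0  # not enough data to estimate spread
    diff = latest - mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Zero spread: any deviation at all is infinitely surprising.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / sigma


if __name__ == "__main__":
    drift = [0.97, 0.96, 0.98, 0.97, 0.80]  # made-up similarity scores
    print(sparkline(drift))
    print(z_score(drift[:-1], drift[-1]))
```

Where to place the cutoff between signal and noise is a policy choice that this sketch deliberately leaves open.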
HTML Report Dashboard
- SVG health gauge with pass/fail breakdown
- Chart.js trend lines for output similarity over time
- Confidence badges on diff rows
- Accept suggestion boxes with copy-paste commands
OpenClaw Integration
- evalview openclaw install: install gate skill into claw workspace
- evalview openclaw check: run gate with auto-revert
- gate_or_revert() / check_and_decide() Python helpers
- Built-in SKILL.md for autonomous agent loops
MCP Server
- run_check rewired to call gate() directly (no subprocess)
- Fallback to subprocess on error
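The direct-call-with-fallback pattern described above can be sketched generically. This is an illustration under stated assumptions, not the MCP server's code: run_with_fallback, its arguments, and the returned dictionary shape are hypothetical, and the fallback command here is a stand-in rather than a real evalview invocation.

```python
import subprocess
import sys


def run_with_fallback(in_process, fallback_cmd):
    """Call `in_process()` directly; if it raises, run `fallback_cmd` instead.

    `in_process` is any zero-argument callable and `fallback_cmd` is an
    argument list for subprocess.run (both hypothetical placeholders).
    """
    try:
        return {"mode": "in-process", "output": in_process()}
    except Exception as exc:
        # The direct call failed, so shell out as a slower but isolated path.
        proc = subprocess.run(
            fallback_cmd, capture_output=True, text=True, check=False
        )
        return {
            "mode": "subprocess",
            "output": proc.stdout.strip(),
            "returncode": proc.returncode,
            "error": repr(exc),
        }


if __name__ == "__main__":
    def broken():
        raise RuntimeError("direct call failed")

    print(run_with_fallback(lambda: "ok", [sys.executable, "-c", "print('unused')"]))
    print(run_with_fallback(broken, [sys.executable, "-c", "print('from subprocess')"]))
```

Calling in-process avoids interpreter start-up cost on every check, while the subprocess path keeps a working route if the direct call raises.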
Other
- evalview snapshot --preview: dry-run before saving baselines
- python -m evalview support
- Centralized model defaults (DEFAULT_MODELS, DEFAULT_JUDGE_MODEL)
- Updated all defaults from gpt-4o-mini to gpt-5.4-mini
- 22 new API tests (1147 total passing)
- mypy clean (166 source files, 0 errors)
Install
pip install evalview==0.5.4
About hidai25/eval-view
Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.