hidai25/eval-view

v0.5.4 Feature

This release adds two notable features for engineering teams evaluating a rollout.

Published 4 months ago · Developer Productivity

✓ No known CVEs patched

Topics

agent-benchmark agent-evaluation agentic-ai ai-agents anthropic autogen cli crewai evaluation langchain-agent langgraph llm mcp openai-assistants pytest python regression-testing testing

Summary

AI summary

Updated default LLM model from gpt-4o-mini to gpt-5.4-mini and added OpenClaw integration commands.

Full changelog

What's New

Python API

gate() and gate_async() — programmatic regression checks, no CLI needed
gate(quick=True) — skip LLM judge for free, sub-second checks
from evalview import gate, DiffStatus — clean top-level imports
Typed results: GateResult, TestDiff, GateSummary
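
As a rough illustration of the imports and quick mode listed above, the sketch below calls gate(quick=True) from a script. Only the name gate, its import from evalview, and the quick=True argument come from these notes. The wrapper run_quick_gate and its gate_fn parameter are hypothetical, added so the call can be stubbed out in tests. The fields of GateResult are not described here, so the result is returned untouched rather than inspected.

```python
from typing import Any, Callable, Optional


def run_quick_gate(gate_fn: Optional[Callable[..., Any]] = None) -> Any:
    """Run a quick regression check and return whatever gate() returns.

    gate_fn is a hypothetical injection point (not part of evalview) so the
    function can be exercised without the library installed.
    """
    if gate_fn is None:
        # Import path as given in the release notes; requires evalview 0.5.4.
        from evalview import gate
        gate_fn = gate
    # The notes describe quick=True as skipping the LLM judge.
    return gate_fn(quick=True)
```

Injecting the callable keeps the example importable on machines without evalview. In a real script the default path (no argument) would be used.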

Terminal Dashboard

Scorecard panel with colored health bar, streak tracker, and gauge
Unicode sparkline trends from drift history
Confidence scoring on each verdict (z-score based signal vs noise)
Smart accept suggestions when changes look intentional
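
To make the sparkline and z-score items above more concrete, here is a minimal sketch of both techniques in general form. It is not evalview's implementation: the function names, the eight-level bar set, and the use of the sample standard deviation are all assumptions chosen for illustration.

```python
import math
from statistics import mean, stdev
from typing import Sequence

BARS = "▁▂▃▄▅▆▇█"  # eight Unicode block heights, lowest to highest


def sparkline(values: Sequence[float]) -> str:
    """Map each value to a block character scaled between min and max."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series has no trend to show; draw the lowest bar throughout.
        return BARS[0] * len(values)
    top = len(BARS) - 1
    return "".join(BARS[int((v - lo) / span * top)] for v in values)


def z_score(history: Sequence[float], latest: float) -> float:
    """Distance of `latest` from the historical mean, in standard deviations."""
    if len(history) < 2:
        return 0.0  # too little history to estimate noise
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # No historical variation: any change is infinitely far from the mean.
        return 0.0 if latest == mu else math.copysign(math.inf, latest - mu)
    return (latest - mu) / sigma
```

A large absolute z-score suggests the latest value is signal rather than run-to-run noise. Where to draw that threshold is a design choice these notes do not specify.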

HTML Report Dashboard

SVG health gauge with pass/fail breakdown
Chart.js trend lines for output similarity over time
Confidence badges on diff rows
Accept suggestion boxes with copy-paste commands

OpenClaw Integration

evalview openclaw install — install gate skill into claw workspace
evalview openclaw check — run gate with auto-revert
gate_or_revert() / check_and_decide() Python helpers
Built-in SKILL.md for autonomous agent loops
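
The phrase "run gate with auto-revert" suggests a simple control flow: run a check, and undo the change if it fails. The sketch below shows that pattern in isolation. The function check_then_revert and its two callables are hypothetical and do not reflect the actual signatures of gate_or_revert() or check_and_decide(), which these notes do not document.

```python
from typing import Callable


def check_then_revert(check: Callable[[], bool], revert: Callable[[], None]) -> bool:
    """Return True if the check passed; otherwise call revert() and return False.

    Both callables are placeholders: `check` stands in for a regression gate,
    `revert` for whatever undoes the last change (for example, a VCS rollback).
    """
    if check():
        return True
    revert()
    return False
```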

MCP Server

run_check rewired to call gate() directly (no subprocess)
Fallback to subprocess on error
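
The two items above describe an in-process call with a subprocess fallback. A minimal sketch of that pattern follows. It assumes nothing about evalview's MCP server internals: run_with_fallback, its parameters, and the returned dictionary shape are illustrative names only.

```python
import subprocess
import sys
from typing import Callable, Dict, List


def run_with_fallback(in_process: Callable[[], str], fallback_cmd: List[str]) -> Dict[str, str]:
    """Try the in-process callable first; on any error, run fallback_cmd instead."""
    try:
        return {"path": "in-process", "output": in_process()}
    except Exception as exc:
        # The direct call failed, so shell out and capture stdout as text.
        proc = subprocess.run(fallback_cmd, capture_output=True, text=True)
        return {"path": "subprocess", "output": proc.stdout, "error": repr(exc)}


if __name__ == "__main__":
    def broken() -> str:
        raise RuntimeError("simulated in-process failure")

    # The fallback here is a stand-in command, not an evalview invocation.
    print(run_with_fallback(broken, [sys.executable, "-c", "print('fallback ran')"]))
```

Calling the function directly avoids process start-up cost on the common path, while the fallback keeps the tool usable if the in-process call raises.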

Other

evalview snapshot --preview — dry-run before saving baselines
Support for running the CLI as python -m evalview
Centralized model defaults (DEFAULT_MODELS, DEFAULT_JUDGE_MODEL)
Updated all defaults from gpt-4o-mini to gpt-5.4-mini
22 new API tests (1147 total passing)
mypy clean (166 source files, 0 errors)

Install

pip install evalview==0.5.4

About hidai25/eval-view

Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.
