hidai25/eval-view

v0.5.4 Feature

This release adds 2 notable features for engineering teams evaluating rollout.

✓ No known CVEs patched

Topics

agent-benchmark agent-evaluation agentic-ai ai-agents anthropic autogen cli crewai evaluation langchain-agent langgraph llm mcp openai-assistants pytest python regression-testing testing

Summary

AI summary

Updated the default model from gpt-4o-mini to gpt-5.4-mini and added OpenClaw integration commands.

Full changelog

What's New

Python API

  • gate() and gate_async() — programmatic regression checks, no CLI needed
  • gate(quick=True) — skip LLM judge for free, sub-second checks
  • from evalview import gate, DiffStatus — clean top-level imports
  • Typed results: GateResult, TestDiff, GateSummary
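
A minimal sketch of calling the gate in quick mode from Python. Only the import path and the quick=True keyword come from the notes above. The wrapper name quick_check is hypothetical, and whether gate() needs further arguments, or which fields GateResult exposes, is not stated here, so the result is returned untouched.

```python
from typing import Any, Callable


def quick_check(gate_fn: Callable[..., Any]) -> Any:
    """Call a gate-like callable in quick mode and return whatever it returns.

    quick=True is the keyword named in the release notes (skips the LLM judge).
    Any other arguments gate() may accept are unknown and left out here.
    """
    return gate_fn(quick=True)


# Intended use, assuming evalview 0.5.4 is installed:
#   from evalview import gate
#   result = quick_check(gate)
#   print(result)  # inspect the GateResult; its fields are not listed in these notes
```

Passing the callable in, rather than importing it inside the helper, keeps the wrapper testable with a stand-in function.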

Terminal Dashboard

  • Scorecard panel with colored health bar, streak tracker, and gauge
  • Unicode sparkline trends from drift history
  • Confidence scoring on each verdict (z-score based signal vs noise)
  • Smart accept suggestions when changes look intentional
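
The sparkline and z-score ideas in the list above can be illustrated with a short, generic sketch. This is not EvalView's implementation: the function names, the eight-level bar alphabet, and the use of the sample standard deviation are all assumptions chosen for illustration.

```python
from statistics import mean, stdev
from typing import Optional, Sequence

BARS = "▁▂▃▄▅▆▇█"  # eight block heights, lowest to highest


def sparkline(values: Sequence[float]) -> str:
    """Render a numeric series as a one-line Unicode sparkline."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series has no range to scale, so draw it at the lowest level.
        return BARS[0] * len(values)
    top = len(BARS) - 1
    return "".join(BARS[int((v - lo) / span * top)] for v in values)


def z_score(history: Sequence[float], current: float) -> Optional[float]:
    """Return how many sample standard deviations `current` lies from the
    mean of `history`, or None when the history is too short or has no spread."""
    if len(history) < 2:
        return None
    sigma = stdev(history)
    if sigma == 0:
        return None
    return (current - mean(history)) / sigma
```

A large absolute z-score suggests the latest value is signal rather than run-to-run noise. The threshold at which a verdict counts as confident is a design choice these notes do not specify.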

HTML Report Dashboard

  • SVG health gauge with pass/fail breakdown
  • Chart.js trend lines for output similarity over time
  • Confidence badges on diff rows
  • Accept suggestion boxes with copy-paste commands

OpenClaw Integration

  • evalview openclaw install — install gate skill into claw workspace
  • evalview openclaw check — run gate with auto-revert
  • gate_or_revert() / check_and_decide() Python helpers
  • Built-in SKILL.md for autonomous agent loops

MCP Server

  • run_check rewired to call gate() directly (no subprocess)
  • Fallback to subprocess on error
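
The call-directly-then-fall-back pattern described above can be sketched generically as follows. The helper name run_with_fallback and its parameters are hypothetical, and the actual run_check code may differ, for instance in which exceptions trigger the fallback.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_with_fallback(in_process: Callable[[], str],
                      fallback_cmd: Sequence[str]) -> str:
    """Try the in-process call first; on any exception, run a subprocess
    instead and return its captured standard output."""
    try:
        return in_process()
    except Exception:
        completed = subprocess.run(
            list(fallback_cmd),
            capture_output=True,
            text=True,
            check=False,  # the caller decides how to treat a non-zero exit
        )
        return completed.stdout


def broken_call() -> str:
    # Stand-in for an in-process call that fails.
    raise RuntimeError("in-process path failed")


# The fallback command here just prints a marker using the current interpreter.
output = run_with_fallback(broken_call, [sys.executable, "-c", "print('fallback')"])
```

Calling in-process avoids the cost of starting an interpreter on every check, while the subprocess path keeps the tool usable if the direct call raises.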

Other

  • evalview snapshot --preview — dry-run before saving baselines
  • Support for running as a module: python -m evalview
  • Centralized model defaults (DEFAULT_MODELS, DEFAULT_JUDGE_MODEL)
  • Updated all defaults from gpt-4o-mini to gpt-5.4-mini
  • 22 new API tests (1147 total passing)
  • mypy clean (166 source files, 0 errors)

Install

pip install evalview==0.5.4


About hidai25/eval-view

Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.
