hidai25/eval-view

v0.2.5 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

✓ No known CVEs patched in this version

Topics

agent-benchmark agent-evaluation agentic-ai ai-agents anthropic autogen cli crewai evaluation langchain-agent langgraph llm mcp openai-assistants pytest python regression-testing testing

Summary

AI summary

CLI output now displays AGENT HEALTHY / REGRESSION DETECTED status.

Full changelog

What's New

Trust-First Repositioning

  • "Proof that your agent still works" — new messaging throughout CLI and docs
  • CLI output now shows AGENT HEALTHY / REGRESSION DETECTED instead of generic pass/fail
  • Trust-framing summary after every run with actionable next steps
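
The status line described above can be pictured with a small sketch. This is not EvalView's actual code: `RunResult`, `trust_status`, and the field names are hypothetical. Mapping execution errors to REGRESSION DETECTED is also an assumption; the release notes only say that the summary no longer ignores them.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Hypothetical container for the outcome of one evaluation run."""
    passed: int
    failed: int
    execution_errors: int = 0


def trust_status(result: RunResult) -> str:
    """Return the headline status for a run.

    Assumption: a run counts as healthy only when nothing failed
    and nothing errored during execution.
    """
    if result.failed > 0 or result.execution_errors > 0:
        return "REGRESSION DETECTED"
    return "AGENT HEALTHY"


print(trust_status(RunResult(passed=5, failed=0)))
print(trust_status(RunResult(passed=5, failed=0, execution_errors=1)))
```

Treating errors the same as failures keeps a crashed test from being reported as a healthy agent, which appears to be the intent of the related fix listed under Quality Fixes.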

New Flags

  • evalview run --save-golden — one-step baseline capture when all tests pass
  • evalview init --ci — generates a GitHub Actions workflow instantly
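
The "when all tests pass" condition on `--save-golden` amounts to a guard in front of the baseline write. A minimal sketch of such a guard follows, assuming per-test outcomes are available as strings. The function name, the outcome labels, and the refusal to save on an empty run are all hypothetical and not taken from EvalView.

```python
def should_save_golden(statuses):
    """Return True only when every test passed and none errored.

    `statuses` is a hypothetical list of per-test outcomes:
    "passed", "failed", or "error".
    """
    if not statuses:
        # Assumption: an empty run proves nothing, so keep the old baseline.
        return False
    return all(status == "passed" for status in statuses)


print(should_save_golden(["passed", "passed"]))
print(should_save_golden(["passed", "error"]))
```

Checking for anything other than "passed", rather than only for "failed", is what keeps an execution error from slipping through and overwriting a good baseline.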

Comprehensive Telemetry

  • @track_command on all CLI commands (was 20% coverage, now 100%)
  • Chat session tracking (provider, model, slash commands used)
  • CI environment detection across 8 providers
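
To illustrate how a command-tracking decorator and CI detection can fit together, here is a rough sketch. Only the decorator name `@track_command` comes from the release notes; its body, the `EVENTS` list, `detect_ci`, and the recorded fields are assumptions. The notes do not say which 8 providers are detected, so the table below is an illustrative selection based on environment variables those CI systems are known to set.

```python
import functools
import os
import time

# Illustrative selection: one environment variable per CI provider.
CI_ENV_MARKERS = {
    "github_actions": "GITHUB_ACTIONS",
    "gitlab_ci": "GITLAB_CI",
    "circleci": "CIRCLECI",
    "travis": "TRAVIS",
    "jenkins": "JENKINS_URL",
    "buildkite": "BUILDKITE",
    "azure_pipelines": "TF_BUILD",
    "bitbucket_pipelines": "BITBUCKET_BUILD_NUMBER",
}

# Hypothetical in-memory sink; a real tool would send events elsewhere.
EVENTS = []


def detect_ci(environ=None):
    """Return the name of the first matching CI provider, or None."""
    env = os.environ if environ is None else environ
    for provider, variable in CI_ENV_MARKERS.items():
        if env.get(variable):
            return provider
    return None


def track_command(func):
    """Record the name, success flag, duration, and CI provider of a call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        ok = False
        try:
            result = func(*args, **kwargs)
            ok = True
            return result
        finally:
            # Runs whether the command returned or raised.
            EVENTS.append({
                "command": func.__name__,
                "ok": ok,
                "duration_s": time.monotonic() - start,
                "ci": detect_ci(),
            })
    return wrapper


@track_command
def run():
    """Stand-in for a CLI command."""
    return "done"


run()
print(EVENTS[-1]["command"], EVENTS[-1]["ok"])
```

Recording the event in a `finally` block means failing commands are tracked too, and applying one decorator to every command is a simple way to reach full coverage.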

Quality Fixes

  • Fixed trust summary ignoring execution errors
  • Fixed --save-golden saving baselines during execution errors
  • Synced version strings across pyproject.toml, cli.py, and __init__.py
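
Version drift across files like these can be caught with a small consistency check. The sketch below is hypothetical and not part of EvalView: it assumes each file declares its version on a line of the form `version = "..."` or `__version__ = "..."`, and it works on file contents passed in as strings.

```python
import re

# Matches `version = "x"` or `__version__ = "x"` at the start of a line.
VERSION_RE = re.compile(
    r'^\s*(?:__version__|version)\s*=\s*["\']([^"\']+)["\']',
    re.MULTILINE,
)


def find_version(text):
    """Return the first version string declared in `text`, or None."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def versions_in_sync(sources):
    """Return True when every source declares the same version.

    `sources` maps a file name to that file's text.
    """
    found = [find_version(text) for text in sources.values()]
    return None not in found and len(set(found)) == 1


# Example contents, invented for illustration.
example = {
    "pyproject.toml": '[project]\nname = "evalview"\nversion = "0.2.5"\n',
    "cli.py": '__version__ = "0.2.5"\n',
}
print(versions_in_sync(example))
```

Running a check like this in CI would flag a release where one file was bumped and another was missed.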

README Upgrade

  • Pro visual style with centered badges, emoji headers, and star history chart
  • Restructured Quick Start as numbered steps
  • Added "Why EvalView?" bullet list with comparison table

Full Changelog: https://github.com/hidai25/eval-view/compare/v0.2.4...v0.2.5

About hidai25/eval-view

Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.
