hidai25/eval-view

v0.2.5 Feature

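This release adds three notable features for teams evaluating a rollout: trust-first CLI status output, two new flags (--save-golden and --ci), and telemetry on every CLI command.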

Published 5mo Developer Productivity

✓ No known CVEs patched

Topics

agent-benchmark agent-evaluation agentic-ai ai-agents anthropic autogen cli crewai evaluation langchain-agent langgraph llm mcp openai-assistants pytest python regression-testing testing

Summary

AI summary

CLI output now displays AGENT HEALTHY / REGRESSION DETECTED status.

Full changelog

What's New

Trust-First Repositioning

"Proof that your agent still works" — new messaging throughout CLI and docs
CLI output now shows AGENT HEALTHY / REGRESSION DETECTED instead of generic pass/fail
Trust-framing summary after every run with actionable next steps
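The status change described above can be pictured as a small reduction over per-test results. This is a minimal sketch, not EvalView's implementation: the `CaseResult` record, the `trust_status` function, and the "EXECUTION ERROR" and "NO TESTS RUN" labels are all hypothetical. Only AGENT HEALTHY and REGRESSION DETECTED come from the release notes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CaseResult:
    # Hypothetical result record; not EvalView's actual data model.
    name: str
    passed: bool
    error: Optional[str] = None  # set when the test could not execute at all


def trust_status(results: List[CaseResult]) -> str:
    """Collapse per-test results into one headline status."""
    if not results:
        return "NO TESTS RUN"  # hypothetical label
    if any(r.error is not None for r in results):
        # An execution error proves nothing either way, so it must not
        # be reported as healthy. The label itself is hypothetical.
        return "EXECUTION ERROR"
    if all(r.passed for r in results):
        return "AGENT HEALTHY"
    return "REGRESSION DETECTED"
```

Checking for execution errors before checking pass/fail is the design choice that matters here: a run that crashed should never be summarized as proof that the agent still works.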

New Flags

evalview run --save-golden — one-step baseline capture when all tests pass
evalview init --ci — generates a GitHub Actions workflow instantly
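The gating behind --save-golden (capture a baseline only when all tests pass) might look roughly like the sketch below. The function name, the result dictionary keys, and the JSON baseline layout are assumptions made for illustration; the release notes do not describe EvalView's on-disk format.

```python
import json
from pathlib import Path
from typing import Dict, List


def save_golden_if_clean(results: List[Dict], path: Path) -> bool:
    """Write a baseline file only when every test passed without error.

    Each result is a dict with hypothetical keys: "name", "passed",
    "error" (None when execution succeeded), and "output".
    """
    clean = bool(results) and all(
        r["passed"] and r.get("error") is None for r in results
    )
    if not clean:
        return False  # leave any existing baseline untouched
    baseline = {r["name"]: r["output"] for r in results}
    path.write_text(json.dumps(baseline, indent=2, sort_keys=True))
    return True
```

Returning a boolean lets the caller print a different next step depending on whether a baseline was actually written.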

Comprehensive Telemetry

@track_command on all CLI commands (was 20% coverage, now 100%)
Chat session tracking (provider, model, slash commands used)
CI environment detection across 8 providers
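A decorator like @track_command, combined with CI detection, could be sketched as follows. Only the decorator name comes from the release notes. The event fields, the in-memory `EVENTS` sink, and the `detect_ci` helper are hypothetical, and the release does not say which eight providers are checked, so the environment variables below are simply well-known ones that those CI systems set.

```python
import functools
import os
from typing import Any, Callable, Dict, List, Mapping, Optional

# Assumption: a plausible set of eight providers, keyed by an environment
# variable each one is known to set. Not taken from EvalView's source.
CI_ENV_VARS: Dict[str, str] = {
    "GITHUB_ACTIONS": "github_actions",
    "GITLAB_CI": "gitlab_ci",
    "CIRCLECI": "circleci",
    "TRAVIS": "travis_ci",
    "JENKINS_URL": "jenkins",
    "BUILDKITE": "buildkite",
    "TF_BUILD": "azure_pipelines",
    "BITBUCKET_BUILD_NUMBER": "bitbucket_pipelines",
}

EVENTS: List[Dict[str, Any]] = []  # stand-in for a real telemetry sink


def detect_ci(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first matching CI provider name, or None outside CI."""
    env = os.environ if env is None else env
    for var, provider in CI_ENV_VARS.items():
        if env.get(var):
            return provider
    return None


def track_command(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record one event per invocation, whether or not the command raises."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        event = {"command": func.__name__, "ci": detect_ci(), "ok": False}
        try:
            result = func(*args, **kwargs)
            event["ok"] = True
            return result
        finally:
            EVENTS.append(event)

    return wrapper


@track_command
def run() -> str:
    # Placeholder command body for the example.
    return "done"
```

Applying the decorator uniformly is one way to reach full command coverage, since a new command then only has to opt in with a single line.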

Quality Fixes

Fixed trust summary ignoring execution errors
Fixed --save-golden saving baselines during execution errors
Synced version strings across pyproject.toml, cli.py, and __init__.py

README Upgrade

Pro visual style with centered badges, emoji headers, and star history chart
Restructured Quick Start as numbered steps
Added "Why EvalView?" bullet list with comparison table

Full Changelog: https://github.com/hidai25/eval-view/compare/v0.2.4...v0.2.5

About hidai25/eval-view

Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.
