hidai25/eval-view

v0.7.0 (feature release)

This release adds 14 new user-facing features relevant to engineering teams evaluating a rollout.

Published 3 months ago · Developer Productivity


✓ No known CVEs patched in this version

Topics

agent-benchmark agent-evaluation agentic-ai ai-agents anthropic autogen cli crewai evaluation langchain-agent langgraph llm mcp openai-assistants pytest python regression-testing testing

Summary

AI summary

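Adds an Aider CLI adapter so EvalView can drive Aider during evaluations. Other additions include an autopr loop that turns production incidents into regression-test PRs, flake quarantine for known-flaky tests, and a graded ship/hold release verdict.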

Full changelog

Minor release: 33 commits since 0.6.2, including 14 new user-facing features.

Highlights

- Aider CLI adapter: drive Aider as an EvalView adapter
- Autopr loop: prod incident → regression test → PR, as a closed loop
- Flake quarantine: known-flaky tests don't block CI, with governance metadata
- Release verdict + evalview since: graded ship/hold verdict plus a change brief
- progress / drift / slack-digest: investigative loop commands
- Noise confirmation gate + --strict bypass: two-cycle rule before alerting
- Slow-agent warning: detects regressions in real wall-clock latency
- Observability signals: trust score, tool-loop, brittle-recovery, and gaming checks
- Improvement recommendation engine: prioritized stabilize / tighten / add-check suggestions
- Simulation harness + decision-rationale (schema v2): scripted multi-turn scenarios and machine-readable reasons
- snapshot --json: CI-friendly output, hardened for edge cases
- check --explain: deep trace narrative with root-cause hypotheses
- Token cost breakdown in check: input/output/cached tokens and cost delta vs. baseline
- Skill-doctor char-budget refinement: skills with disable-model-invocation are excluded
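The two-cycle rule behind the noise confirmation gate can be illustrated with a small sketch. This is not EvalView's implementation. The `ConfirmationGate` class, its `observe` method, and the `strict` field are hypothetical names. The sketch assumes that "two-cycle" means a test must regress in two consecutive check cycles before it alerts, and that strict mode alerts on the first occurrence.

```python
from dataclasses import dataclass, field


@dataclass
class ConfirmationGate:
    """Hypothetical two-cycle gate (not EvalView's API): a regression alerts
    only if it was also seen in the previous cycle, unless strict is set."""

    strict: bool = False
    # Tests that regressed in the previous cycle.
    _previous: set[str] = field(default_factory=set, init=False, repr=False)

    def observe(self, regressed: set[str]) -> set[str]:
        """Feed one cycle's regressed test names; return the ones to alert on."""
        if self.strict:
            # Strict bypass: every regression alerts immediately.
            self._previous = set(regressed)
            return set(regressed)
        # Confirmed = regressed in this cycle AND in the previous one.
        confirmed = regressed & self._previous
        # Remember this cycle; tests that recovered simply drop out.
        self._previous = set(regressed)
        return confirmed
```

Consider a default gate fed `{"refund_flow"}`. It returns an empty set. If the next cycle is `{"refund_flow", "search"}`, it returns `{"refund_flow"}`. A single noisy run therefore never alerts on its own. The cost is one extra cycle of detection latency, which the strict bypass removes.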

Plus ~10 fixes (mypy narrowing, dogfood hardening, slack-digest type errors, noise strict-bucket leak, snapshot --json CI hardening) and README/CLI doc improvements.

Install

pip install evalview==0.7.0
# or
npm install [email protected]

Full changelog: https://github.com/hidai25/eval-view/blob/v0.7.0/CHANGELOG.md



About hidai25/eval-view

Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.
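The sketch below gives a rough idea of what detecting behavioral drift against a golden baseline involves, by comparing a saved run with a new one. It is not EvalView's API. `Trace`, `diff_against_baseline`, and the 0.9 similarity threshold are hypothetical. Character-level `difflib` similarity stands in for whatever output comparison EvalView actually performs.

```python
import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    """Hypothetical record of one agent run: tools called in order, plus final output."""
    tools: tuple[str, ...]
    output: str


def diff_against_baseline(baseline: Trace, current: Trace,
                          min_similarity: float = 0.9) -> list[str]:
    """Return human-readable drift findings; an empty list means no drift."""
    findings: list[str] = []
    if current.tools != baseline.tools:
        # Describe each changed region of the tool-call sequence.
        matcher = difflib.SequenceMatcher(a=baseline.tools, b=current.tools)
        for op, a0, a1, b0, b1 in matcher.get_opcodes():
            if op != "equal":
                findings.append(
                    f"tools {op}: {list(baseline.tools[a0:a1])} -> {list(current.tools[b0:b1])}"
                )
    # Compare final outputs by character-level similarity ratio (0.0 to 1.0).
    ratio = difflib.SequenceMatcher(a=baseline.output, b=current.output).ratio()
    if ratio < min_similarity:
        findings.append(f"output similarity {ratio:.2f} < {min_similarity}")
    return findings
```

Suppose a new run skips the `fetch_order` tool that the baseline called. It yields a single `tools delete` finding, even though the final answer is unchanged. To block regressions in CI, a caller could exit with a nonzero status whenever the list is non-empty.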
