This release adds 3 notable features for engineering teams evaluating rollout.
✓ No known CVEs patched in this version
Summary
AI summary: Adds an Aider CLI adapter, enabling EvalView to drive Aider as an evaluation adapter.
Full changelog
Minor release — 33 commits since 0.6.2, 14 new user-facing features.
Highlights
- Aider CLI adapter — drive Aider as an EvalView adapter
- Autopr loop — prod-incident → regression-test → PR, closed loop
- Flake quarantine — known-flaky tests don't block CI, with governance metadata
- Release verdict + evalview since — graded ship/hold verdict + change brief
- progress/drift/slack-digest — investigative loop commands
- Noise confirmation gate + --strict bypass — two-cycle rule before alerting
- Slow-agent warning — real wall-clock latency regression detection
- Observability signals — trust score, tool-loop, brittle-recovery, gaming checks
- Improvement recommendation engine — prioritized stabilize / tighten / add-check suggestions
- Simulation harness + decision-rationale (schema v2) — scripted multi-turn scenarios, machine-readable reasons
- snapshot --json — CI-friendly, hardened for edge cases
- check --explain — deep trace narrative for root-cause hypotheses
- Token cost breakdown in check — input/output/cached tokens + cost delta vs baseline
- Skill-doctor char-budget refinement — disable-model-invocation skills excluded
Plus ~10 fixes (mypy narrowing, dogfood hardening, slack-digest type errors, noise strict-bucket leak, snapshot --json CI hardening) and README/CLI doc improvements.
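The "two-cycle rule" behind the noise confirmation gate can be pictured with a small sketch. This is not EvalView's implementation: the class name ConfirmationGate, its observe method, and the strict field are hypothetical, and the assumed behavior is that a failure alerts only once it has shown up in two consecutive check cycles, unless a strict mode bypasses the wait.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ConfirmationGate:
    """Hypothetical two-cycle gate: alert only when a failure repeats."""

    strict: bool = False  # assumed analogue of the --strict bypass
    previous: Set[str] = field(default_factory=set)

    def observe(self, failing: Set[str]) -> Set[str]:
        """Record one check cycle and return the tests that should alert now."""
        if self.strict:
            # Strict mode skips confirmation: every failure alerts immediately.
            return set(failing)
        # A failure is confirmed only if it also failed in the previous cycle.
        confirmed = failing & self.previous
        # Remember this cycle's failures so the next cycle can confirm them.
        self.previous = set(failing)
        return confirmed


if __name__ == "__main__":
    gate = ConfirmationGate()
    print(gate.observe({"checkout-flow"}))  # first sighting: held back
    print(gate.observe({"checkout-flow"}))  # second sighting: alerts
```

Under these assumptions a one-off failure never alerts, while a persistent one alerts from its second cycle onward.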
Install
pip install evalview==0.7.0
# or
npm install [email protected]
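After a pinned install, CI can confirm that the expected version is the one actually present. The sketch below uses only the standard library's importlib.metadata. The helper names installed_version and is_pinned are illustrative, and the distribution name evalview is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_pinned(package: str, expected: str) -> bool:
    """True only when the package is installed at exactly the expected version."""
    return installed_version(package) == expected


if __name__ == "__main__":
    # Distribution name and version come from the pip line in this release note.
    print("evalview 0.7.0 installed:", is_pinned("evalview", "0.7.0"))
```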
Full changelog: https://github.com/hidai25/eval-view/blob/v0.7.0/CHANGELOG.md
About hidai25/eval-view
Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.