hidai25/eval-view

v0.5.2 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

Published 4mo Developer Productivity

✓ No known CVEs patched

Topics

agent-benchmark agent-evaluation agentic-ai ai-agents anthropic autogen cli crewai evaluation langchain-agent langgraph llm mcp openai-assistants pytest python regression-testing testing

Summary

AI summary

This release adds production-grade cold-start test generation (`evalview generate`) and support for GPT-5 family models.

Full changelog

What's New in v0.5.2

Cold-Start Test Generation (`evalview generate`)

Production-grade test generation from live agent probing — no manual YAML writing needed
Interactive probe budget and model selection
Multi-turn conversation tests generated as single cohesive test cases
Domain-aware draft generation with coherence filtering
`--synth-model` flag to override the synthesis model
Real-time elapsed timer during probe runs
Delta reporting: shows changes since last generation
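
As an illustration of what delta reporting between two generation runs could look like, here is a minimal Python sketch. It is not EvalView's implementation: the `GenerationDelta` type, the `diff_generations` function, and the test names are all hypothetical, and the sketch assumes each generated test can be identified by a stable name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationDelta:
    """Hypothetical container for the difference between two generation runs."""
    added: list[str]
    removed: list[str]
    unchanged: list[str]


def diff_generations(previous: set[str], current: set[str]) -> GenerationDelta:
    """Compare test names from the last generation run with the current one."""
    return GenerationDelta(
        added=sorted(current - previous),
        removed=sorted(previous - current),
        unchanged=sorted(previous & current),
    )


# Made-up test names, for illustration only.
previous_run = {"refund_flow", "order_status", "greeting"}
current_run = {"refund_flow", "order_status", "cancel_order"}

delta = diff_generations(previous_run, current_run)
print(f"+{len(delta.added)} new, -{len(delta.removed)} removed, {len(delta.unchanged)} unchanged")
# prints: +1 new, -1 removed, 2 unchanged
```

Set arithmetic keeps the comparison independent of the order in which tests were generated.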

Improved Reports

Model and token usage displayed in HTML reports
Judge cost tracking surfaced in check reports
Per-query model shown in trace cost breakdown
Cleaner baseline metadata and timeline in check reports
Turn-level details with clickable chevrons in multi-turn traces
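
To make the per-query model and cost breakdown concrete, the following Python sketch groups trace queries by model and sums their token and cost figures. The record layout (`model`, `tokens`, `cost_usd`), the `cost_by_model` function, and all numbers are assumptions made for illustration; the actual report format may differ.

```python
from collections import defaultdict


def cost_by_model(queries: list[dict]) -> dict[str, dict[str, float]]:
    """Sum tokens and cost per model across all queries in a trace."""
    totals: dict[str, dict[str, float]] = defaultdict(
        lambda: {"tokens": 0, "cost_usd": 0.0}
    )
    for query in queries:
        entry = totals[query["model"]]
        entry["tokens"] += query["tokens"]
        entry["cost_usd"] += query["cost_usd"]
    return dict(totals)


# Hypothetical trace data: token counts and costs are made up.
trace = [
    {"model": "gpt-5.4", "tokens": 1000, "cost_usd": 0.25},
    {"model": "gpt-5.4-mini", "tokens": 400, "cost_usd": 0.125},
    {"model": "gpt-5.4", "tokens": 500, "cost_usd": 0.5},
]

breakdown = cost_by_model(trace)
for model, totals in sorted(breakdown.items()):
    print(f"{model}: {totals['tokens']} tokens, ${totals['cost_usd']:.3f}")
```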

Better Onboarding (`evalview init`)

Remembers the active test suite, so plain `snapshot` and `check` runs use it
Auto-approves generated drafts with scoped snapshot guidance
Detects local agents on `/execute` and `/health` endpoints
Refreshes stale config when a live agent is detected
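
The local-agent detection described above can be pictured as probing candidate base URLs for the two endpoints. The Python sketch below is only an assumption about how such a probe might work, not the logic `evalview init` actually uses: the function names, the candidate ports, and the rule "any HTTP response counts as a live agent" are all hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional

PROBE_PATHS = ("/health", "/execute")  # endpoint paths named in the release notes


def http_status(url: str, timeout: float = 1.0) -> Optional[int]:
    """Return the HTTP status of a GET request, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, even if with 4xx/5xx
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, ...


def detect_agent(
    base_urls: Iterable[str],
    fetch: Callable[[str], Optional[int]] = http_status,
) -> Optional[tuple[str, list[str]]]:
    """Return the first base URL that answers on any probe path, with those paths."""
    for base in base_urls:
        found = [path for path in PROBE_PATHS if fetch(base + path) is not None]
        if found:
            return base, found
    return None


# Offline demonstration with a stubbed fetcher; the ports are hypothetical.
def fake_fetch(url: str) -> Optional[int]:
    return 200 if url == "http://localhost:8000/health" else None


result = detect_agent(["http://localhost:3000", "http://localhost:8000"], fetch=fake_fetch)
print(result)
```

Passing the fetcher as a parameter keeps the detection logic testable without a running agent; dropping the `fetch` argument makes the same function issue real HTTP requests.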

Check Command Improvements

Shows last baseline snapshot timestamp
Auto-generates local HTML report on failures
Streamlined regression demo flow

Model Support

GPT-5 family model support (`gpt-5.4`, `gpt-5.4-mini`)
Interactive model selection from available providers

Multi-Turn & Monitoring

Multi-turn golden baselines with per-turn tool sequences
Cost/latency spike alerts in monitor mode
Batch edge-case expansion for test coverage
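
A multi-turn golden baseline with per-turn tool sequences can be thought of as a list of tool-call lists, one per conversation turn. The Python sketch below compares such a baseline against a new run. The data layout, the `diff_tool_sequences` function, and the tool names are hypothetical and are not taken from EvalView's baseline format.

```python
def diff_tool_sequences(
    baseline: list[list[str]],
    actual: list[list[str]],
) -> list[str]:
    """Describe every turn whose ordered tool calls differ from the baseline."""
    problems = []
    if len(baseline) != len(actual):
        problems.append(f"turn count changed: {len(baseline)} -> {len(actual)}")
    # zip stops at the shorter run; the length check above reports the rest.
    for turn, (expected, got) in enumerate(zip(baseline, actual), start=1):
        if expected != got:
            problems.append(f"turn {turn}: expected {expected}, got {got}")
    return problems


# Hypothetical two-turn conversation: the agent skips a lookup in turn 2.
golden = [["search_orders"], ["get_order", "issue_refund"]]
new_run = [["search_orders"], ["issue_refund"]]

for problem in diff_tool_sequences(golden, new_run):
    print(problem)
# prints: turn 2: expected ['get_order', 'issue_refund'], got ['issue_refund']
```

Comparing turn by turn, rather than flattening all tool calls into one list, pinpoints where in the conversation the behavior drifted.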

Bug Fixes

Fix multi-turn filter: a different output now counts as meaningful regardless of which tools were called
Fix probe progress for skipped follow-ups
Predictable timing: 1 discovery, and multi-turn counts against the budget
Always show agent model in run output
Eliminate duplicate multi-turn tests
Silence Ollama JSON fallback warnings in normal runs

Docs

Trimmed README from 1420 to 274 lines — details moved to dedicated docs
Comparison docs and SEO content added

About hidai25/eval-view

Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.
