hidai25/eval-view

v0.1.5 Feature

This release adds three notable features: statistical pass/fail testing, config-free runs, and test case templates.

Published 7mo Developer Productivity

✓ No known CVEs patched

Topics

agent-benchmark agent-evaluation agentic-ai ai-agents anthropic autogen cli crewai evaluation langchain-agent langgraph llm mcp openai-assistants pytest python regression-testing testing

Summary

AI summary

Added variance-aware statistical testing with configurable confidence levels.

Full changelog

What's New

Statistical Pass/Fail System

Variance-aware testing - Run tests multiple times to get statistically significant results
Confidence levels - Configure how confident you want to be in pass/fail decisions
CLI integration - New --runs flag to run tests multiple times

# Run each test 5 times for statistical analysis
evalview run --runs 5
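
The release notes do not say how EvalView turns repeated runs into a pass/fail decision. As a rough illustration of the general idea only, the sketch below scores N boolean run results with a Wilson score interval and passes the test when the lower bound of the interval clears a required pass rate. The function names, the `required_pass_rate` parameter, and the choice of the Wilson interval are all assumptions made for this example, not EvalView's API or method.

```python
import math
from statistics import NormalDist


def wilson_lower_bound(passes: int, runs: int, confidence: float = 0.95) -> float:
    """Lower bound of the Wilson score interval for a pass rate.

    Hypothetical helper for illustration; not part of EvalView.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = passes / runs
    denom = 1 + z * z / runs
    centre = p_hat + z * z / (2 * runs)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / runs + z * z / (4 * runs * runs))
    return (centre - margin) / denom


def statistical_verdict(results, required_pass_rate=0.8, confidence=0.95):
    """Summarise repeated runs and decide pass/fail.

    `results` is a list of booleans, one per run. The test passes only if
    we are confident the true pass rate is at least `required_pass_rate`.
    """
    runs = len(results)
    passes = sum(bool(r) for r in results)
    lower = wilson_lower_bound(passes, runs, confidence)
    return {
        "runs": runs,
        "passes": passes,
        "pass_rate": passes / runs,
        "lower_bound": lower,
        "passed": lower >= required_pass_rate,
    }


if __name__ == "__main__":
    # Five runs, three of which passed.
    print(statistical_verdict([True, True, True, False, False], required_pass_rate=0.5))
```

With only a handful of runs the interval is wide, so even a perfect 5-out-of-5 record supports only a modest lower bound. Under this approach, a higher confidence level or a stricter threshold calls for more runs.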

LangGraph Adapter Fix

Fixed adapter compatibility issues for better LangGraph integration

Config-Free Runs

The evalview run command no longer requires a config file
Automatically discovers test cases in the current directory
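
The notes do not describe how discovery works. The sketch below shows one plausible approach: walk the current directory and collect files that match a set of patterns. The YAML patterns, the recursive search, and the `discover_test_cases` name are assumptions for illustration, not EvalView's documented behaviour.

```python
from pathlib import Path


def discover_test_cases(root=".", patterns=("*.yaml", "*.yml")):
    """Return a sorted list of candidate test case files under `root`.

    Hypothetical helper: the file patterns and recursive search are
    assumptions, not taken from the EvalView release notes.
    """
    base = Path(root)
    found = set()
    for pattern in patterns:
        # rglob searches `base` and all of its subdirectories.
        found.update(p for p in base.rglob(pattern) if p.is_file())
    return sorted(found)


if __name__ == "__main__":
    for path in discover_test_cases():
        print(path)
```

A real implementation would probably also validate each file's contents before treating it as a test case. This sketch matches on file name only.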

Templates

Added test case templates for common evaluation patterns
Quick-start templates for tool calling, RAG, and multi-turn scenarios

Node SDK License Fix

Fixed license mismatch - now correctly uses Apache 2.0

Documentation Improvements

Added FAQ section and comparison table to README
Added "Run examples directly" section
Added design partners section
Improved README structure for better clarity

Full Changelog

https://github.com/hidai25/eval-view/compare/v0.1.4...v0.1.5

About hidai25/eval-view

Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.