hidai25/eval-view

v0.1.5 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

✓ No known CVEs patched

Topics

agent-benchmark agent-evaluation agentic-ai ai-agents anthropic autogen cli crewai evaluation langchain-agent langgraph llm mcp openai-assistants pytest python regression-testing testing

Summary

AI summary

Added variance-aware statistical testing with configurable confidence levels.

Full changelog

What's New

Statistical Pass/Fail System

  • Variance-aware testing - Run each test multiple times so that pass/fail decisions account for run-to-run variance instead of relying on a single result
  • Confidence levels - Configure how confident you want to be in pass/fail decisions
  • CLI integration - New --runs flag to run tests multiple times

    # Run each test 5 times for statistical analysis
    evalview run --runs 5
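
To illustrate why repeated runs and a confidence level matter, here is a minimal Python sketch of one common way to turn several pass/fail outcomes into a verdict: a Wilson score interval on the pass rate. This is not EvalView's implementation, which the changelog does not describe. The function names, the `required_rate` threshold, and the three-way verdict are all assumptions made for illustration.

```python
import math
from typing import Sequence


def wilson_interval(passes: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate.

    z is the normal quantile for the chosen confidence level
    (1.96 corresponds to roughly 95%).
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = passes / runs
    denom = 1 + z**2 / runs
    center = (p + z**2 / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


def verdict(results: Sequence[bool], required_rate: float = 0.8, z: float = 1.96) -> str:
    """Hypothetical decision rule: compare the interval to a required pass rate."""
    low, high = wilson_interval(sum(results), len(results), z)
    if low >= required_rate:
        return "pass"          # even the pessimistic estimate clears the bar
    if high < required_rate:
        return "fail"          # even the optimistic estimate misses the bar
    return "inconclusive"      # more runs are needed to decide


if __name__ == "__main__":
    # Five passes out of five runs, judged against an 80% required pass rate.
    print(verdict([True] * 5, required_rate=0.8))
```

Under these assumptions, five passes out of five runs is still "inconclusive" against an 80% bar at about 95% confidence, because the lower bound of the interval is only about 0.57. Small run counts leave wide intervals, so a stricter confidence level generally needs a larger --runs value before a verdict settles.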

LangGraph Adapter Fix

  • Fixed adapter compatibility issues for better LangGraph integration

Config-Free Runs

  • evalview run now works without a config file
  • Automatically discovers test cases in the current directory

Templates

  • Added test case templates for common evaluation patterns
  • Quick-start templates for tool calling, RAG, and multi-turn scenarios

Node SDK License Fix

  • Fixed license mismatch - now correctly uses Apache 2.0

Documentation Improvements

  • Added FAQ section and comparison table to README
  • Added "Run examples directly" section
  • Added design partners section
  • Improved README structure for better clarity

Full Changelog

https://github.com/hidai25/eval-view/compare/v0.1.4...v0.1.5


About hidai25/eval-view

Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.
