hidai25/eval-view

v0.1.3 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

✓ No known CVEs patched

Topics

agent-benchmark agent-evaluation agentic-ai ai-agents anthropic autogen cli crewai evaluation langchain-agent langgraph llm mcp openai-assistants pytest python regression-testing testing

Summary

AI summary

EvalView, a pytest-style testing framework for AI agents, is now available as a GitHub Action.

Full changelog

EvalView GitHub Action

Pytest-style testing framework for AI agents — now available as a GitHub Action.

Usage

- uses: hidai25/eval-view@v0.1.3
  with:
    openai-api-key: ${{ secrets.OPENAI_API_KEY }}
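
For context, a step like the one above sits inside a workflow file. The following is a minimal sketch, not taken from the release notes: the file name, workflow and job names, the `pull_request` trigger, the `permissions` block, and the `actions/checkout@v4` step are all assumptions, and the `@v0.1.3` ref is assumed to match this release's tag.

```yaml
# .github/workflows/evalview.yml (hypothetical file name)
name: EvalView agent tests

on:
  pull_request:

permissions:
  contents: read
  pull-requests: write  # assumption: probably required for the PR comment feature

jobs:
  evalview:
    runs-on: ubuntu-latest
    steps:
      # Check out the repository so the action can read the tests and config
      - uses: actions/checkout@v4

      # Ref assumed to match this release's tag
      - uses: hidai25/eval-view@v0.1.3
        with:
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
```

The API key is read from a repository secret named `OPENAI_API_KEY`, so it has to be defined in the repository's settings before the workflow runs.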

Features

- 🧪 Test LangGraph, CrewAI, OpenAI, Anthropic, and custom agents
- ⚡ Parallel test execution (4 workers by default)
- 📊 Auto-generated HTML reports
- 💬 PR comments with test results
- 🤖 LLM-as-judge output evaluation
- 💰 Cost and latency threshold checks

Action Inputs

| Input          | Description                     | Default               |
|----------------|---------------------------------|-----------------------|
| openai-api-key | OpenAI API key for LLM-as-judge | -                     |
| config-path    | Path to config file             | .evalview/config.yaml |
| max-workers    | Parallel workers                | 4                     |
| fail-on-error  | Fail on test failure            | true                  |
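
As an illustration of the inputs in the table, the step below sets each one explicitly. This is a sketch: the values for `max-workers` and `fail-on-error` are arbitrary examples, the `@v0.1.3` ref is assumed to match this release's tag, and the behavior described in the comments is inferred from the table descriptions rather than confirmed.

```yaml
- uses: hidai25/eval-view@v0.1.3  # ref assumed to match this release's tag
  with:
    # Key used for LLM-as-judge evaluation (no default)
    openai-api-key: ${{ secrets.OPENAI_API_KEY }}
    # Same as the documented default, shown here for clarity
    config-path: .evalview/config.yaml
    # Example override of the default of 4 parallel workers
    max-workers: 8
    # Example override; presumably keeps the step from failing when tests fail
    fail-on-error: false
```

Any input left out falls back to the default listed in the table.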

Full Documentation

See https://github.com/hidai25/eval-view#github-action-recommended for complete usage examples.

About hidai25/eval-view

Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.
