hidai25/eval-view

v0.3.2 Feature

This release fixes three issues (two adapter authentication problems and an MCP server timeout) and includes several smaller improvements.

Published 4mo Developer Productivity


✓ No known CVEs patched


Topics

agent-benchmark, agent-evaluation, agentic-ai, ai-agents, anthropic, autogen, cli, crewai, evaluation, langchain-agent, langgraph, llm, mcp, openai-assistants, pytest, python, regression-testing, testing

Affected surfaces

auth

Summary

AI summary

Fixes an auth failure in the claude-code adapter by stripping ANTHROPIC_API_KEY from the subprocess environment, and raises the MCP server skill test timeout to 600 s.

Full changelog

What's fixed

claude-code adapter: auth failure in MCP context

The adapter was failing immediately (~3-4s) with "Invalid API key" when invoked through the MCP chain. Root cause: Claude Code sets ANTHROPIC_API_KEY to a session-scoped token in its subprocess environment, which the inner claude --print inherited and the Anthropic API rejected.

Fix: Strip ANTHROPIC_API_KEY from the adapter's env so the inner claude falls back to ~/.claude.json credentials (stored by claude auth login).
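A minimal Python sketch of this kind of fix, assuming the adapter launches the CLI with subprocess. The names adapter_env and run_claude_print are hypothetical and are not eval-view's actual API. The point is that the child process receives a copy of the environment with the session-scoped key removed, so claude has to fall back to its stored login credentials.

```python
import os
import subprocess

# Variables that must not leak into the inner `claude` process. Per the
# release notes, under Claude Code ANTHROPIC_API_KEY holds a session-scoped
# token that the Anthropic API rejects.
STRIPPED_VARS = ("ANTHROPIC_API_KEY",)


def adapter_env(base=None):
    """Return a copy of the environment without the stripped variables.

    The caller's mapping (or os.environ) is never modified.
    """
    env = dict(os.environ if base is None else base)
    for name in STRIPPED_VARS:
        env.pop(name, None)
    return env


def run_claude_print(prompt, timeout_s=None):
    """Hypothetical wrapper: run `claude --print` with the scrubbed env."""
    result = subprocess.run(
        ["claude", "--print", prompt],
        env=adapter_env(),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Copying the environment rather than deleting from os.environ keeps the change local to the child process, so the parent (for example, the MCP server) keeps whatever credentials it was started with.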

custom adapter: works for OAuth users (no API key needed)

The demo runner.py used the Anthropic SDK directly, which requires ANTHROPIC_API_KEY. Claude Code OAuth users don't have this env var set.

Fix: Rewrote the runner to call claude --print as a subprocess (the same auth path the claude-code adapter uses).
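A rough sketch of what a subprocess-based runner might look like. run_agent, DEFAULT_CMD and ECHO_CMD are hypothetical names, not the actual contents of the demo runner.py. Because the runner never reads an API key itself, authentication is left entirely to the CLI, which is what makes it usable for OAuth logins.

```python
import subprocess
import sys

# Assumed default: the query is passed as the last argument to `claude --print`.
DEFAULT_CMD = ("claude", "--print")

# For a dry run without Claude installed, any CLI that takes the query as its
# last argument works. This Python one-liner just upper-cases its input.
ECHO_CMD = (sys.executable, "-c", "import sys; print(sys.argv[1].upper())")


def run_agent(query, cmd=DEFAULT_CMD, timeout_s=None):
    """Send one query to the agent CLI and return its trimmed text output.

    No API key is read here: the CLI authenticates on its own.
    """
    proc = subprocess.run(
        [*cmd, query],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    if proc.returncode != 0:
        raise RuntimeError(
            f"agent exited with code {proc.returncode}: {proc.stderr.strip()}"
        )
    return proc.stdout.strip()
```

Making the command a parameter lets tests swap in a stand-in such as ECHO_CMD, so run_agent("hello", cmd=ECHO_CMD) returns "HELLO" without touching the real CLI.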

MCP server: skill test timeout raised to 600s

Multi-test suites (for example, 10 tests × ~15 s each, roughly 150 s in total) were exceeding the previous 120 s timeout.
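The arithmetic behind the new limit, sketched in Python under the assumption that tests run one after another (the notes do not say whether they run serially). The function names are illustrative only.

```python
OLD_TIMEOUT_S = 120  # previous MCP skill-test timeout
NEW_TIMEOUT_S = 600  # timeout in v0.3.2


def serial_runtime_s(n_tests, per_test_s):
    """Estimated wall time if tests run one after another (an assumption)."""
    return n_tests * per_test_s


def max_serial_tests(timeout_s, per_test_s):
    """How many tests of per_test_s seconds fit inside timeout_s."""
    return timeout_s // per_test_s


runtime = serial_runtime_s(10, 15)  # the suite size quoted in the notes
print(runtime, runtime > OLD_TIMEOUT_S, runtime <= NEW_TIMEOUT_S)  # 150 True True
print(max_serial_tests(OLD_TIMEOUT_S, 15), max_serial_tests(NEW_TIMEOUT_S, 15))  # 8 40
```

At ~15 s per test, the old limit fits only 8 tests, while 600 s leaves room for about 40.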

Other improvements

Non-interactive mode for generate-tests (--auto / no TTY)
Better first-snapshot and first-check celebration panels with CI integration steps
60 s asyncio timeout on LLM calls in the test generator
Actionable hints when skill dependencies (e.g. mcporter) are missing
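The list above mentions three mechanisms that are easy to sketch with the standard library: detecting non-interactive runs, bounding an LLM call with a 60-second asyncio timeout, and emitting a hint when a dependency is missing. This is a hypothetical illustration, not eval-view's code. The helper names and the hint wording are invented, and it assumes a dependency such as mcporter is an executable looked up on PATH.

```python
import asyncio
import shutil
import sys

LLM_TIMEOUT_S = 60.0  # per-call budget quoted in the release notes


def is_non_interactive(auto_flag):
    """Skip interactive prompts when --auto is set or stdin is not a TTY."""
    return auto_flag or sys.stdin is None or not sys.stdin.isatty()


async def call_llm_with_timeout(call, timeout_s=LLM_TIMEOUT_S):
    """Await one LLM coroutine; return None if it runs longer than timeout_s.

    asyncio.wait_for cancels the pending coroutine when the timeout fires.
    """
    try:
        return await asyncio.wait_for(call, timeout=timeout_s)
    except asyncio.TimeoutError:
        return None


def missing_dependency_hint(executable):
    """Return an actionable message if `executable` is not on PATH, else None."""
    if shutil.which(executable) is not None:
        return None
    return (
        f"Skill dependency '{executable}' was not found on PATH. "
        "Install it, then re-run the command."
    )
```

Returning None on timeout, rather than raising, lets a generator skip one slow call and keep producing the remaining tests. Whether eval-view skips, retries, or aborts in that case is not stated in the notes.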


About hidai25/eval-view

Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.
