This release adds 3 notable features for engineering teams evaluating rollout.
✓ No known CVEs patched in this version
Summary
AI summary: Fixed an auth failure in the claude-code adapter by stripping ANTHROPIC_API_KEY, and raised the MCP server skill test timeout to 600s.
Full changelog
What's fixed
claude-code adapter: auth failure in MCP context
The adapter was failing immediately (~3-4s) with "Invalid API key" when invoked through the MCP chain. Root cause: Claude Code sets ANTHROPIC_API_KEY to a session-scoped token in its subprocess environment, which the inner claude --print inherited and the Anthropic API rejected.
Fix: Strip ANTHROPIC_API_KEY from the adapter's env so the inner claude falls back to ~/.claude.json credentials (stored by claude auth login).
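The fix can be sketched as follows. This is a minimal illustration rather than the project's actual code: `build_child_env` and `run_inner_claude` are hypothetical names, and passing the prompt as a positional argument to `claude --print` is an assumption about how the adapter invokes the CLI.

```python
import os
import subprocess


def build_child_env(parent_env=None):
    """Return a copy of the environment without ANTHROPIC_API_KEY.

    With the variable removed, the inner `claude` process has no
    session-scoped token to inherit and can fall back to stored credentials.
    """
    env = dict(os.environ if parent_env is None else parent_env)
    env.pop("ANTHROPIC_API_KEY", None)  # no error if the key is absent
    return env


def run_inner_claude(prompt):
    """Hypothetical helper: run `claude --print` with the cleaned environment."""
    return subprocess.run(
        ["claude", "--print", prompt],  # assumed invocation shape
        env=build_child_env(),
        capture_output=True,
        text=True,
    )
```

Copying the environment instead of mutating `os.environ` keeps the parent process unchanged, so only the child loses the variable.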
custom adapter: works for OAuth users (no API key needed)
The demo runner.py used the Anthropic SDK directly, which requires ANTHROPIC_API_KEY. Claude Code OAuth users don't have this env var set.
Fix: Rewrote the runner to invoke claude --print as a subprocess (same auth path as the claude-code adapter).
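A subprocess-based runner of this kind might look like the sketch below. The function name `run_prompt`, the injectable `cmd` parameter, and passing the prompt as a positional argument are all assumptions made for illustration; the actual runner.py may be structured differently.

```python
import subprocess


def run_prompt(prompt, cmd=("claude", "--print"), timeout=None):
    """Send `prompt` to a CLI and return its stdout as the answer.

    `cmd` is injectable so the runner can be exercised without the real CLI.
    No API key is read here; authentication is left to the CLI itself.
    """
    result = subprocess.run(
        [*cmd, prompt],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        # Surface the CLI's own error text instead of failing silently.
        raise RuntimeError(
            f"{cmd[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout.strip()
```

Because the runner only shells out to the CLI, it works for any user the CLI can authenticate, including OAuth users without ANTHROPIC_API_KEY set.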
MCP server: skill test timeout raised to 600s
Multi-test suites (10 tests × ~15s each, about 150s in total) were exceeding the previous 120s timeout.
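The arithmetic behind the change can be checked with a short sketch. It assumes tests run one after another and uses the approximate per-test duration quoted above; `suite_fits` is a hypothetical helper, not part of the project.

```python
def suite_fits(num_tests, seconds_per_test, timeout_s):
    """True if a sequential suite is expected to finish within the timeout."""
    return num_tests * seconds_per_test <= timeout_s


# 10 tests at roughly 15s each need about 150s.
print(suite_fits(10, 15, 120))  # False: exceeds the old 120s limit
print(suite_fits(10, 15, 600))  # True: fits within the new 600s limit
```

Under the same assumptions, the 600s limit leaves room for roughly 40 tests of that size.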
Other improvements
- Non-interactive mode for generate-tests (--auto / no TTY)
- Better first-snapshot and first-check celebration panels with CI integration steps
- 60s asyncio timeout on LLM calls in test generator
- Actionable hints when skill dependencies (e.g. mcporter) are missing
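The 60s timeout on LLM calls listed above can be illustrated with `asyncio.wait_for`. This is a sketch of the general technique, not the test generator's actual code: `call_llm_with_timeout` and `fake_llm_call` are hypothetical names, and the real implementation may handle the timeout differently.

```python
import asyncio

LLM_TIMEOUT_S = 60  # value taken from the release notes


async def call_llm_with_timeout(coro, timeout=LLM_TIMEOUT_S):
    """Await `coro`, raising a descriptive error if it exceeds `timeout` seconds."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout)
    except asyncio.TimeoutError as exc:
        raise RuntimeError(f"LLM call timed out after {timeout}s") from exc


async def fake_llm_call(delay):
    """Stand-in for a real LLM request; sleeps, then returns a fixed reply."""
    await asyncio.sleep(delay)
    return "ok"
```

For example, `asyncio.run(call_llm_with_timeout(fake_llm_call(0.01)))` returns the reply, while a call that outlasts the limit is cancelled and reported as an error instead of hanging the generator.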
About hidai25/eval-view
Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.