
hidai25/eval-view

v0.6.2 Feature

This release adds three notable features for engineering teams deciding whether to roll it out.

✓ No known CVEs patched

Topics

agent-benchmark agent-evaluation agentic-ai ai-agents anthropic autogen
cli crewai evaluation langchain-agent langgraph llm mcp openai-assistants pytest python regression-testing testing

Summary

AI summary

Adds the evalview model-check command for detecting drift in closed-weight models, along with supporting internals and 80 net new tests.

Full changelog

What's new

evalview model-check — closed-model drift detection

Detect silent drift in closed-weight models (Anthropic in v1; OpenAI/Mistral/Cohere in v1.1) by running a small structural canary suite directly against the provider.

  • Two-anchor comparison (against the pinned reference run and the previous run)
  • Dry-run cost estimation
  • Per-provider fingerprint strength labeling
  • Custom suites via --suite
  • Suite-hash enforcement for rotation safety
  • Pinned temperature=0.0 / top_p=1.0 for stable drift signal
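A minimal sketch of how a two-anchor comparison and suite-hash enforcement might fit together, assuming each run is stored as a map from prompt id to a pass/fail structural result. The names suite_hash, Snapshot, flipped and two_anchor_verdict, and the verdict strings, are hypothetical and not evalview's actual API.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass


def suite_hash(prompts: list[dict]) -> str:
    """Hash a canonical JSON encoding so any edit to the suite changes the hash."""
    canonical = json.dumps(prompts, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class Snapshot:
    suite_digest: str          # hash of the canary suite this run used
    results: dict[str, bool]   # prompt id -> passed structural check


def flipped(a: Snapshot, b: Snapshot) -> set[str]:
    """Prompt ids whose pass/fail outcome differs between two snapshots."""
    if a.suite_digest != b.suite_digest:
        # Refuse to compare runs of different suite versions (e.g. after rotation).
        raise ValueError("snapshots come from different canary suites")
    shared = a.results.keys() & b.results.keys()
    return {pid for pid in shared if a.results[pid] != b.results[pid]}


def two_anchor_verdict(current: Snapshot, reference: Snapshot, previous: Snapshot) -> str:
    """Classify the current run against both anchors (hypothetical labels)."""
    if flipped(current, previous):
        return "changed_since_previous"   # sudden change since the last check
    if flipped(current, reference):
        return "drifted_from_reference"   # stable recently, but away from baseline
    return "stable"
```

In this sketch a change since the previous run takes priority because it is the most recent, most actionable signal, while the reference anchor catches slow drift that never shows up as a large step between consecutive runs. That ordering is a design choice of the example, not a documented evalview behavior.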

Bundled canary suite

15 structural prompts across four scorer families: tool choice, JSON schema, refusal, and exact match. The suite is versioned and hash-pinned, and it is rotated using a held-out companion suite.
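A minimal sketch of what pure-function structural scorers for the four families could look like. Each one returns a plain pass/fail without calling an LLM judge. The function names and signatures are hypothetical, not evalview's model_check_scoring API; the refusal check is a simple keyword heuristic, and the JSON check is a simplified key-and-type check rather than full JSON Schema validation.

```python
from __future__ import annotations

import json

# Heuristic refusal phrases; a real scorer would need a broader, tested list.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm not able to")


def score_exact_match(output: str, expected: str) -> bool:
    """Pass if the output equals the expected string, ignoring surrounding whitespace."""
    return output.strip() == expected.strip()


def score_json_shape(output: str, required: dict[str, type]) -> bool:
    """Pass if the output parses as a JSON object with the required keys and types."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(
        key in data and isinstance(data[key], typ) for key, typ in required.items()
    )


def score_refusal(output: str, should_refuse: bool) -> bool:
    """Pass if the model refused exactly when the canary expects a refusal."""
    refused = any(marker in output.lower() for marker in REFUSAL_MARKERS)
    return refused == should_refuse


def score_tool_choice(called_tool: str | None, expected_tool: str | None) -> bool:
    """Pass if the model picked the expected tool (None means no tool call)."""
    return called_tool == expected_tool
```

Because every scorer is deterministic and depends only on its inputs, a change in a canary's result points at the model's output rather than at scorer noise.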

New internals

  • DriftKind + DriftConfidence enums — unified drift taxonomy
  • model_snapshots — timestamped store with auto-pin first-run reference and pruning
  • model_check_scoring — pure-function structural scorers (no LLM judge dependency)
  • model_provider_runner — single-shot completions with per-provider fingerprint capture
  • anthropic adapter registered in adapter_factory
  • TraceDiff gains drift_kind and drift_confidence fields
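A minimal in-memory sketch of a timestamped snapshot store that auto-pins the first run as the reference and prunes old runs. SnapshotStore, keep_last, save and latest are hypothetical names; evalview's model_snapshots module may persist and prune differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Snapshot:
    taken_at: datetime
    results: dict[str, bool]  # prompt id -> passed structural check


@dataclass
class SnapshotStore:
    """In-memory sketch of a timestamped snapshot store (hypothetical API)."""

    keep_last: int = 5
    reference: Snapshot | None = None
    history: list[Snapshot] = field(default_factory=list)

    def latest(self) -> Snapshot | None:
        """Most recent stored run; serves as the 'previous' anchor for the next check."""
        return self.history[-1] if self.history else None

    def save(self, results: dict[str, bool], taken_at: datetime | None = None) -> Snapshot:
        stamp = taken_at if taken_at is not None else datetime.now(timezone.utc)
        snap = Snapshot(stamp, dict(results))
        if self.reference is None:
            # The first run is pinned automatically as the reference anchor.
            self.reference = snap
        self.history.append(snap)
        # Prune the oldest runs beyond keep_last. The reference is held in its
        # own field, so pruning the history never removes it.
        excess = max(0, len(self.history) - self.keep_last)
        del self.history[:excess]
        return snap
```

Keeping the reference outside the pruned history is what lets a long-running check still compare against its original baseline after many rotations of the history window.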

Tests

80 net new tests covering snapshot store (16), structural scorers (29), canary suite loader (13), and command integration (22) — all mocked, no real API calls in CI.
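A minimal sketch of how a provider call can be tested without network access using unittest.mock, while also checking that the pinned temperature=0.0 / top_p=1.0 values reach the provider. run_canary, the client.complete method and the "model-under-test" name are placeholders, not evalview's model_provider_runner API or any real provider SDK.

```python
from __future__ import annotations

from unittest.mock import MagicMock

# Pinned sampling parameters, matching the values stated in the release notes.
PINNED_PARAMS = {"temperature": 0.0, "top_p": 1.0}


def run_canary(client, model: str, prompt: str) -> str:
    """Hypothetical single-shot runner; client.complete is an assumed interface."""
    return client.complete(model=model, prompt=prompt, **PINNED_PARAMS)


def test_run_canary_uses_pinned_sampling() -> None:
    client = MagicMock()  # stands in for the provider client, so no API call is made
    client.complete.return_value = "ok"

    output = run_canary(client, "model-under-test", "Say ok")

    assert output == "ok"
    client.complete.assert_called_once_with(
        model="model-under-test",
        prompt="Say ok",
        temperature=0.0,
        top_p=1.0,
    )
```

Under pytest the test function is collected automatically; it can also be called directly.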


Install / upgrade:

pip install evalview==0.6.2

Full changelog: https://github.com/hidai25/eval-view/blob/main/CHANGELOG.md


About hidai25/eval-view

Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.

