hidai25/eval-view

v0.6.2 Feature

This release adds three notable features for engineering teams evaluating a rollout.

Published 3mo Developer Productivity

✓ No known CVEs patched

Topics

agent-benchmark agent-evaluation agentic-ai ai-agents anthropic autogen cli crewai evaluation langchain-agent langgraph llm mcp openai-assistants pytest python regression-testing testing

Summary

AI summary

Added evalview model-check for closed-model drift detection with new internals and extensive tests.

Full changelog

What's new

`evalview model-check` — closed-model drift detection

Detect silent drift in closed-weight models (Anthropic in v1; OpenAI/Mistral/Cohere in v1.1) by running a small structural canary suite directly against the provider.

Two-anchor comparison (reference + previous)
Dry-run cost estimation
Per-provider fingerprint strength labeling
Custom suites via --suite
Suite-hash enforcement for rotation safety
Pinned temperature=0.0 / top_p=1.0 for stable drift signal
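
As a rough illustration of how suite-hash enforcement and the two-anchor comparison could fit together, the sketch below hashes a suite and refuses to compare snapshots that were taken with different suites. The function names, the snapshot dictionary layout, and the choice of SHA-256 over canonical JSON are assumptions made for illustration, not evalview's actual implementation.

```python
import hashlib
import json


def suite_hash(prompts):
    """Return a stable SHA-256 hex digest for a list of prompt dicts.

    Keys are sorted so that field ordering does not change the hash.
    (Hypothetical helper; the real hashing scheme is not described in the notes.)
    """
    canonical = json.dumps(prompts, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare_to_anchors(current, reference, previous):
    """Compare a run against both anchors: the pinned reference and the prior run.

    Each snapshot is assumed to be a dict with "suite_hash" and "passed" keys.
    A mismatched suite hash raises, because results from different suites
    are not comparable.
    """
    for name, anchor in (("reference", reference), ("previous", previous)):
        if anchor["suite_hash"] != current["suite_hash"]:
            raise ValueError(f"suite hash differs from {name} snapshot")
    return {
        "vs_reference": current["passed"] - reference["passed"],
        "vs_previous": current["passed"] - previous["passed"],
    }
```

Comparing against two anchors separates slow cumulative drift (visible against the reference) from a sudden change (visible against the previous run).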

Bundled canary suite

15 structural prompts across four scorer families: tool choice, JSON schema, refusal, and exact match. The suite is versioned, hash-pinned, and rotated via a held-out companion suite.
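
To make the four scorer families concrete, here is a minimal sketch of what pure-function structural scorers might look like. All function names and signatures are hypothetical, the JSON check only verifies required keys and types rather than a full JSON Schema, and the refusal check is a crude substring heuristic chosen for brevity.

```python
import json


def score_exact_match(output: str, expected: str) -> bool:
    """Pass when the output equals the expected text, ignoring outer whitespace."""
    return output.strip() == expected.strip()


def score_json_keys(output: str, required: dict) -> bool:
    """Pass when the output parses as a JSON object holding every required
    key with a value of the given Python type (a simplified schema check)."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return all(k in data and isinstance(data[k], t) for k, t in required.items())


def score_tool_choice(tool_calls: list, expected_tool: str) -> bool:
    """Pass when the first tool the model called is the expected one."""
    return bool(tool_calls) and tool_calls[0] == expected_tool


def score_refusal(output: str, markers=("i can't", "i cannot", "i won't")) -> bool:
    """Pass when the output contains one of the assumed refusal phrases."""
    lowered = output.lower()
    return any(marker in lowered for marker in markers)
```

Because each scorer is deterministic and depends only on its inputs, the same response always receives the same score, which is what allows a change in score to be read as a change in the model.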

New internals

DriftKind + DriftConfidence enums — unified drift taxonomy
model_snapshots — timestamped store with auto-pin first-run reference and pruning
model_check_scoring — pure-function structural scorers (no LLM judge dependency)
model_provider_runner — single-shot completions with per-provider fingerprint capture
anthropic adapter registered in adapter_factory
TraceDiff gains drift_kind and drift_confidence fields
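
The snapshot behaviour described above (timestamps, auto-pinning the first run as the reference, and pruning) can be sketched with an in-memory store. This is an assumption-laden illustration: the class names, the `max_history` parameter, and the in-memory layout are hypothetical and do not reflect how `model_snapshots` persists data.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Snapshot:
    taken_at: datetime
    results: dict


@dataclass
class SnapshotStore:
    """Hypothetical in-memory store; max_history is assumed to be >= 1."""
    max_history: int = 5
    reference: Optional[Snapshot] = None
    history: List[Snapshot] = field(default_factory=list)

    def add(self, results: dict, now: Optional[datetime] = None) -> Snapshot:
        snap = Snapshot(taken_at=now or datetime.now(timezone.utc), results=results)
        if self.reference is None:
            # First run: pin it automatically as the long-term reference.
            self.reference = snap
        self.history.append(snap)
        # Prune the oldest entries; the reference is held separately,
        # so it survives pruning.
        excess = len(self.history) - self.max_history
        if excess > 0:
            del self.history[:excess]
        return snap

    @property
    def latest(self) -> Optional[Snapshot]:
        """Most recent stored snapshot, usable as the 'previous' anchor
        before a new run is added."""
        return self.history[-1] if self.history else None
```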

Tests

80 net new tests covering snapshot store (16), structural scorers (29), canary suite loader (13), and command integration (22) — all mocked, no real API calls in CI.
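
For readers unfamiliar with the "all mocked" approach, the sketch below shows one way a provider call can be tested without touching a real API, using `unittest.mock.Mock` from the standard library. The `run_canary` function and the `client.complete` method are hypothetical stand-ins, not evalview's runner or any provider SDK's interface.

```python
from unittest.mock import Mock


def run_canary(client, prompt: str) -> str:
    """Hypothetical runner: one single-shot completion with pinned sampling."""
    return client.complete(prompt=prompt, temperature=0.0, top_p=1.0)


def test_run_canary_uses_pinned_sampling():
    # The mock replaces the provider client, so no network call is made.
    client = Mock()
    client.complete.return_value = "4"

    assert run_canary(client, "2+2=") == "4"
    # Verify the pinned sampling parameters were actually passed through.
    client.complete.assert_called_once_with(
        prompt="2+2=", temperature=0.0, top_p=1.0
    )
```

A test in this style runs under pytest with no API key and no cost, which is consistent with the note that CI makes no real API calls.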

Install / upgrade:

pip install evalview==0.6.2

Full changelog: https://github.com/hidai25/eval-view/blob/main/CHANGELOG.md

About hidai25/eval-view

Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.
