This release adds 3 notable features for engineering teams evaluating rollout.
✓ No known CVEs patched in this version
Summary
AI summary: Added evalview model-check for closed-model drift detection, with new internals and extensive tests.
Full changelog
What's new
evalview model-check — closed-model drift detection
Detect silent drift in closed-weight models (Anthropic in v1; OpenAI/Mistral/Cohere in v1.1) by running a small structural canary suite directly against the provider.
- Two-anchor comparison (reference + previous)
- Dry-run cost estimation
- Per-provider fingerprint strength labeling
- Custom suites via --suite
- Suite-hash enforcement for rotation safety
- Pinned temperature=0.0/top_p=1.0 for stable drift signal
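The two-anchor idea can be illustrated with a minimal sketch. This is not evalview's implementation: the Snapshot class, changed_prompts, and two_anchor_report are hypothetical names, and the pass/fail-per-prompt data model is an assumption made for illustration. The point is that comparing against the pinned reference shows cumulative drift, while comparing against the previous run shows only what changed most recently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot: prompt id -> whether its structural check passed."""
    label: str
    results: dict[str, bool]


def changed_prompts(anchor: Snapshot, current: Snapshot) -> list[str]:
    """Return sorted ids of prompts whose pass/fail outcome differs from the anchor."""
    shared = anchor.results.keys() & current.results.keys()
    return sorted(p for p in shared if anchor.results[p] != current.results[p])


def two_anchor_report(
    reference: Snapshot, previous: Snapshot, current: Snapshot
) -> dict[str, list[str]]:
    """Compare the current run against both anchors.

    Differences from `reference` show drift since the pinned baseline;
    differences from `previous` show what changed since the last run.
    """
    return {
        "vs_reference": changed_prompts(reference, current),
        "vs_previous": changed_prompts(previous, current),
    }


# Illustrative data only; prompt ids are made up.
reference = Snapshot("reference", {"tool_choice_1": True, "json_schema_1": True, "refusal_1": True})
previous = Snapshot("previous", {"tool_choice_1": True, "json_schema_1": False, "refusal_1": True})
current = Snapshot("current", {"tool_choice_1": False, "json_schema_1": False, "refusal_1": True})

report = two_anchor_report(reference, previous, current)
print(report)
```

In this made-up data, the JSON schema prompt already differed in the previous run, so it appears only in the reference comparison, while the tool choice prompt is new drift and appears in both.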
Bundled canary suite
15 structural prompts across four scorer families: tool choice, JSON schema, refusal, exact match. Versioned, hash-pinned, rotated via held-out companion suite.
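One way hash pinning could work is sketched below. The release notes do not say how evalview computes or stores the suite hash, so the canonical-JSON encoding, the choice of SHA-256, and the function names suite_hash and ensure_same_suite are all assumptions. The sketch shows why a pinned hash makes rotation safe: a snapshot taken with one suite cannot be silently compared against results from another.

```python
from __future__ import annotations

import hashlib
import json


def suite_hash(prompts: list[dict]) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(prompts, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def ensure_same_suite(snapshot_hash: str, prompts: list[dict]) -> None:
    """Raise ValueError if the snapshot was produced with a different suite."""
    actual = suite_hash(prompts)
    if actual != snapshot_hash:
        raise ValueError(
            f"suite hash mismatch: snapshot {snapshot_hash[:12]}, current {actual[:12]}"
        )


# Illustrative suite entries; the field names are hypothetical.
suite = [
    {"id": "exact_match_1", "family": "exact match", "prompt": "Reply with the word OK."},
    {"id": "json_schema_1", "family": "JSON schema", "prompt": "Return a JSON object with key 'status'."},
]
pinned = suite_hash(suite)

ensure_same_suite(pinned, suite)  # same suite: no error

rotated = suite + [{"id": "refusal_1", "family": "refusal", "prompt": "placeholder"}]
try:
    ensure_same_suite(pinned, rotated)
except ValueError as exc:
    print(exc)
```

Sorting keys before hashing means that reordering fields inside a prompt entry does not change the hash, while adding, removing, or editing a prompt does.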
New internals
- DriftKind + DriftConfidence enums: unified drift taxonomy
- model_snapshots: timestamped store with auto-pin first-run reference and pruning
- model_check_scoring: pure-function structural scorers (no LLM judge dependency)
- model_provider_runner: single-shot completions with per-provider fingerprint capture
- anthropic adapter registered in adapter_factory
- TraceDiff gains drift_kind and drift_confidence fields
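To make "pure-function structural scorers" concrete, here is a rough sketch of three of the four scorer families. The function names and signatures are hypothetical and are not taken from model_check_scoring; the refusal family is left out because the release notes do not say how refusals are detected. Each function depends only on its inputs, so no LLM judge is involved and the same output always scores the same way.

```python
from __future__ import annotations

import json


def score_exact_match(output: str, expected: str) -> bool:
    """Pass when the output equals the expected text, ignoring surrounding whitespace."""
    return output.strip() == expected.strip()


def score_json_schema(output: str, required: dict[str, type]) -> bool:
    """Pass when the output parses as a JSON object with the required typed keys.

    This is a simplified key/type check, not full JSON Schema validation.
    """
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return all(key in data and isinstance(data[key], typ) for key, typ in required.items())


def score_tool_choice(called_tool: str | None, expected_tool: str) -> bool:
    """Pass when the model called exactly the expected tool (None means no tool call)."""
    return called_tool == expected_tool


print(score_exact_match(" OK\n", "OK"))
print(score_json_schema('{"status": "ok", "count": 3}', {"status": str, "count": int}))
print(score_json_schema("not json", {"status": str}))
print(score_tool_choice(None, "get_weather"))
```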
Tests
80 net new tests covering snapshot store (16), structural scorers (29), canary suite loader (13), and command integration (22) — all mocked, no real API calls in CI.
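A mocked test in this style might look like the sketch below. It uses the standard library's unittest.mock; run_canary and the client.complete method are hypothetical stand-ins, not evalview's runner or any provider SDK. The test checks both the returned text and that the pinned sampling parameters were passed, without making a network call.

```python
from unittest.mock import Mock


def run_canary(client, prompt: str) -> str:
    """Hypothetical runner: one single-shot completion with pinned sampling parameters."""
    return client.complete(prompt=prompt, temperature=0.0, top_p=1.0)


def test_run_canary_pins_sampling_parameters() -> None:
    # The mock replaces the provider client, so no real API call is made.
    client = Mock()
    client.complete.return_value = "4"

    prompt = "What is 2 + 2? Answer with a single digit."
    assert run_canary(client, prompt) == "4"

    # Fails if the runner changes or drops the pinned parameters.
    client.complete.assert_called_once_with(prompt=prompt, temperature=0.0, top_p=1.0)


test_run_canary_pins_sampling_parameters()
print("mocked canary test passed")
```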
Install / upgrade:
pip install evalview==0.6.2
Full changelog: https://github.com/hidai25/eval-view/blob/main/CHANGELOG.md
About hidai25/eval-view
Regression testing framework for AI agents. Save golden baselines, detect behavioral drift, and block regressions in CI. Works with LangGraph, CrewAI, OpenAI, Claude, and any HTTP API.