This release adds 3 notable features for engineering teams evaluating rollout.
Published 1mo
AI Coding Tools
✓ No known CVEs patched in this version
Topics
benchmarking
evals
llm
llm-benchmarking
llm-evaluation
model-evaluation
model-selection
python
Summary
AI summary: Added CSV support, arbitrary JSONL field mapping, label-free evaluation, and multi-dimensional LLM-as-judge metrics.
Full changelog
What's new in 0.2.0
Dataset
- CSV support via `Dataset.from_csv()`: default column names `input` and `ideal`, with `input_field` / `output_field` overrides for custom schemas
- Arbitrary JSONL field mapping via `--input-field` / `--output-field` CLI flags and Python API
- Label-free evaluation: datasets without reference answers work end-to-end; reference-based metrics raise a clear error upfront
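The Dataset behaviour described above can be illustrated with a small standalone sketch. It uses only the Python standard library and is not Verdict's implementation: `Example`, `load_csv`, `require_references`, and `exact_match` are hypothetical names chosen for illustration. Only the default column names `input` and `ideal` and the `input_field` / `output_field` parameter names come from the changelog.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class Example:
    input: str
    ideal: Optional[str]  # None when the row has no reference answer


def load_csv(text: str, input_field: str = "input",
             output_field: str = "ideal") -> list[Example]:
    """Hypothetical loader: maps two CSV columns onto Example fields."""
    reader = csv.DictReader(io.StringIO(text))
    if input_field not in (reader.fieldnames or []):
        raise KeyError(f"column {input_field!r} not found in CSV header")
    # A missing or empty output column yields ideal=None (label-free data).
    return [Example(row[input_field], row.get(output_field) or None)
            for row in reader]


def require_references(examples: list[Example], metric_name: str) -> None:
    """Fail before any scoring if a reference-based metric lacks labels."""
    if any(ex.ideal is None for ex in examples):
        raise ValueError(
            f"{metric_name} needs reference answers, "
            "but the dataset has rows without one"
        )


def exact_match(examples: list[Example], predictions: list[str]) -> float:
    """Reference-based metric: fraction of predictions equal to the label."""
    require_references(examples, "exact_match")
    hits = sum(p == ex.ideal for p, ex in zip(predictions, examples))
    return hits / len(examples)


# Custom schema: columns named "question" and "answer".
labeled = load_csv("question,answer\n2+2,4\n",
                   input_field="question", output_field="answer")

# Label-free dataset: only the default "input" column is present.
unlabeled = load_csv("input\nhello\n")
```

In this sketch the reference check runs at the start of the metric, so a label-free dataset is rejected before any prediction is compared, which mirrors the "clear error upfront" behaviour the changelog describes.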
Metrics
- Multi-dimensional LLM-as-judge via the `dimensions` parameter: score multiple criteria (e.g. fluency, accuracy, safety) in a single judge call
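One way to picture scoring several criteria in a single judge call is sketched below. This is an assumption about the general technique, not Verdict's API: `build_judge_prompt`, `judge`, and `fake_model` are hypothetical names, the 1 to 5 scale and the JSON reply format are illustrative choices, and the model call is replaced by a stub so the example runs offline.

```python
import json
from typing import Callable


def build_judge_prompt(question: str, answer: str,
                       dimensions: list[str]) -> str:
    """Ask for every criterion in one prompt instead of one prompt each."""
    return (
        "Rate the answer on each criterion from 1 to 5.\n"
        f"Criteria: {', '.join(dimensions)}\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply with a JSON object mapping each criterion to an integer."
    )


def judge(question: str, answer: str, dimensions: list[str],
          call_model: Callable[[str], str]) -> dict[str, int]:
    """Make exactly one model call and return a score per dimension."""
    raw = call_model(build_judge_prompt(question, answer, dimensions))
    scores = json.loads(raw)
    missing = [d for d in dimensions if d not in scores]
    if missing:
        raise ValueError(f"judge reply is missing dimensions: {missing}")
    return {d: int(scores[d]) for d in dimensions}


calls = []  # records prompts so the single-call property can be checked


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM client; returns a fixed JSON reply."""
    calls.append(prompt)
    return json.dumps({"fluency": 5, "accuracy": 4, "safety": 5})


result = judge("What is 2+2?", "4",
               ["fluency", "accuracy", "safety"], fake_model)
```

Requesting all dimensions in one structured reply keeps the number of judge calls constant as criteria are added; the trade-off is that the reply must be validated, which is why the sketch checks for missing dimensions.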