Verdict

v0.2.0 Feature

This release adds four notable features (three for datasets, one for metrics) for engineering teams evaluating rollout.

Published 1mo AI Coding Tools
✓ No known CVEs patched
Topics

benchmarking evals llm llm-benchmarking llm-evaluation model-evaluation model-selection python

Summary

AI summary

Added CSV support, arbitrary JSONL field mapping, label‑free evaluation, and multi‑dimensional LLM‑as‑judge metrics.

Full changelog

What's new in 0.2.0

Dataset

  • CSV support via Dataset.from_csv() — default column names input and ideal, with input_field/output_field overrides for custom schemas
  • Arbitrary JSONL field mapping via --input-field / --output-field CLI flags and Python API
  • Label-free evaluation — datasets without reference answers work end-to-end; reference-based metrics raise a clear error upfront
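The dataset behaviour described above can be illustrated with a small standalone sketch. This is not Verdict's implementation and does not call its API: `load_rows` and `require_references` are hypothetical helpers built on the standard library, and the metric name `exact_match` is only an example. The sketch assumes that an absent or empty reference column is what marks a row as label-free.

```python
import csv
import io


def load_rows(text, input_field="input", output_field="ideal"):
    """Read CSV text into dicts with normalized 'input' and 'reference' keys.

    Hypothetical helper: the defaults mirror the column names in the
    changelog, and the keyword arguments play the role of the overrides.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        rows.append({
            "input": record[input_field],
            # A missing or empty reference column marks the row as label-free.
            "reference": record.get(output_field) or None,
        })
    return rows


def require_references(rows, metric_name):
    """Fail before evaluation starts if a reference-based metric has no labels."""
    missing = sum(1 for row in rows if row["reference"] is None)
    if missing:
        raise ValueError(
            f"{metric_name} needs reference answers, "
            f"but {missing} of {len(rows)} rows have none"
        )


# Default schema, custom schema, and a label-free dataset.
labeled = load_rows("input,ideal\n2+2,4\n")
custom = load_rows(
    "question,answer\nCapital of France?,Paris\n",
    input_field="question",
    output_field="answer",
)
unlabeled = load_rows("input\nSummarize this paragraph.\n")
```

Calling `require_references(unlabeled, "exact_match")` raises a `ValueError` immediately, while the same call on `labeled` passes silently. Checking once before any model calls is what makes the error "upfront" rather than a failure partway through a run.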

Metrics

  • Multi-dimensional LLM-as-judge via the dimensions parameter — score multiple criteria (e.g. fluency, accuracy, safety) in a single judge call
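To make the single-call idea concrete, here is a minimal sketch of scoring several criteria in one judge request. It is an illustration only, not Verdict's code: `build_judge_prompt`, `parse_judge_reply`, and `fake_judge` are hypothetical names, the 1 to 5 scale and JSON reply format are assumptions, and `fake_judge` stands in for a real model call.

```python
import json


def build_judge_prompt(question, answer, dimensions):
    """Build one prompt that asks for a score on every dimension at once."""
    names = ", ".join(dimensions)
    return (
        "Rate the answer on each dimension from 1 to 5.\n"
        f"Dimensions: {names}\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply with one JSON object mapping each dimension to its score."
    )


def parse_judge_reply(reply, dimensions):
    """Parse the judge's JSON reply and check that every dimension was scored."""
    scores = json.loads(reply)
    missing = [name for name in dimensions if name not in scores]
    if missing:
        raise ValueError(f"judge reply is missing dimensions: {missing}")
    return {name: float(scores[name]) for name in dimensions}


def fake_judge(prompt):
    """Stand-in for a real model call; always returns the same fixed reply."""
    return '{"fluency": 5, "accuracy": 4, "safety": 5}'


dimensions = ["fluency", "accuracy", "safety"]
prompt = build_judge_prompt("What is 2+2?", "4", dimensions)
scores = parse_judge_reply(fake_judge(prompt), dimensions)
```

Here `scores` holds one float per dimension from a single judge reply. Validating that every requested dimension is present guards against a judge that silently drops a criterion.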
