Skip to content

Release history

Verdict releases

All releases

2 shown

v0.2.0 New feature
Notable features
  • CSV support via Dataset.from_csv() with default column names `input` and `ideal` and overrides `input_field`/`output_field`
  • Arbitrary JSONL field mapping through CLI flags `--input-field` / `--output-field` and Python API
  • Label‑free evaluation allowing datasets without reference answers; reference‑based metrics emit a clear upfront error
Full changelog

What's new in 0.2.0

Dataset

  • CSV support via Dataset.from_csv() — default column names input and ideal, with input_field/output_field overrides for custom schemas
  • Arbitrary JSONL field mapping via --input-field / --output-field CLI flags and Python API
  • Label-free evaluation — datasets without reference answers work end-to-end; reference-based metrics raise a clear error upfront

Metrics

  • Multi-dimensional LLM-as-judge via the dimensions parameter — score multiple criteria (e.g. fluency, accuracy, safety) in a single judge call
v0.1.0 Maintenance

Minor fixes and improvements.

Full changelog

v0.1.0 — Initial release

aevyra-verdict is a framework for evaluating and comparing LLM outputs across models and providers.

What's included

Core evaluation engine

  • Run completions concurrently across any combination of models and providers
  • Configurable concurrency, retries, and exponential backoff for rate limit handling
  • Structured results with per-model scores, latency, and token usage

Providers

  • Built-in support for OpenAI, Anthropic, Google Gemini, Mistral, Cohere, and OpenRouter
  • Local model support via any OpenAI-compatible API (Ollama, vLLM)

Metrics

  • Reference-based: ROUGE, BLEU, exact match
  • LLM-as-judge with configurable criteria and custom prompt templates
  • Custom scoring functions via Python callables

Dataset formats

  • Auto-detection of OpenAI, ShareGPT, and Alpaca formats
  • Filtering by metadata fields

CLI

  • aevyra-verdict run — run evals and print comparison table
  • aevyra-verdict inspect — preview a dataset
  • aevyra-verdict providers — check which API keys are configured

Docs

Install

pip install aevyra-verdict

Beta — feedback welcome: [email protected]