v0.2.0 New feature 3mo

Notable features

CSV support via Dataset.from_csv() with default column names `input` and `ideal` and overrides `input_field`/`output_field`
Arbitrary JSONL field mapping through CLI flags `--input-field` / `--output-field` and Python API
Label‑free evaluation allowing datasets without reference answers; reference‑based metrics emit a clear upfront error

Full changelog

What's new in 0.2.0

Dataset

CSV support via Dataset.from_csv() — default column names input and ideal, with input_field/output_field overrides for custom schemas
Arbitrary JSONL field mapping via --input-field / --output-field CLI flags and Python API
Label-free evaluation — datasets without reference answers work end-to-end; reference-based metrics raise a clear error upfront

Metrics

Multi-dimensional LLM-as-judge via the dimensions parameter — score multiple criteria (e.g. fluency, accuracy, safety) in a single judge call

v0.1.0 Maintenance 3mo

Minor fixes and improvements.

Full changelog

v0.1.0 — Initial release

aevyra-verdict is a framework for evaluating and comparing LLM outputs across models and providers.

Run completions concurrently across any combination of models and providers
Configurable concurrency, retries, and exponential backoff for rate limit handling
Structured results with per-model scores, latency, and token usage

Built-in support for OpenAI, Anthropic, Google Gemini, Mistral, Cohere, and OpenRouter
Local model support via any OpenAI-compatible API (Ollama, vLLM)

pip install aevyra-verdict