EleutherAI / lm-evaluation-harness

AI Coding Tools

A unified framework for evaluating generative language models on over 60 academic benchmarks and custom tasks

Python · Latest v0.4.12 · 2mo ago

Features

Supports >60 standard LLM evaluation benchmarks with hundreds of subtasks
Works with major model backends: HuggingFace transformers, vLLM, GPT‑NeoX, Megatron‑DeepSpeed, and commercial APIs (OpenAI, TextSynth)
Allows quantization, adapters (LoRA), prompt customization via Jinja2, and flexible config‑based task creation
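
As a rough illustration of how the backend, adapter, and task options listed above fit together, the sketch below assembles a command line for the `lm_eval` CLI. It assumes the `--model`, `--model_args`, `--tasks`, `--batch_size`, and `--device` flags and the `pretrained=` / `peft=` model arguments described in the project's 0.4.x README; the helper `build_lm_eval_command` and the adapter name `my-org/my-lora-adapter` are hypothetical, and flags may differ between releases, so check `lm_eval --help` for the installed version.

```python
import shlex


def build_lm_eval_command(model, model_args, tasks, batch_size=None, device=None):
    """Assemble an argv list for the `lm_eval` CLI.

    Illustrative helper only; it is not part of the harness itself.
    """
    # The CLI takes model arguments as a single comma-separated key=value string.
    args_str = ",".join(f"{key}={value}" for key, value in model_args.items())
    argv = [
        "lm_eval",
        "--model", model,
        "--model_args", args_str,
        "--tasks", ",".join(tasks),  # task names are comma-separated as well
    ]
    if batch_size is not None:
        argv += ["--batch_size", str(batch_size)]
    if device is not None:
        argv += ["--device", device]
    return argv


cmd = build_lm_eval_command(
    model="hf",  # Hugging Face transformers backend
    model_args={
        "pretrained": "EleutherAI/pythia-160m",
        "peft": "my-org/my-lora-adapter",  # placeholder LoRA adapter name
    },
    tasks=["hellaswag", "arc_easy"],
    batch_size=8,
)

# Print a shell-safe version of the command for copy and paste.
print(shlex.join(cmd))
```

Building the argument list in Python rather than hand-writing a shell string keeps quoting correct when model arguments contain unusual characters, and the list can be passed directly to `subprocess.run` once the harness is installed.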

Recent releases

v0.4.12 · Breaking risk · 2mo · No immediate action
SteeredModel rename + vLLM bump + thinking flag change

v0.4.11 · Maintenance · 5mo · Review required
RCE / SSRF · Dependencies
Routine maintenance and dependency updates.

v0.4.10 · Breaking risk · 6mo · Review required
Breaking upgrade
Optional backend installation

v0.4.9.2 · Breaking risk · 8mo · Review required
Breaking upgrade
Python 3.10 minimum

v0.4.9.1 · Breaking risk · 11mo · No immediate action
New benchmarks + TruthfulQA

Releases

Cadence 0.1 / wk

Last release 76d

Tracked 6

Security

Security score 8.0/10

OpenSSF 5.5/10

Open CVEs 0

Active maintainer

Community

GitHub stars 13,279

Forks 3,407

Contributors 90d 5

Open issues 910

Open PRs 400

Stars/wk velocity 0.0

About

Stars 13,279

Forks 3,407

Languages Python, Shell, C++

Install & Platforms

Install via pip

Community & Support

Discord

Similar tools

Verdict

Nexa-gauge

BlazeUp-AI/Observal

OmniPoly

langfuse
