EleutherAI / lm-evaluation-harness
AI Coding Tools
A unified framework for evaluating generative language models across dozens of academic benchmarks, custom prompts, and multiple inference back‑ends.
Python · Latest v0.4.12 · 23d ago
Features
- Supports >60 standard LLM benchmarks with hundreds of subtasks
- Flexible model loading via Transformers, vLLM, GPT‑NeoX, Megatron‑DeepSpeed, and commercial APIs (OpenAI, TextSynth)
- Configurable prompt design with Jinja2 and import from Promptsource

v0.4.12 · Breaking risk · No immediate action · SteeredModel rename + vLLM bump + thinking flag change
v0.4.11 · Maintenance · Review required · RCE / SSRF · Dependencies · Routine maintenance and dependency updates.
v0.4.10 · Breaking risk · Review required · Breaking upgrade · Optional backend installation
v0.4.9.2 · Breaking risk · Review required · Breaking upgrade · Python 3.10 minimum
v0.4.9.1 · Breaking risk · No immediate action · New benchmarks + TruthfulQA
About
Languages: Python · Shell · C++