v0.4.11 Release Notes
Minor release. Stay tuned for bigger changes next release.
New Platform Support
- Windows ML Backend — Native Windows ML inference support by @chapsiru and @chemwolf6922 in #3470, #3564, #3565
New Benchmarks & Tasks
- BEAR knowledge probe by @plonerma in #3496
Task Version Changes
The following tasks have updated versions. Results from previous task versions may not be directly comparable. See the linked PRs or the individual task READMEs for changelogs.
- afrobench_belebele (all variants): 2 → 3 in #3551
- evalita_llm: 0.0 → 0.1 in #3551
- include (all 90 language variants): 0.0 → 0.1 in #3551
- mgsm_direct (all 11 language variants): 3.0 → 4.0 by @LakshyaChaudhry in #3574
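Because scores from different task versions may not be comparable, it can help to diff the task versions recorded in two runs before comparing their numbers. The sketch below assumes each results JSON file carries a top-level `versions` mapping of task name to version; the helper names and the task/version pairs in the demo are illustrative only, not part of lm-evaluation-harness.

```python
import json
from pathlib import Path


def load_versions(path: str) -> dict:
    """Read a results JSON file and return its task -> version mapping.

    Assumes a top-level "versions" key; returns {} if it is absent.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return data.get("versions", {})


def _normalize(version):
    # Treat 3, 3.0 and "3.0" as the same version; fall back to the raw string.
    try:
        return float(version)
    except (TypeError, ValueError):
        return str(version)


def version_mismatches(old: dict, new: dict) -> dict:
    """Return {task: (old_version, new_version)} for tasks present in both
    mappings whose versions differ, i.e. scores that may not be comparable."""
    return {
        task: (old[task], new[task])
        for task in old.keys() & new.keys()
        if _normalize(old[task]) != _normalize(new[task])
    }


if __name__ == "__main__":
    # Illustrative version maps, not real results files.
    before = {"mgsm_direct_en": 3.0, "evalita_llm": 0.0, "unchanged_task": 3}
    after = {"mgsm_direct_en": 4.0, "evalita_llm": 0.1, "unchanged_task": 3.0}
    for task, (v_old, v_new) in sorted(version_mismatches(before, after).items()):
        print(f"{task}: {v_old} -> {v_new} (scores may not be comparable)")
```

Normalizing through `float` keeps an integer version such as `3` from being flagged against `3.0`, while non-numeric version strings are still compared verbatim.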
Fixes & Improvements
- Fixed SQuAD v2 evaluation by @HydrogenSulfate in #3535
- Fixed MasakhaNEWS tasks: replaced the non-existent `headline_text` field with `headline` by @Mr-Neutr0n in #3567
- Fixed incorrect task configs by @baberabb in #3552
- Replaced `eval()` with `ast.literal_eval` in task configs for safer parsing by @baberabb in #3577
- Fixed SGLang duplicate registration error by @enpimashin in #3543
- Restored `hf_transfer` import check by @baberabb in #3563
- Fixed `modify_gen_kwargs` call in vLLM VLMs by @hmellor in #3573
- Refactored vLLM `gen_kwargs` normalization inline to `modify_gen_kwargs`; fixed cached `gen_kwargs` mutation by @baberabb in #3582
- Fixed README for task-listing CLI command by @UltimateJupiter in #3545
- Updated dependencies by @baberabb in #3546
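The switch from `eval()` to `ast.literal_eval` matters because `eval()` executes arbitrary expressions, while `ast.literal_eval` only accepts Python literals (strings, numbers, tuples, lists, dicts, sets, booleans, `None`). The note does not show which config values are parsed, so this is a generic illustration of the difference; `parse_literal` and `is_safe_literal` are hypothetical helpers, not harness functions.

```python
import ast


def parse_literal(text: str):
    """Parse a string holding a Python literal (list, dict, str, number, ...)
    without executing code, unlike eval()."""
    return ast.literal_eval(text)


def is_safe_literal(text: str) -> bool:
    """Return True if text is a plain literal; False if it contains calls,
    names, or other expressions that literal_eval refuses to evaluate."""
    try:
        ast.literal_eval(text)
        return True
    except (ValueError, TypeError, SyntaxError):
        return False


if __name__ == "__main__":
    # A literal list of answer choices parses normally.
    print(parse_literal("['A', 'B', 'C', 'D']"))
    # A function call would run under eval(); literal_eval rejects it.
    print(is_safe_literal("__import__('os').getcwd()"))
```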
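"Cached `gen_kwargs` mutation" names a common bug class: a normalization step edits, in place, a dict that a cache hands back to every caller, so later requests see the edited values. The release note does not describe how `modify_gen_kwargs` works internally, so the sketch below only illustrates that bug class under stated assumptions. `cached_gen_kwargs`, `normalize_in_place` and `normalize_copy` are hypothetical, and the `until`/`max_gen_toks` keys are used purely as illustrative generation settings.

```python
import copy
from functools import lru_cache


@lru_cache(maxsize=None)
def cached_gen_kwargs(task: str) -> dict:
    # Hypothetical cached lookup: repeated calls with the same task return
    # the SAME dict object, so any in-place edit leaks into later calls.
    return {"until": ["\n\n"], "max_gen_toks": 256}


def normalize_in_place(gen_kwargs: dict) -> dict:
    """Buggy pattern: mutates the caller's (possibly cached) dict."""
    gen_kwargs.pop("until", None)
    gen_kwargs.setdefault("temperature", 0.0)
    return gen_kwargs


def normalize_copy(gen_kwargs: dict) -> dict:
    """Safer pattern: deep-copy first so the cached original, including
    nested lists such as the stop sequences, is left untouched."""
    kwargs = copy.deepcopy(gen_kwargs)
    kwargs.pop("until", None)
    kwargs.setdefault("temperature", 0.0)
    return kwargs


if __name__ == "__main__":
    normalize_in_place(cached_gen_kwargs("buggy"))
    print(cached_gen_kwargs("buggy"))  # stop sequences are gone from the cache
    normalize_copy(cached_gen_kwargs("fixed"))
    print(cached_gen_kwargs("fixed"))  # cache still holds the original dict
```

A deep copy is used rather than `dict.copy()` so that a later step editing a nested list (for example appending a stop sequence) cannot leak back into the cached value either.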
New Contributors
- @HydrogenSulfate made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/3535
- @UltimateJupiter made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/3545
- @enpimashin made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/3543
- @chapsiru made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/3470
- @chemwolf6922 made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/3565
- @plonerma made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/3496
- @hmellor made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/3573
- @Mr-Neutr0n made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/3567
- @LakshyaChaudhry made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/3574
Full Changelog: https://github.com/EleutherAI/lm-evaluation-harness/compare/v0.4.10...v0.4.11