This release adds 2 notable features for engineering teams evaluating rollout.
✓ No known CVEs patched in this version
Summary
AI summary: Updates Highlights, https://arxiv.org/abs/2605.28500, and white-box across a mixed release.
Changes in this release
| Type | Severity | Summary | CVE |
|---|---|---|---|
| Feature | Low | Adds CodeGenUQ for code generation uncertainty quantification. | None |
| Feature | Low | Adds LiveCodeBench dataset loading and code execution evaluation. | None |
| Feature | Low | Adds Discord badge and community link to documentation. | None |
| Feature | Low | Adds code generation links and BibTeX citation to documentation. | None |
| Feature | Low | Releases minor version v0.6.0. | None |
| Bugfix | Medium | Removes print statements and fixes edge-case handling. | None |
| Bugfix | Low | Removes platform-wide test skips. | None |
| Refactor | Low | Refactors evaluation code to accept list inputs and return dict results. | None |

Source for all entries: llm_adapter@2026-06-01. Confidence: high.
Full changelog
Highlights
Add CodeGenUQ for Code Generation Uncertainty Quantification
As an extension of the work in Functional Entropy: Predicting Functional Correctness in LLM-Generated Code with Uncertainty Quantification (Bouchard et al., 2026), this minor release introduces CodeGenUQ, a unified interface for computing confidence scores that predict functional correctness in LLM-generated code without requiring execution or test cases at inference time. Under the hood, CodeGenUQ relies on new methods defined in the uqlm.code subpackage. This release also adds a new demo notebook that illustrates this functionality.
Overview
CodeGenUQ extends ShortFormUQ to support code-specific uncertainty quantification across three scorer families:
Token-Probability Scorers (white-box)
- Length-normalized sequence probability (LNSP)
- Minimum token probability (MTP)
- Probability margin (PM)
- Average/minimum token negentropy (ATN@K, MTN@K)
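As a rough illustration of the first two white-box scorers, the sketch below computes a length-normalized sequence probability and a minimum token probability from per-token log-probabilities. The formulas used here (geometric mean of the token probabilities for LNSP, smallest token probability for MTP) are the commonly used definitions and are an assumption; the function names are hypothetical, and this is not the uqlm.code implementation.

```python
import math
from typing import Sequence


def length_normalized_sequence_probability(logprobs: Sequence[float]) -> float:
    """Geometric mean of token probabilities, exp(mean(log p_t)). Assumed definition."""
    if not logprobs:
        raise ValueError("logprobs must be non-empty")
    return math.exp(sum(logprobs) / len(logprobs))


def minimum_token_probability(logprobs: Sequence[float]) -> float:
    """Probability of the least likely generated token. Assumed definition."""
    if not logprobs:
        raise ValueError("logprobs must be non-empty")
    return math.exp(min(logprobs))


# Hypothetical log-probabilities for a three-token completion.
example_logprobs = [math.log(0.9), math.log(0.5), math.log(0.8)]
lnsp = length_normalized_sequence_probability(example_logprobs)
mtp = minimum_token_probability(example_logprobs)
```

Both values lie in (0, 1], and MTP can never exceed LNSP because the minimum of a set is at most its geometric mean. A single low-probability token therefore pulls MTP down much harder than it pulls LNSP.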
Sampling-Based Consistency Scorers (black-box)
- Functional equivalence methods: Replace NLI-based semantic comparison with LLM-based assessment of whether code snippets produce identical outputs for all valid inputs
  - functional_equivalence_rate: Proportion of samples functionally equivalent to the original
  - functional_negentropy: Entropy over functional equivalence clusters (code analogue of semantic entropy)
  - functional_sets_confidence: Normalized count of unique functional clusters
- Similarity methods:
  - cosine_sim: Code embedding similarity (default: jina-embeddings-v2-base-code)
  - codebleu: CodeBLEU consistency incorporating n-gram, syntax, and data-flow matching
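To make the three functional equivalence scores concrete, the sketch below computes them from cluster labels, assuming some judge has already assigned each response to a functional equivalence cluster. The judge itself is out of scope here. The function name, the convention that index 0 is the original response, and both normalizations (dividing entropy by the log of the response count, and scaling the cluster count to [0, 1]) are assumptions for illustration, not the library's formulas.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def functional_consistency_scores(cluster_ids: Sequence[int]) -> Dict[str, float]:
    """Score consistency from cluster labels.

    cluster_ids[0] is the cluster of the original response; the remaining
    entries are the clusters of the sampled responses (assumed convention).
    """
    if len(cluster_ids) < 2:
        raise ValueError("need the original response plus at least one sample")

    original, samples = cluster_ids[0], cluster_ids[1:]
    n = len(cluster_ids)

    # Share of sampled responses that fall in the original response's cluster.
    equivalence_rate = sum(c == original for c in samples) / len(samples)

    # Entropy of the empirical cluster distribution, flipped so 1.0 means
    # every response is in one cluster (assumed normalization by log n).
    counts = Counter(cluster_ids)
    entropy = -sum((k / n) * math.log(k / n) for k in counts.values())
    negentropy = 1.0 - entropy / math.log(n)

    # 1.0 with a single cluster, 0.0 when every response is its own cluster
    # (assumed normalization).
    sets_confidence = 1.0 - (len(counts) - 1) / (n - 1)

    return {
        "functional_equivalence_rate": equivalence_rate,
        "functional_negentropy": negentropy,
        "functional_sets_confidence": sets_confidence,
    }


# Hypothetical labels: the original plus five samples, three of which agree with it.
scores = functional_consistency_scores([0, 0, 0, 1, 2, 0])
```

Taking labels as input keeps the sketch independent of how equivalence is judged, so the same arithmetic applies whether clusters come from an LLM judge or from another equivalence check.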
Reflexive Scorers
- p_true: Token probability assigned to "True" for self-evaluation
- verbalized_confidence: Likert-scale confidence elicitation
Usage
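from uqlm import CodeGenUQ

# my_llm and prompts are supplied by the caller
uq = CodeGenUQ(
    llm=my_llm,
    scorers=["functional_equivalence_rate", "cosine_sim", "sequence_probability"],
    language="python",
)

# top-level await works in a notebook; in a script, call this from inside an async function
results = await uq.generate_and_score(prompts, num_responses=5)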
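Because generate_and_score is awaited, the snippet above needs a running event loop. A minimal sketch of calling it from a plain synchronous script follows. The helper score_sync and the _StubUQ class are hypothetical, and the stub only stands in for a configured CodeGenUQ instance so the example runs without uqlm or an LLM.

```python
import asyncio
from typing import Any, List


def score_sync(uq: Any, prompts: List[str], num_responses: int = 5) -> Any:
    """Drive the async generate_and_score call to completion from synchronous code."""
    return asyncio.run(uq.generate_and_score(prompts, num_responses=num_responses))


class _StubUQ:
    """Hypothetical stand-in for a scorer object; it only echoes its arguments."""

    async def generate_and_score(self, prompts, num_responses=5):
        await asyncio.sleep(0)  # yield to the event loop, as a real async call would
        return {"prompts": list(prompts), "num_responses": num_responses}


demo = score_sync(
    _StubUQ(),
    ["Write a function that reverses a string."],
    num_responses=3,
)
```

asyncio.run raises a RuntimeError when an event loop is already running, so inside a notebook the direct await form is the one to use.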
What's Changed
- CodeGenUQ : Code Generation tasks by @mohitcek in https://github.com/cvs-health/uqlm/pull/358
- Unit tests for CodeGenUQ by @zeya30 in https://github.com/cvs-health/uqlm/pull/369
- Patch/v0.5.8 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/373
- Progress bars for CodeGenUQ by @zeya30 in https://github.com/cvs-health/uqlm/pull/376
- Refactor: remove unused/redundant code by @zeya30 in https://github.com/cvs-health/uqlm/pull/377
- Add LiveCodeBench dataset loading and code execution evaluation by @zeya30 in https://github.com/cvs-health/uqlm/pull/380
- Code gen progress bar by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/379
- Cleanup code gen by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/382
- Add OSS quality improvements by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/381
- Develop by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/383
- v0.5.9 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/387
- Merge develop -> code-gen-uq by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/388
- UQ for Code Generation by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/389
- Update docstring by @zeya30 in https://github.com/cvs-health/uqlm/pull/391
- v0.5.10 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/393
- Refactor evaluation code to accept list inputs and return dict results by @zeya30 in https://github.com/cvs-health/uqlm/pull/394
- v0.5.11 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/396
- Remove print and fix edge case handling by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/397
- Remove platform-wide test skips by @mohitcek in https://github.com/cvs-health/uqlm/pull/399
- Minor updates and improve code coverage for CodeGenUQ class by @mohitcek in https://github.com/cvs-health/uqlm/pull/403
- Structured outputs by @aaronlohner in https://github.com/cvs-health/uqlm/pull/384
- Add Discord badge and community link by @virenbajaj in https://github.com/cvs-health/uqlm/pull/404
- add code gen links and bibtex by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/405
- Minor release: v0.6.0 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/406
Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.5.11...v0.6.0