UQLM

v0.6.0 Feature

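This release adds two notable features for engineering teams evaluating rollout: CodeGenUQ for code generation uncertainty quantification, and LiveCodeBench dataset loading with code execution evaluation.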

Published 1mo AI Agents & Assistants

✓ No known CVEs patched

Topics

ai-evaluation ai-safety confidence-estimation confidence-score hallucination hallucination-detection hallucination-evaluation hallucination-mitigation llm llm-evaluation llm-hallucination llm-safety uncertainty-estimation uncertainty-quantification

Summary

AI summary

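Adds CodeGenUQ for code generation uncertainty quantification, covering white-box, black-box, and reflexive scorers, along with LiveCodeBench dataset loading, documentation updates, and bug fixes. Referenced paper: https://arxiv.org/abs/2605.28500.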

Changes in this release

Type	Severity	Summary	CVE
Feature	Low	Adds CodeGenUQ for code generation uncertainty quantification.	—
Feature	Low	Adds LiveCodeBench dataset loading and code execution evaluation.	—
Feature	Low	Adds Discord badge and community link to documentation.	—
Feature	Low	Adds code generation links and BibTeX citation to documentation.	—
Feature	Low	Releases minor version v0.6.0.	—
Bugfix	Medium	Removes print statements and fixes edge-case handling.	—
Bugfix	Low	Removes platform-wide test skips.	—
Refactor	Low	Refactors evaluation code to accept list inputs and return dict results.	—

Source for all entries: llm_adapter@2026-06-01. Confidence: high.

Full changelog

Highlights

Add `CodeGenUQ` for Code Generation Uncertainty Quantification

As an extension of the work in "Functional Entropy: Predicting Functional Correctness in LLM-Generated Code with Uncertainty Quantification" (Bouchard et al., 2026), this minor release introduces CodeGenUQ, a unified interface for computing confidence scores that predict functional correctness in LLM-generated code without requiring execution or test cases at inference time. Under the hood, CodeGenUQ relies on new methods defined in the uqlm.code subpackage. This release also adds a new demo notebook to illustrate this functionality.

Overview

CodeGenUQ extends ShortFormUQ to support code-specific uncertainty quantification across three scorer families:

Token-Probability Scorers (white-box)

Length-normalized sequence probability (LNSP)
Minimum token probability (MTP)
Probability margin (PM)
Average/minimum token negentropy (ATN@K, MTN@K)
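
To make the first two scorers above concrete, here is a minimal sketch that uses their textbook definitions: the geometric mean of token probabilities for LNSP and the smallest token probability for MTP. The function names and the sample probabilities are illustrative assumptions, not UQLM's internal API, and UQLM's implementation may differ in detail.

```python
import math

def length_normalized_sequence_probability(token_probs):
    """Geometric mean of token probabilities: exp(mean(log p_t)).

    Hypothetical helper for illustration; expects probabilities in (0, 1].
    """
    if not token_probs:
        raise ValueError("token_probs must be non-empty")
    mean_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_log_prob)

def minimum_token_probability(token_probs):
    """Probability of the least likely generated token."""
    if not token_probs:
        raise ValueError("token_probs must be non-empty")
    return min(token_probs)

# Made-up per-token probabilities for one generated code snippet.
token_probs = [0.9, 0.8, 0.5, 0.95]
lnsp = length_normalized_sequence_probability(token_probs)
mtp = minimum_token_probability(token_probs)
print(f"LNSP={lnsp:.3f}, MTP={mtp:.3f}")
```

Both scores lie in (0, 1]. LNSP summarizes the whole sequence without penalizing longer outputs, while MTP is driven entirely by the single weakest token.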

Sampling-Based Consistency Scorers (black-box)

Functional equivalence methods: replace NLI-based semantic comparison with an LLM-based assessment of whether code snippets produce identical outputs for all valid inputs.
- functional_equivalence_rate: Proportion of samples functionally equivalent to the original
- functional_negentropy: Entropy over functional equivalence clusters (code analogue of semantic entropy)
- functional_sets_confidence: Normalized count of unique functional clusters
Similarity methods:
- cosine_sim: Code embedding similarity (default: jina-embeddings-v2-base-code)
- codebleu: CodeBLEU consistency incorporating n-gram, syntax, and data-flow matching
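
The three functional equivalence scores can be illustrated once each response has been assigned to a functional cluster. The sketch below assumes the clustering step (the LLM-based equivalence judgments) has already produced integer cluster IDs, with the original response first. The helper name and the normalizations used for negentropy and sets confidence are assumptions chosen for illustration and may not match UQLM's exact formulas.

```python
import math
from collections import Counter

def functional_scores(cluster_ids):
    """Compute illustrative consistency scores from functional cluster IDs.

    cluster_ids[0] is the original response; the rest are sampled responses.
    Hypothetical helper; normalizations are assumptions, not UQLM's definitions.
    """
    n = len(cluster_ids)
    if n < 2:
        raise ValueError("need the original response plus at least one sample")
    original, samples = cluster_ids[0], cluster_ids[1:]

    # Share of sampled responses in the same cluster as the original.
    equivalence_rate = sum(c == original for c in samples) / len(samples)

    # Shannon entropy over cluster proportions, rescaled so 1.0 means full agreement.
    counts = Counter(cluster_ids)
    entropy = -sum((k / n) * math.log(k / n) for k in counts.values())
    negentropy = 1.0 - entropy / math.log(n)

    # Fewer distinct clusters means higher confidence (1 cluster -> 1.0, n clusters -> 0.0).
    sets_confidence = 1.0 - (len(counts) - 1) / (n - 1)

    return {
        "functional_equivalence_rate": equivalence_rate,
        "functional_negentropy": negentropy,
        "functional_sets_confidence": sets_confidence,
    }

# Original response in cluster 0, five samples spread over clusters 0, 1, and 2.
scores = functional_scores([0, 0, 0, 1, 2, 0])
print(scores)
```

All three scores fall in [0, 1] under these assumed normalizations, with higher values indicating that the sampled code agrees functionally with the original.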

Reflexive Scorers

p_true: Token probability assigned to "True" for self-evaluation
verbalized_confidence: Likert-scale confidence elicitation
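
As a rough illustration of how the two reflexive scorers could turn a model's self-evaluation into a number, the sketch below converts a log-probability for the token "True" into a p_true score and maps a Likert label to an evenly spaced value in [0, 1]. The function names, the fallback on the "False" token, and the five-point label set are all hypothetical; the release notes do not specify UQLM's prompts or scale.

```python
import math

def p_true_from_logprobs(top_logprobs):
    """Return the probability mass on "True" for a True/False self-evaluation.

    top_logprobs maps candidate answer tokens to log-probabilities.
    Hypothetical helper; falls back to 1 - P("False") if "True" is absent.
    """
    if "True" in top_logprobs:
        return math.exp(top_logprobs["True"])
    if "False" in top_logprobs:
        return 1.0 - math.exp(top_logprobs["False"])
    return None  # neither token was returned, so no score can be derived

# Assumed five-point scale, ordered from least to most confident.
LIKERT_LABELS = ["very unlikely", "unlikely", "uncertain", "likely", "very likely"]

def likert_to_score(label):
    """Map a Likert label to an evenly spaced score in [0, 1]."""
    index = LIKERT_LABELS.index(label.strip().lower())
    return index / (len(LIKERT_LABELS) - 1)

print(p_true_from_logprobs({"True": math.log(0.8), "False": math.log(0.2)}))
print(likert_to_score("Likely"))
```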

Usage

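from uqlm import CodeGenUQ

# Placeholders supplied by the caller:
#   my_llm  - the LLM object to evaluate (e.g., a LangChain chat model)
#   prompts - a list of code generation prompt strings
uq = CodeGenUQ(
    llm=my_llm,
    scorers=["functional_equivalence_rate", "cosine_sim", "sequence_probability"],
    language="python",
)

# generate_and_score is awaited, so run this in a notebook or inside an async function.
results = await uq.generate_and_score(prompts, num_responses=5)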

What's Changed

CodeGenUQ : Code Generation tasks by @mohitcek in https://github.com/cvs-health/uqlm/pull/358
Unit tests for CodeGenUQ by @zeya30 in https://github.com/cvs-health/uqlm/pull/369
Patch/v0.5.8 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/373
Progress bars for CodeGenUQ by @zeya30 in https://github.com/cvs-health/uqlm/pull/376
Refactor: remove unused/redundant code by @zeya30 in https://github.com/cvs-health/uqlm/pull/377
Add LiveCodeBench dataset loading and code execution evaluation by @zeya30 in https://github.com/cvs-health/uqlm/pull/380
Code gen progress bar by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/379
Cleanup code gen by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/382
Add OSS quality improvements by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/381
Develop by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/383
v0.5.9 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/387
Merge develop -> code-gen-uq by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/388
UQ for Code Generation by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/389
Update docstring by @zeya30 in https://github.com/cvs-health/uqlm/pull/391
v0.5.10 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/393
Refactor evaluation code to accept list inputs and return dict results by @zeya30 in https://github.com/cvs-health/uqlm/pull/394
v0.5.11 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/396
Remove print and fix edge case handling by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/397
Remove platform-wide test skips by @mohitcek in https://github.com/cvs-health/uqlm/pull/399
Minor updates and improve code coverage for CodeGenUQ class by @mohitcek in https://github.com/cvs-health/uqlm/pull/403
Structured outputs by @aaronlohner in https://github.com/cvs-health/uqlm/pull/384
Add Discord badge and community link by @virenbajaj in https://github.com/cvs-health/uqlm/pull/404
add code gen links and bibtex by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/405
Minor release: v0.6.0 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/406

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.5.11...v0.6.0
