UQLM

v0.6.0 Feature

This release adds 2 notable features for engineering teams evaluating rollout.

✓ No known CVEs patched

Topics

ai-evaluation ai-safety confidence-estimation confidence-score hallucination hallucination-detection hallucination-evaluation hallucination-mitigation llm llm-evaluation llm-hallucination llm-safety uncertainty-estimation uncertainty-quantification

Summary

AI summary

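Adds CodeGenUQ for code generation uncertainty quantification (white-box, black-box, and reflexive scorers; see https://arxiv.org/abs/2605.28500), along with LiveCodeBench evaluation support, bug fixes, and a refactor of the evaluation code.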

Changes in this release

Feature Low

Adds CodeGenUQ for code generation uncertainty quantification.

Source: llm_adapter@2026-06-01

Confidence: high

Feature Low

Adds LiveCodeBench dataset loading and code execution evaluation.

Source: llm_adapter@2026-06-01

Confidence: high

Feature Low

Adds Discord badge and community link to documentation.

Source: llm_adapter@2026-06-01

Confidence: high

Feature Low

Adds code generation links and BibTeX citation to documentation.

Source: llm_adapter@2026-06-01

Confidence: high

Feature Low

Releases minor version v0.6.0.

Source: llm_adapter@2026-06-01

Confidence: high

Bugfix Medium

Removes print statements and fixes edge‑case handling.

Source: llm_adapter@2026-06-01

Confidence: high

Bugfix Low

Removes platform‑wide test skips.

Source: llm_adapter@2026-06-01

Confidence: high

Refactor Low

Refactors evaluation code to accept list inputs and return dict results.

Source: llm_adapter@2026-06-01

Confidence: high

Full changelog

Highlights

Add CodeGenUQ for Code Generation Uncertainty Quantification

Extending the work in Functional Entropy: Predicting Functional Correctness in LLM-Generated Code with Uncertainty Quantification (Bouchard et al., 2026), this minor release introduces CodeGenUQ, a unified interface for computing confidence scores that predict functional correctness in LLM-generated code without requiring execution or test cases at inference time. Under the hood, CodeGenUQ relies on new methods defined in the uqlm.code subpackage. This release also adds a new demo notebook to illustrate this functionality.

Overview

CodeGenUQ extends ShortFormUQ to support code-specific uncertainty quantification across three scorer families:

Token-Probability Scorers (white-box)

  • Length-normalized sequence probability (LNSP)
  • Minimum token probability (MTP)
  • Probability margin (PM)
  • Average/minimum token negentropy (ATN@K, MTN@K)
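To make the token-probability scorers above concrete, here is a minimal sketch of how such scores are commonly computed from per-token probabilities. It is not taken from the uqlm source: the function names, the input shapes (a list of chosen-token probabilities, and a list of top-K probability lists per position), and the normalization choices are assumptions for illustration.

```python
import math

def lnsp(token_probs):
    """Length-normalized sequence probability: geometric mean of token probabilities."""
    return math.exp(sum(math.log(p) for p in token_probs) / len(token_probs))

def mtp(token_probs):
    """Minimum token probability: the least confident token in the sequence."""
    return min(token_probs)

def probability_margin(top_k_probs):
    """Mean gap between the two most likely candidates at each position."""
    gaps = []
    for probs in top_k_probs:
        first, second = sorted(probs, reverse=True)[:2]
        gaps.append(first - second)
    return sum(gaps) / len(gaps)

def token_negentropy(probs):
    """1 minus the normalized entropy of the renormalized top-K distribution."""
    k = len(probs)
    if k < 2:
        raise ValueError("need at least two candidate probabilities per token")
    total = sum(probs)
    entropy = -sum((p / total) * math.log(p / total) for p in probs if p > 0)
    return 1.0 - entropy / math.log(k)

def atn(top_k_probs):
    """Average token negentropy over the sequence (ATN@K, with K = len of each row)."""
    scores = [token_negentropy(probs) for probs in top_k_probs]
    return sum(scores) / len(scores)

def mtn(top_k_probs):
    """Minimum token negentropy over the sequence (MTN@K)."""
    return min(token_negentropy(probs) for probs in top_k_probs)

# Toy two-token response with K = 2 candidates per position (made-up numbers).
chosen = [0.9, 0.5]
top_k = [[0.9, 0.1], [0.5, 0.5]]
print(lnsp(chosen), mtp(chosen), probability_margin(top_k), atn(top_k), mtn(top_k))
```

All five scores fall in [0, 1], with higher values indicating higher confidence. The second token, where the two candidates are tied, drives MTP and MTN down even though the first token is confident.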

Sampling-Based Consistency Scorers (black-box)

  • Functional equivalence methods: Replace NLI-based semantic comparison with LLM-based assessment of whether code snippets produce identical outputs for all valid inputs
    • functional_equivalence_rate: Proportion of samples functionally equivalent to the original
    • functional_negentropy: Entropy over functional equivalence clusters (code analogue of semantic entropy)
    • functional_sets_confidence: Normalized count of unique functional clusters
  • Similarity methods:
    • cosine_sim: Code embedding similarity (default: jina-embeddings-v2-base-code)
    • codebleu: CodeBLEU consistency incorporating n-gram, syntax, and data-flow matching
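The three functional equivalence scores above can be sketched from a pairwise equivalence judgment. The sketch below is illustrative only and is not the uqlm implementation: `toy_judge` is a hypothetical stand-in that compares outputs on a few probe inputs (the release describes an LLM-based assessment instead), and the greedy clustering and the normalizations used for negentropy and sets confidence are assumptions.

```python
import math

def toy_judge(code_a, code_b, probes=range(-3, 4)):
    """Hypothetical stand-in for an LLM judge: compare outputs on probe inputs.
    Uses eval on trusted toy lambdas only; do not use on untrusted code."""
    fn_a, fn_b = eval(code_a), eval(code_b)
    return all(fn_a(x) == fn_b(x) for x in probes)

def cluster_by_equivalence(snippets, are_equivalent):
    """Greedy clustering: join the first cluster whose representative is equivalent."""
    clusters = []
    for idx, snippet in enumerate(snippets):
        for cluster in clusters:
            if are_equivalent(snippets[cluster[0]], snippet):
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters

def functional_equivalence_rate(original, samples, are_equivalent):
    """Proportion of sampled snippets judged equivalent to the original."""
    return sum(are_equivalent(original, s) for s in samples) / len(samples)

def functional_negentropy(clusters):
    """1 minus the entropy over cluster sizes, normalized by log(n) (assumed)."""
    n = sum(len(c) for c in clusters)
    if n < 2:
        return 1.0
    entropy = -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)
    return 1.0 - entropy / math.log(n)

def functional_sets_confidence(clusters):
    """Normalized count of unique clusters: 1.0 for one cluster, 0.0 for n clusters."""
    n = sum(len(c) for c in clusters)
    if n < 2:
        return 1.0
    return 1.0 - (len(clusters) - 1) / (n - 1)

original = "lambda x: x * 2"
samples = ["lambda x: x + x", "lambda x: 2 * x", "lambda x: x ** 2", "lambda x: x - (-x)"]
clusters = cluster_by_equivalence([original] + samples, toy_judge)
print(functional_equivalence_rate(original, samples, toy_judge))
print(functional_negentropy(clusters), functional_sets_confidence(clusters))
```

In this toy run, three of the four samples double their input like the original and one squares it, so the snippets form two functional clusters. Swapping `toy_judge` for a model-backed judge would not change the scoring functions, since they only depend on the pairwise verdicts.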

Reflexive Scorers

  • p_true: Token probability assigned to "True" for self-evaluation
  • verbalized_confidence: Likert-scale confidence elicitation
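As a rough illustration of the two reflexive scorers above, the sketch below turns a judge model's raw output into a score in [0, 1]. Everything here is assumed for illustration rather than taken from uqlm: the 5-point Likert scale and its linear mapping, the regex-based parsing, and the `top_logprobs` input shape (a dict from candidate token to log probability for the first answer token).

```python
import math
import re

# Assumed 5-point scale mapped linearly onto [0, 1].
LIKERT_SCORES = {1: 0.0, 2: 0.25, 3: 0.5, 4: 0.75, 5: 1.0}

def parse_likert(response_text):
    """Return the score for the first standalone rating 1-5, or None if absent."""
    match = re.search(r"\b([1-5])\b", response_text)
    if match is None:
        return None
    return LIKERT_SCORES[int(match.group(1))]

def p_true_from_logprobs(top_logprobs):
    """Sum the probability mass on candidate tokens that spell "True"."""
    return sum(
        math.exp(logprob)
        for token, logprob in top_logprobs.items()
        if token.strip().lower() == "true"
    )

# Made-up judge outputs for a single generated snippet.
print(parse_likert("Rating: 4"))
print(p_true_from_logprobs({"True": math.log(0.8), "False": math.log(0.2)}))
```

Returning None for an unparseable rating keeps a malformed judge response from being silently scored as low confidence; callers can decide how to handle it.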

Usage

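from uqlm import CodeGenUQ

# my_llm and prompts (a list of prompt strings) must be defined by the caller.
uq = CodeGenUQ(
    llm=my_llm,
    scorers=["functional_equivalence_rate", "cosine_sim", "sequence_probability"],
    language="python",
)
# Top-level await works in a notebook; in a script, wrap this call in an
# async function and run it with asyncio.run().
results = await uq.generate_and_score(prompts, num_responses=5)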

What's Changed

  • CodeGenUQ : Code Generation tasks by @mohitcek in https://github.com/cvs-health/uqlm/pull/358
  • Unit tests for CodeGenUQ by @zeya30 in https://github.com/cvs-health/uqlm/pull/369
  • Patch/v0.5.8 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/373
  • Progress bars for CodeGenUQ by @zeya30 in https://github.com/cvs-health/uqlm/pull/376
  • Refactor: remove unused/redundant code by @zeya30 in https://github.com/cvs-health/uqlm/pull/377
  • Add LiveCodeBench dataset loading and code execution evaluation by @zeya30 in https://github.com/cvs-health/uqlm/pull/380
  • Code gen progress bar by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/379
  • Cleanup code gen by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/382
  • Add OSS quality improvements by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/381
  • Develop by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/383
  • v0.5.9 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/387
  • Merge develop -> code-gen-uq by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/388
  • UQ for Code Generation by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/389
  • Update docstring by @zeya30 in https://github.com/cvs-health/uqlm/pull/391
  • v0.5.10 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/393
  • Refactor evaluation code to accept list inputs and return dict results by @zeya30 in https://github.com/cvs-health/uqlm/pull/394
  • v0.5.11 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/396
  • Remove print and fix edge case handling by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/397
  • Remove platform-wide test skips by @mohitcek in https://github.com/cvs-health/uqlm/pull/399
  • Minor updates and improve code coverage for CodeGenUQ class by @mohitcek in https://github.com/cvs-health/uqlm/pull/403
  • Structured outputs by @aaronlohner in https://github.com/cvs-health/uqlm/pull/384
  • Add Discord badge and community link by @virenbajaj in https://github.com/cvs-health/uqlm/pull/404
  • add code gen links and bibtex by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/405
  • Minor release: v0.6.0 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/406

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.5.11...v0.6.0
