UQLM

v0.3.0 Breaking

This release includes breaking changes for platform teams planning a safe upgrade.

✓ No known CVEs patched

Topics

ai-evaluation ai-safety confidence-estimation confidence-score hallucination hallucination-detection hallucination-evaluation hallucination-mitigation llm llm-evaluation llm-hallucination llm-safety uncertainty-estimation uncertainty-quantification

Summary

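This release adds dataset-specific confidence score calibration, support for LangChain BaseMessage prompts, LLM judge explanations, the FactScore benchmark dataset, updated utility plotting functions, and several bug fixes.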

Full changelog

1. Dataset-specific confidence score calibration

  • Introduced the new ScoreCalibrator class for calibrating confidence scores on a specific dataset, using either Platt scaling or isotonic regression
  • Includes the evaluate_calibration function, which evaluates score calibration with plots and metrics including ECE, MCE, Brier Score, Calibration Gap, and log-loss
  • For a detailed walkthrough of this feature, please refer to the demo notebook
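
To make the calibration metrics listed above concrete, here is a minimal pure-Python sketch of their textbook definitions. It does not use or reproduce UQLM's ScoreCalibrator or evaluate_calibration; the function name calibration_metrics, the equal-width binning, and the sample data are assumptions chosen for illustration, and UQLM's own implementation may differ.

```python
import math


def calibration_metrics(scores, labels, n_bins=10):
    """Return ECE, MCE, Brier score, and log-loss for scores in [0, 1].

    Hypothetical helper for illustration; not part of UQLM.
    """
    n = len(scores)
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((s, y))

    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(s for s, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        gap = abs(avg_conf - accuracy)
        ece += (len(members) / n) * gap  # gap weighted by the bin's share of samples
        mce = max(mce, gap)              # worst gap over all non-empty bins

    brier = sum((s - y) ** 2 for s, y in zip(scores, labels)) / n
    eps = 1e-12  # keeps log() finite for scores of exactly 0 or 1
    log_loss = -sum(
        y * math.log(max(s, eps)) + (1 - y) * math.log(max(1 - s, eps))
        for s, y in zip(scores, labels)
    ) / n
    return {"ece": ece, "mce": mce, "brier": brier, "log_loss": log_loss}


# Made-up confidence scores and correctness labels (1 = response was correct).
scores = [0.95, 0.85, 0.75, 0.35, 0.25, 0.15]
labels = [1, 1, 0, 0, 0, 1]
metrics = calibration_metrics(scores, labels, n_bins=5)
print(metrics)
```

Lower values are better for all four metrics. A calibrator is typically fit on one labeled split and evaluated with metrics like these on a held-out split.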

2. Enabled use of LangChain BaseMessage with prompts argument

  • Added support for List[List[BaseMessage]], alongside the existing List[str] format, for the prompts argument of the generate_and_score method in the following classes:
    • UQEnsemble
    • BlackBoxUQ
    • WhiteBoxUQ
    • SemanticEntropy
  • This enhancement enables uncertainty quantification and hallucination detection with:
    • Multimodal inputs (e.g. image)
    • Chat history
    • Various message types (HumanMessage, AIMessage, SystemMessage)
  • Note: This feature is currently in Beta and is not compatible with LLM judges (LLMPanel or judge components of UQEnsemble)
  • For a detailed walkthrough of this feature, please refer to the demo notebook
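
The difference between the two prompt formats can be sketched as follows. This is an illustration of the data shapes only: the fallback message classes are stand-ins so the snippet runs without LangChain installed, the sample prompts are made up, and the commented-out call at the end is an assumed usage pattern rather than something confirmed by these notes.

```python
try:
    # Real message classes, if langchain-core is installed.
    from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
except ImportError:
    # Stand-ins so the sketch runs without LangChain; not part of UQLM.
    from dataclasses import dataclass

    @dataclass
    class _Message:
        content: str

    class SystemMessage(_Message): ...
    class HumanMessage(_Message): ...
    class AIMessage(_Message): ...

# Existing format: one string per prompt (List[str]).
string_prompts = ["What is the capital of France?", "Who wrote Hamlet?"]

# New format: one list of messages per prompt (List[List[BaseMessage]]),
# which can carry a system instruction and earlier turns of a conversation.
message_prompts = [
    [
        SystemMessage(content="Answer in one short sentence."),
        HumanMessage(content="What is the capital of France?"),
    ],
    [
        HumanMessage(content="Who wrote Hamlet?"),
        AIMessage(content="William Shakespeare."),
        HumanMessage(content="In which year was it first published?"),
    ],
]

# Assumed usage, not executed here (`scorer` would be e.g. a BlackBoxUQ instance):
#     results = await scorer.generate_and_score(prompts=message_prompts)
```

Each inner list is treated as a single prompt, so the outer list has the same length in both formats.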

3. LLM Judge explanations

  • Enhanced the LLMPanel class to provide explanations alongside scores
  • Judges can now justify their evaluations with detailed reasoning
  • Enabled with the boolean parameter explanations

4. Benchmark Dataset Extension

  • Added support for the FactScore benchmark dataset via the load_example_dataset function
  • Enables evaluation of long-form question answering capabilities in LLMs

5. Updated utility plotting functions

  • Added an option to plot_ranked_auc to compute AUPRC (rather than AUROC only, as before) and rank the results in a color-coded bar plot (as seen in our research paper). Also added the missing legend to this function.
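
For readers less familiar with the two metrics, the following pure-Python sketch shows how AUROC and AUPRC can be computed and used to rank scorers. It is not the plot_ranked_auc implementation: the helper names, the scorer names, and the data are hypothetical, and AUPRC is approximated here by average precision, assuming no tied scores.

```python
def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auprc(scores, labels):
    """Average precision: mean of the precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Made-up correctness labels and confidence scores from two hypothetical scorers.
labels = [1, 0, 1, 1, 0, 0]
scorer_outputs = {
    "scorer_a": [0.9, 0.8, 0.7, 0.6, 0.4, 0.2],
    "scorer_b": [0.3, 0.9, 0.6, 0.2, 0.8, 0.1],
}

# Rank scorers from best to worst by AUPRC.
auprc_by_scorer = {name: auprc(s, labels) for name, s in scorer_outputs.items()}
ranking = sorted(auprc_by_scorer, key=auprc_by_scorer.get, reverse=True)
print(ranking, auprc_by_scorer)
```

AUPRC depends on the share of positive labels while AUROC does not, which is one reason to report both when classes are imbalanced.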

6. Bug Fixes

  • Fixed the LiveError issue that occurred with rich progress bars when retrying after code interruption
  • Removed unused images for docs site
  • Added missing unit tests for utility plotting functions
  • Updated demo notebooks to use non-deprecated LLMs (gemini-1.5-flash replaced with gemini-2.5-flash)

What's Changed

  • Add score calibration by @jmabry in https://github.com/cvs-health/uqlm/pull/147
  • v0.2.7 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/171
  • Feat: Integrate ScoreCalibration class to existing structure by @mohitcek in https://github.com/cvs-health/uqlm/pull/165
  • Confidence score calibration by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/181
  • Enable UQ with multimodal inputs by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/182
  • Bump sphinx from 7.3.7 to 7.4.7 by @dependabot[bot] in https://github.com/cvs-health/uqlm/pull/177
  • Removing unused images and set correct switcher json url by @doyajii1 in https://github.com/cvs-health/uqlm/pull/184
  • update URLs in README to use main branch by @vgyani in https://github.com/cvs-health/uqlm/pull/187
  • Removed a typo from black_box_demo.ipynb by @kaushik-42 in https://github.com/cvs-health/uqlm/pull/188
  • Update plot_ranked_auc by @zeya30 in https://github.com/cvs-health/uqlm/pull/183
  • Enable explanations with LLM judge scores by @NamrataWalanj7 in https://github.com/cvs-health/uqlm/pull/178
  • fix continuous judge output handling by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/189
  • Adding factscore dataset by @dskarbrevik in https://github.com/cvs-health/uqlm/pull/191
  • Minor release: v0.3 by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/192

New Contributors

  • @jmabry made their first contribution in https://github.com/cvs-health/uqlm/pull/147
  • @vgyani made their first contribution in https://github.com/cvs-health/uqlm/pull/187
  • @kaushik-42 made their first contribution in https://github.com/cvs-health/uqlm/pull/188

Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.2.7...v0.3.0
