This release includes breaking changes for platform teams planning a safe upgrade.
Published 8mo
✓ No known CVEs patched
Topics
ai-evaluation
ai-safety
confidence-estimation
confidence-score
hallucination
hallucination-detection
hallucination-evaluation
hallucination-mitigation
llm
llm-evaluation
llm-hallucination
llm-safety
uncertainty-estimation
uncertainty-quantification
Summary
AI summary: This mixed release covers bug fixes, a benchmark dataset extension, and LLM judge explanations.
Full changelog
1. Dataset-specific confidence score calibration
- Introduced the new `ScoreCalibrator` class for calibrating confidence scores on specific datasets (Platt or Isotonic)
- Includes the `evaluate_calibration` function for evaluating score calibration with plots and various metrics, including ECE, MCE, Brier Score, Calibration Gap, and log-loss
- For a detailed walkthrough of this feature, please refer to the demo notebook
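To make the metrics named above concrete, here is a minimal pure-Python sketch of how ECE, MCE, Brier score, and log-loss are commonly computed from confidence scores and binary correctness labels. It is an illustration only, not UQLM's implementation: the function name `calibration_metrics`, the `n_bins` parameter, and the equal-width binning are assumptions made for this example. Calibration Gap is omitted because the release notes do not define it.

```python
import math


def calibration_metrics(scores, labels, n_bins=10):
    """Return ECE, MCE, Brier score, and log-loss for binary correctness labels.

    Hypothetical helper for illustration; not part of the UQLM API.
    scores: confidence values in [0, 1]; labels: 1 if the response was correct, else 0.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and the same length")
    n = len(scores)

    # Group (score, label) pairs into equal-width bins over [0, 1].
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 falls in the last bin
        bins[idx].append((s, y))

    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(s for s, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        gap = abs(accuracy - avg_conf)
        ece += len(members) / n * gap  # bin gaps weighted by bin size
        mce = max(mce, gap)            # worst single-bin gap

    # Brier score: mean squared difference between score and outcome.
    brier = sum((s - y) ** 2 for s, y in zip(scores, labels)) / n

    # Log-loss with clipping so that scores of exactly 0 or 1 stay finite.
    eps = 1e-12
    total = 0.0
    for s, y in zip(scores, labels):
        p = min(max(s, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    log_loss = -total / n

    return {"ece": ece, "mce": mce, "brier": brier, "log_loss": log_loss}


metrics = calibration_metrics([0.9, 0.9, 0.1, 0.1], [1, 1, 0, 0])
print(metrics)
```

Lower values are better for all four metrics. Comparing them before and after calibration on held-out data is the usual way to check whether a Platt or isotonic fit actually helped.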
2. Enabled use of LangChain BaseMessage with prompts argument
- Added support for `List[List[BaseMessage]]` alongside the existing `List[str]` format for the `prompts` argument of the `generate_and_score` method in the following classes: `UQEnsemble`, `BlackBoxUQ`, `WhiteBoxUQ`, `SemanticEntropy`
- This enhancement enables uncertainty quantification and hallucination detection with:
  - Multimodal inputs (e.g. image)
  - Chat history
  - Various message types (HumanMessage, AIMessage, SystemMessage)
- Note: This feature is currently in Beta and is not compatible with LLM judges (LLMPanel or judge components of UQEnsemble)
- For a detailed walkthrough of this feature, please refer to the demo notebook
3. LLM Judge explanations
- Enhanced the LLMPanel class to provide explanations alongside scores
- Judges can now justify their evaluations with detailed reasoning
- Specified with boolean parameter `explanations`
4. Benchmark Dataset Extension
- Added support for the FactScore benchmark dataset via the `load_example_dataset` function
- Enables evaluation of long-form question answering capabilities in LLMs
5. Updated utility plotting functions
- Added an option to `plot_ranked_auc` to compute AUPRC (rather than the current AUROC only) and rank them in a color-coded bar plot (as seen in our research paper). Added the missing legend to this function.
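For readers unfamiliar with the two metrics, the sketch below shows one common way to compute AUROC and AUPRC (here estimated by average precision) for several scorers and rank the scorers by either metric. It is a plain-Python illustration and does not reflect the signature or internals of `plot_ranked_auc`; the helper names `auroc`, `average_precision`, and `rank_scorers`, and the sample scores, are made up for this example.

```python
def auroc(scores, labels):
    """Probability that a random correct response outscores a random incorrect one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count winning (positive, negative) pairs; ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Average precision, a common step-wise estimate of the area under the PR curve.

    Tied scores keep their input order here, which is a simplification.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each correctly retrieved item
    return total / n_pos


def rank_scorers(scorer_outputs, labels, metric=auroc):
    """Return (scorer name, metric value) pairs sorted from best to worst."""
    results = {name: metric(scores, labels) for name, scores in scorer_outputs.items()}
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


# Hypothetical confidence scores from two scorers on four graded responses.
labels = [1, 0, 1, 0]
outputs = {
    "scorer_a": [0.9, 0.2, 0.8, 0.1],
    "scorer_b": [0.6, 0.7, 0.4, 0.3],
}
print(rank_scorers(outputs, labels, metric=auroc))
print(rank_scorers(outputs, labels, metric=average_precision))
```

AUPRC tends to be the more informative of the two when correct and incorrect responses are heavily imbalanced, which is one reason to report both.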
6. Bug Fixes
- Fixed the `LiveError` issue that occurred with rich progress bars when retrying after code interruption
- Removed unused images for docs site
- Added missing unit tests for utility plotting functions
- Updated demo notebooks to use non-deprecated LLMs (`gemini-1.5-flash` -> `gemini-2.5-flash`)
What's Changed
- Add score calibration by @jmabry in https://github.com/cvs-health/uqlm/pull/147
- v0.2.7 updates by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/171
- Feat: Integrate ScoreCalibration class to existing structure by @mohitcek in https://github.com/cvs-health/uqlm/pull/165
- Confidence score calibration by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/181
- Enable UQ with multimodal inputs by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/182
- Bump sphinx from 7.3.7 to 7.4.7 by @dependabot[bot] in https://github.com/cvs-health/uqlm/pull/177
- Removing unused images and set correct switcher json url by @doyajii1 in https://github.com/cvs-health/uqlm/pull/184
- update URLs in README to use main branch by @vgyani in https://github.com/cvs-health/uqlm/pull/187
- Removed a typo from black_box_demo.ipynb by @kaushik-42 in https://github.com/cvs-health/uqlm/pull/188
- Update plot_ranked_auc by @zeya30 in https://github.com/cvs-health/uqlm/pull/183
- Enable explanations with LLM judge scores by @NamrataWalanj7 in https://github.com/cvs-health/uqlm/pull/178
- fix continuous judge output handling by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/189
- Adding factscore dataset by @dskarbrevik in https://github.com/cvs-health/uqlm/pull/191
- Minor release: `v0.3` by @dylanbouchard in https://github.com/cvs-health/uqlm/pull/192
New Contributors
- @jmabry made their first contribution in https://github.com/cvs-health/uqlm/pull/147
- @vgyani made their first contribution in https://github.com/cvs-health/uqlm/pull/187
- @kaushik-42 made their first contribution in https://github.com/cvs-health/uqlm/pull/188
Full Changelog: https://github.com/cvs-health/uqlm/compare/v0.2.7...v0.3.0