This release includes 1 breaking change for platform teams planning a safe upgrade.
✓ No known CVEs patched in this version
Summary
AI summary: Broad release touching bugfixes and improvements, Parakeet tdt, CI, and new model additions.
Changes in this release
| Type | Severity | Summary | CVE | Source | Confidence |
|---|---|---|---|---|---|
| Breaking | High | `text_embeds` input for SAM3, EdgeTAM, and SAM3-Lite-Text now expects full text embeddings instead of pooler outputs. | None | granite4.1:30b@2026-05-20-audit | low |
| Feature | Medium | Added Cohere2Moe Mixture-of-Experts language model with hybrid attention pattern. | None | granite4.1:8b-q6_K@2026-05-20 | high |
| Feature | Low | Added Parakeet tdt model. | None | granite4.1:30b@2026-05-20-audit | low |
| Feature | Low | Added HRM-Text autoregressive language model variant of Hierarchical Reasoning Model. | None | granite4.1:30b@2026-05-20-audit | low |
| Feature | Low | Added AudioFlamingoNext model checkpoints and improved audio/vision encoder compilability via standalone pure functions. | None | granite4.1:30b@2026-05-20-audit | low |
| Feature | Low | Added documentation for audio/video processors. | None | granite4.1:30b@2026-05-20-audit | low |
| Feature | Low | Added tensor parallelism support for continuous batching (CB). | None | granite4.1:30b@2026-05-20-audit | low |
| Feature | Low | Added initial torch_tpu backend support. | None | granite4.1:30b@2026-05-20-audit | low |
| Bugfix | Medium | Fixed Gemma4 generation handling of `inputs_embeds` and `per_layer_inputs`. | None | granite4.1:30b@2026-05-20-audit | low |
| Bugfix | Medium | Resolved AttributeError in RAG's `generate()` caused by missing config fields. | None | granite4.1:30b@2026-05-20-audit | low |
| Bugfix | Medium | Fixed GET /v1/models endpoint so the `owned_by` field is returned as a string rather than a list. | None | granite4.1:30b@2026-05-20-audit | low |
| Bugfix | Low | Fixed memory leaks caused by LRU decorators in vision models. | None | granite4.1:30b@2026-05-20-audit | low |
| Bugfix | Low | Improved error messaging when loading audio from video files. | None | granite4.1:30b@2026-05-20-audit | low |
| Bugfix | Low | Blocked special image tokens during sampling to fix flaky VLM generation tests. | None | granite4.1:30b@2026-05-20-audit | low |
| Bugfix | Low | Improved handling of Hubert models lacking `conv_pos_batch_norm`. | None | granite4.1:30b@2026-05-20-audit | low |
| Refactor | Low | Removed mask visualization tool from `masking_utils.py`. | None | granite4.1:30b@2026-05-20-audit | low |
| Refactor | Low | Removed OpenTelemetry integration from continuous batching (CB). | None | granite4.1:30b@2026-05-20-audit | low |
| Refactor | Low | Reverted change from PR 45777. | None | granite4.1:30b@2026-05-20-audit | low |
Full changelog
Release v5.9.0
New Model additions
Cohere2Moe
Command A+ is a Mixture-of-Experts (MoE) language model from Cohere that features a hybrid attention pattern combining sliding window and full attention layers. The model incorporates both shared and routed experts and supports a very large context window for processing extensive text sequences.
Links: Documentation
- Add new cohere2_moe model (#46115) by @Cyrilvallez in #46115
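To make the "hybrid attention pattern" concrete, here is a minimal sketch of how sliding-window and full-attention layers can be interleaved and what each layer type is allowed to see. The layer ratio, the function names, and the `"sliding_attention"` / `"full_attention"` labels are illustrative assumptions, not the actual Cohere2Moe configuration.

```python
from typing import List


def build_layer_types(num_layers: int, full_every: int) -> List[str]:
    """Return one attention type per layer.

    Hypothetical layout: every `full_every`-th layer uses full attention,
    all other layers use sliding-window attention.
    """
    if num_layers <= 0 or full_every <= 0:
        raise ValueError("num_layers and full_every must be positive")
    return [
        "full_attention" if (i + 1) % full_every == 0 else "sliding_attention"
        for i in range(num_layers)
    ]


def full_causal_mask(seq_len: int) -> List[List[bool]]:
    """mask[i][j] is True when query i may attend to key j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """Causal mask restricted to the most recent `window` positions."""
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    print(build_layer_types(num_layers=8, full_every=4))
    for row in sliding_window_mask(seq_len=4, window=2):
        print(["x" if allowed else "." for allowed in row])
```

Sliding-window layers keep attention cost bounded by the window size, while the occasional full-attention layer lets information travel across the whole context.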
Parakeet tdt (#44171)
- Parakeet tdt (#44171) by @lmaksym
HRM-Text
HRM-Text is an improved autoregressive language-modeling variant of the Hierarchical Reasoning Model (HRM) that uses a hierarchical recurrent forward pass with two transformer stacks - one for slow, abstract planning (H) and one for fast, detailed computation (L) - reused inside a nested recurrence. It features PrefixLM attention where instruction tokens attend bidirectionally while response tokens attend causally, per-head sigmoid output gates, and parameterless RMSNorm. The model is designed as a base language model without instruction tuning or chat templates.
Links: Documentation | Paper
- Add hrm text (#46025) by @abcd1927 in #46025
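The PrefixLM attention rule described above can be sketched as a boolean mask: positions inside the instruction prefix see each other in both directions, and response positions see everything at or before them. This is an illustrative sketch only; `prefix_lm_mask` and its arguments are hypothetical names, not the HRM-Text implementation.

```python
from typing import List


def prefix_lm_mask(prefix_len: int, seq_len: int) -> List[List[bool]]:
    """Build a PrefixLM attention mask.

    mask[i][j] is True when query position i may attend to key position j.
    Positions below `prefix_len` are instruction tokens (bidirectional);
    the remaining positions are response tokens (causal).
    """
    if not 0 <= prefix_len <= seq_len:
        raise ValueError("prefix_len must be between 0 and seq_len")
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            both_in_prefix = i < prefix_len and j < prefix_len
            # Prefix tokens see the whole prefix; everything else is causal.
            row.append(both_in_prefix or j <= i)
        mask.append(row)
    return mask


if __name__ == "__main__":
    for row in prefix_lm_mask(prefix_len=2, seq_len=4):
        print(["x" if allowed else "." for allowed in row])
```

With `prefix_len=0` the mask reduces to an ordinary causal mask, which is a quick sanity check for any implementation of this pattern.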
Breaking changes
The `text_embeds` input for SAM3, EdgeTAM, and SAM3-Lite-Text models now expects full text embeddings instead of just pooler outputs, aligning with other models in the library. Callers that pass precomputed `text_embeds` must update their inputs accordingly.
- 🚨Fix memory leaks caused by lru decorators in vision models (#45922) by @yonigozlan
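When auditing call sites for this change, a shape check can catch stale inputs early. The sketch below assumes that pooler outputs are rank-2 (`batch, hidden`) and that full text embeddings are rank-3 (`batch, seq_len, hidden`); that shape convention and both helper names are assumptions for illustration, so confirm the expected shapes against the model documentation before relying on it.

```python
from typing import Tuple


def describe_text_embeds(shape: Tuple[int, ...]) -> str:
    """Classify a candidate `text_embeds` shape.

    Assumption (not taken from the release notes): pooled outputs are
    (batch, hidden) and full embeddings are (batch, seq_len, hidden).
    """
    if len(shape) == 2:
        return "pooled"
    if len(shape) == 3:
        return "full"
    raise ValueError(f"unexpected text_embeds rank: {len(shape)}")


def require_full_text_embeds(shape: Tuple[int, ...]) -> None:
    """Hypothetical pre-upgrade guard: reject pooled embeddings."""
    if describe_text_embeds(shape) != "full":
        raise ValueError(
            "text_embeds looks like a pooler output; "
            "pass the full per-token text embeddings instead"
        )


if __name__ == "__main__":
    require_full_text_embeds((2, 32, 1024))  # passes silently
    print(describe_text_embeds((2, 1024)))
```

A guard like this is most useful in the code that caches or precomputes text embeddings, since cached pooler outputs from an earlier version are the likeliest source of stale inputs.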
Audio
Audio support was expanded with the addition of AudioFlamingoNext model checkpoints and improved compilability of audio/vision encoders via standalone pure functions. Additional improvements include better error messaging when loading audio from video files and new documentation for audio/video processors.
- user friendly error when loading audio from video (#45221) by @eustlb in [#45221]
- [docs] adding audio/video processors (#45795) by @stevhliu in [#45795]
- Support Audio Flamingo Next checkpoints (#44830) by @lashahub in [#44830]
- Extract dynamic vision/audio tensors into standalone pure functions (#45396) by @IlyasMoutawwakil in [#45396]
Generation
Generation fixes cover `inputs_embeds` and `per_layer_inputs` handling for Gemma4, an AttributeError in RAG's `generate()` caused by missing config fields, and flaky VLM generation tests, which now block special image tokens during sampling.
- Fix Gemma4 generation from inputs_embeds and per_layer_inputs (#46049) by @Cyrilvallez in [#46049]
- Fix AttributeError in RAG generate() for missing config fields (#46035) by @Sriniketh24 in [#46035]
- Block image_start/end_token_id in generation test sampling (#45914) by @Rocketknight1 in [#45914]
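The idea behind blocking special tokens during sampling is to give them zero probability before drawing a token. The sketch below shows that technique in plain Python; the token IDs, function names, and sampling routine are hypothetical and do not reproduce the test code changed in #45914.

```python
import math
import random
from typing import Iterable, List


def block_tokens(logits: List[float], blocked_ids: Iterable[int]) -> List[float]:
    """Return a copy of `logits` with blocked token IDs set to -inf."""
    out = list(logits)
    for token_id in blocked_ids:
        out[token_id] = -math.inf
    return out


def sample(logits: List[float], rng: random.Random) -> int:
    """Draw one token ID from softmax(logits); -inf entries get weight 0."""
    finite = [x for x in logits if x != -math.inf]
    if not finite:
        raise ValueError("all tokens are blocked")
    peak = max(finite)
    # exp(-inf) evaluates to 0.0, so blocked tokens can never be drawn.
    weights = [math.exp(x - peak) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical IDs standing in for image_start/end_token_id.
    image_start_token_id, image_end_token_id = 1, 2
    logits = [0.1, 5.0, 4.0, 0.2]
    safe = block_tokens(logits, [image_start_token_id, image_end_token_id])
    rng = random.Random(0)
    print([sample(safe, rng) for _ in range(10)])
```

Masking before the softmax, rather than resampling after an unwanted draw, keeps the remaining probabilities correctly normalized.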
Bugfixes and improvements
- Remove mask visualization tool from `masking_utils.py` (#46066) by @Cyrilvallez in [#46066]
- fix: owned_by field in GET /v1/models returns list instead of string (#46006) by @nileshpatil6 in [#46006]
- [CB] Remove OpenTelemetry (#45984) by @remi-or in [#45984]
- docs(readme): use canonical `huggingface.co` domain in prose links (#46042) by @kiwigitops in [#46042]
- Fix remaining RAG doc examples that crash on current transformers (#46044) by @Sriniketh24 in [#46044]
- Init the actual tensor, not a copy (#46030) by @Rocketknight1 in [#46030]
- docs: sync legacy ACL anthology URLs and update metrics across i18n READMEs (#46027) by @irfaan101 in [#46027]
- [MultimodalLM] add language_model to the get/set_input_embeddings logic (#46029) by @eustlb in [#46029]
- [`HRM Text`] Add integration tests (#46033) by @vasqu in [#46033]
- hy_v3: add XPU expectations (#45858) by @kaixuanliu in [#45858]
- exaone4_5: add XPU expectations (#45890) by @kaixuanliu in [#45890]
- hyperclovax: add XPU Expectations for CI test (#45926) by @kaixuanliu in [#45926]
- chore(ci): remove dead env vars from circleci-failure-summary-comment.yml (#45972) by @XciD in [#45972]
- [CB] [Major] Add tensor paralellism (#45821) by @remi-or in [#45821]
- docs: update models architecture count and sync ACL anthology URLs (#46001) by @irfaan101 in [#46001]
- bugfix(ci): avoid E2BIG in pr_slow_ci_suggestion (#45983) by @tarekziade in [#45983]
- RFDetr - use correct Roboflow org for release (#45946) by @sbucaille in [#45946]
- docs: Fix formatting issues in weightconverter.md (#45988) by @ArjunSrivastava1 in [#45988]
- Fix colqwen2 test (#45981) by @IlyasMoutawwakil in [#45981]
- Fix M-RoPE device mismatch in Qwen3VL family under FSDP2 CPU offload (#45861) by @jamesbraza in [#45861]
- [docs] chat template prefill (#45947) by @stevhliu in [#45947]
- [docs] decode fast path (#45899) by @stevhliu in [#45899]
- fix: restore `_attn_implementation` and fix request offset in `generate_batch()` (#45943) by @sergiopaniego in [#45943]
- Expose `per_layer_inputs` for every Gemma4 variants (#45927) by @Cyrilvallez in [#45927]
- chore: update benchmark_v2.yml (#45966) by @hf-security-analysis[bot] in [#45966]
- fix(ci): set persist-credentials: false on actions/checkout and close remaining template injection findings (#45964) by @XciD in [#45964]
- chore(ci): set default workflow permissions to contents: read (#45961) by @XciD in [#45961]
- fix(ci): remove template injection on pull_request_target workflows (#45956) by @XciD in [#45956]
- chore(ci): pin all GitHub Actions and reusable workflows by SHA (#45955) by @XciD in [#45955]
- [docs] ALMModelTest (#45900) by @stevhliu in [#45900]
- Enhance apply_chat_template to support custom field prefilling (reasoning_content, thinking, etc.) (#45896) by @Mamiglia in [#45896]
- BUGFIX: Support hubert models that don't have conv_pos_batch_norm configured (#45921) by @igordertigor in [#45921]
- Revert 45777 (#45942) by @Rocketknight1 in [#45942]
- pass the otel secrets (#45933) by @tarekziade in [#45933]
- Add initial torch_tpu backend support (#45918) by @tengomucho in [#45918]
- [CB] Hide activation footprint by using the CUDA graph pool (#45911) by @remi-or in [#45911]
- Require input_ids for repetition penalty (#45389) by @ruben-aghayan in [#45389]
- Fix undefined 'input' variable (#45895) by @fullyz in [#45895]
- Fix post processing RF-DETR (#46041) by @yonigozlan (direct commit on v5.9.0)
- [loading] Free up tensors faster inside ConversionOps (#46110) by @Cyrilvallez (direct commit on v5.9.0)
- Add new cohere2_moe model (#46115) by @Cyrilvallez (direct commit on v5.9.0)
- Fix cohere2 tp_plan for release by @Cyrilvallez (direct commit on v5.9.0)
- Release v5.9.0 by @Cyrilvallez (direct commit on v5.9.0)
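One entry in the list above, "Require input_ids for repetition penalty" (#45389), is easier to follow with the penalty written out. The sketch below uses the common CTRL-style formulation, in which the score of every token already present in the sequence is pushed down; it is an illustration of why the previously seen token IDs are needed, not the library code touched by that PR, and the function name is hypothetical.

```python
from typing import List, Sequence


def apply_repetition_penalty(
    logits: Sequence[float], input_ids: Sequence[int], penalty: float
) -> List[float]:
    """Penalize tokens that already appear in `input_ids`.

    Positive scores are divided by `penalty` and negative scores are
    multiplied by it, so a penalty above 1.0 always lowers the score.
    """
    if penalty <= 0:
        raise ValueError("penalty must be strictly positive")
    out = list(logits)
    for token_id in set(input_ids):
        score = out[token_id]
        out[token_id] = score * penalty if score < 0 else score / penalty
    return out


if __name__ == "__main__":
    # Tokens 0 and 1 were already generated; token 2 is untouched.
    print(apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1, 1], penalty=2.0))
```

Without `input_ids` there is no record of which tokens have been seen, so the penalty has nothing to act on. Code that generates from `inputs_embeds` alone is the case most likely to be affected.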
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @lmaksym
  - Parakeet tdt (#44171)
- @eustlb
  - user friendly error when loading audio from video (#45221)
  - [MultimodalLM] add language_model to the get/set_input_embeddings logic (#46029)
- @remi-or
  - [CB] Remove OpenTelemetry (#45984)
  - [CB] [Major] Add tensor paralellism (#45821)
  - [CB] Hide activation footprint by using the CUDA graph pool (#45911)
- @abcd1927
  - Add hrm text (#46025)