Transformers

v5.9.0 · Breaking

This release includes one breaking change that platform teams should account for when planning an upgrade.

Published 2 months ago in LLM Frameworks

✓ No known CVEs patched

Topics: audio, machine-learning, deepseek, gemma, glm, llm, model-hub, natural-language-processing, nlp, pretrained-models, python, pytorch, pytorch-transformers, qwen, speech-recognition, transformer, vlm

Affected surfaces: breaking_upgrade

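Summary (AI-generated)

A broad release that adds new models (Cohere2Moe, Parakeet TDT, HRM-Text) and fixes audio and generation issues. It also hardens the CI setup and includes assorted bugfixes and improvements.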

Changes in this release

Type	Severity	Summary	Confidence	CVE
Breaking	High	`text_embeds` input for SAM3, EdgeTAM, and SAM3-Lite-Text now expects full text embeddings instead of pooler outputs.	Low	None
Feature	Medium	Added the Cohere2Moe Mixture-of-Experts language model with a hybrid attention pattern.	High	None
Feature	Low	Added the Parakeet TDT model.	Low	None
Feature	Low	Added HRM-Text, an autoregressive language-model variant of the Hierarchical Reasoning Model.	Low	None
Feature	Low	Added AudioFlamingoNext model checkpoints and improved audio/vision encoder compilability via standalone pure functions.	Low	None
Feature	Low	Added documentation for audio/video processors.	Low	None
Feature	Low	Added tensor parallelism support to continuous batching (CB).	Low	None
Feature	Low	Added initial torch_tpu backend support.	Low	None
Bugfix	Medium	Fixed Gemma4 generation handling of `inputs_embeds` and `per_layer_inputs`.	Low	None
Bugfix	Medium	Resolved an AttributeError in RAG's `generate()` caused by missing config fields.	Low	None
Bugfix	Medium	Fixed the GET /v1/models endpoint so the `owned_by` field is returned as a string instead of a list.	Low	None
Bugfix	Low	Fixed memory leaks caused by LRU decorators in vision models.	Low	None
Bugfix	Low	Improved error messaging when loading audio from video files.	Low	None
Bugfix	Low	Blocked special image tokens during sampling to fix flaky VLM generation tests.	Low	None
Bugfix	Low	Added support for Hubert models that do not configure `conv_pos_batch_norm`.	Low	None
Refactor	Low	Removed the mask visualization tool from `masking_utils.py`.	Low	None
Refactor	Low	Removed the OpenTelemetry integration from continuous batching (CB).	Low	None
Refactor	Low	Reverted PR #45777.	Low	None

Full changelog

Release v5.9.0

New Model additions

Cohere2Moe

Command A+ is a Mixture-of-Experts (MoE) language model from Cohere that features a hybrid attention pattern combining sliding window and full attention layers. The model incorporates both shared and routed experts and supports a very large context window for processing extensive text sequences.

Links: Documentation

Add new cohere2_moe model (#46115) by @Cyrilvallez in #46115
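The hybrid pattern described above mixes two kinds of causal mask across layers. The pure-Python sketch below shows what each layer would see. The window size and the one-full-layer-in-four layout are hypothetical values chosen for illustration, not Cohere2Moe's actual configuration. The layer-type strings are illustrative labels, and this is not the library's mask code.

```python
def causal_mask(seq_len):
    """Full causal attention: query i may attend to every key j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def sliding_window_mask(seq_len, window):
    """Sliding-window causal attention: query i sees only the last `window` keys, itself included."""
    return [[i - window < j <= i for j in range(seq_len)] for i in range(seq_len)]


def layer_types(num_layers, full_every=4):
    """Hypothetical layout: every `full_every`-th layer uses full attention, the rest a sliding window."""
    return [
        "full_attention" if (layer + 1) % full_every == 0 else "sliding_attention"
        for layer in range(num_layers)
    ]


def mask_for_layer(kind, seq_len, window):
    """Pick the mask that a layer of the given kind would apply."""
    if kind == "full_attention":
        return causal_mask(seq_len)
    return sliding_window_mask(seq_len, window)


if __name__ == "__main__":
    kinds = layer_types(8)
    print(kinds)
    # Visualize the sliding-window mask of the first layer ("x" = may attend).
    for row in mask_for_layer(kinds[0], seq_len=5, window=2):
        print("".join("x" if allowed else "." for allowed in row))
```

Sliding-window layers keep per-layer attention cost and KV-cache size bounded by the window. The periodic full-attention layers still let information flow across the entire context.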

Parakeet TDT

Parakeet tdt (#44171) by @lmaksym

HRM-Text

HRM-Text is an improved autoregressive language-modeling variant of the Hierarchical Reasoning Model (HRM). Its hierarchical recurrent forward pass uses two transformer stacks, one for slow, abstract planning (H) and one for fast, detailed computation (L). Both stacks are reused inside a nested recurrence. The model uses PrefixLM attention, in which instruction tokens attend bidirectionally while response tokens attend causally. It also uses per-head sigmoid output gates and parameterless RMSNorm. The model is designed as a base language model, without instruction tuning or chat templates.

Links: Documentation | Paper

Add hrm text (#46025) by @abcd1927 in #46025
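Two of the HRM-Text components named above are easy to show in isolation. The sketch below builds a PrefixLM mask, where a key is visible if it lies in the instruction prefix or at or before the query position. It also applies an RMSNorm with no learnable scale. This is a pure-Python illustration of those definitions, not the Transformers implementation. `prefix_len` is a hypothetical parameter standing for the number of instruction tokens.

```python
import math


def prefix_lm_mask(seq_len, prefix_len):
    """PrefixLM mask: key j is visible to query i if j is in the prefix or j <= i.

    Prefix (instruction) positions therefore see the whole prefix in both
    directions, while response positions remain causal.
    """
    return [[j < prefix_len or j <= i for j in range(seq_len)] for i in range(seq_len)]


def rms_norm_no_params(x, eps=1e-6):
    """RMSNorm without a learnable scale: x / sqrt(mean(x^2) + eps)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


if __name__ == "__main__":
    # Two instruction tokens followed by three response tokens ("x" = may attend).
    for row in prefix_lm_mask(seq_len=5, prefix_len=2):
        print("".join("x" if allowed else "." for allowed in row))
    print(rms_norm_no_params([3.0, 4.0]))
```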

Breaking changes

The text_embeds input for SAM3, EdgeTAM, and SAM3-Lite-Text models now expects full text embeddings instead of just pooler outputs. This aligns these models with others in the library. Users must update their inputs accordingly.

🚨Fix memory leaks caused by lru decorators in vision models (#45922) by @yonigozlan
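Callers who pass text_embeds directly can add a guard that catches the old pooled tensor before upgrading. The helper below is hypothetical and works on shape tuples only. It assumes that full text embeddings are per-token hidden states shaped (batch, seq_len, hidden) and that pooler outputs are shaped (batch, hidden). Confirm the exact expected shapes in the SAM3, EdgeTAM, and SAM3-Lite-Text model documentation.

```python
def check_text_embeds_shape(shape):
    """Hypothetical pre-upgrade guard for the `text_embeds` argument.

    Assumes v5.9.0 expects full per-token embeddings (batch, seq_len, hidden)
    rather than one pooled vector per prompt (batch, hidden).
    """
    if len(shape) == 2:
        raise ValueError(
            f"text_embeds has shape {tuple(shape)}, which looks like a pooler output; "
            "pass the full text embeddings (batch, seq_len, hidden) instead."
        )
    if len(shape) != 3:
        raise ValueError(f"expected a 3-D text_embeds tensor, got shape {tuple(shape)}")
    return tuple(shape)


if __name__ == "__main__":
    print(check_text_embeds_shape((1, 32, 256)))  # full embeddings: accepted
    try:
        check_text_embeds_shape((1, 256))  # pooled output: rejected
    except ValueError as err:
        print(err)
```

With PyTorch tensors, `text_embeds.shape` can be passed directly, since `torch.Size` is a tuple subclass.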

Audio

Audio support was expanded with the addition of AudioFlamingoNext model checkpoints and improved compilability of audio/vision encoders via standalone pure functions. Additional improvements include better error messaging when loading audio from video files and new documentation for audio/video processors.

user friendly error when loading audio from video (#45221) by @eustlb in [#45221]
[docs] adding audio/video processors (#45795) by @stevhliu in [#45795]
Support Audio Flamingo Next checkpoints (#44830) by @lashahub in [#44830]
Extract dynamic vision/audio tensors into standalone pure functions (#45396) by @IlyasMoutawwakil in [#45396]

Generation

Fixed generation issues including inputs_embeds and per_layer_inputs handling for Gemma4, an AttributeError in RAG's generate() caused by missing config fields, and flaky VLM generation tests by blocking special image tokens during sampling.

Fix Gemma4 generation from inputs_embeds and per_layer_inputs (#46049) by @Cyrilvallez in [#46049]
Fix AttributeError in RAG generate() for missing config fields (#46035) by @Sriniketh24 in [#46035]
Block image_start/end_token_id in generation test sampling (#45914) by @Rocketknight1 in [#45914]
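Blocking a token during sampling amounts to setting its logit to negative infinity before the softmax, which gives it zero probability. The pure-Python sketch below illustrates the idea. It is not the test change from #45914, and the token IDs are hypothetical stand-ins for image_start_token_id and image_end_token_id. In Transformers itself, the `suppress_tokens` generation option serves a similar purpose.

```python
import math
import random


def block_tokens(logits, blocked_ids):
    """Return a copy of `logits` with every blocked token ID set to -inf."""
    masked = list(logits)
    for token_id in blocked_ids:
        masked[token_id] = -math.inf
    return masked


def softmax(logits):
    """Numerically stable softmax; exp(-inf) evaluates to 0.0."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(logits, blocked_ids, rng):
    """Draw one token ID after removing the blocked IDs from consideration."""
    probs = softmax(block_tokens(logits, blocked_ids))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    IMAGE_START_ID, IMAGE_END_ID = 3, 4  # hypothetical special-token IDs
    logits = [0.1, 0.5, 0.2, 5.0, 4.0]  # unblocked, the image tokens would dominate
    rng = random.Random(0)
    draws = [sample(logits, [IMAGE_START_ID, IMAGE_END_ID], rng) for _ in range(1000)]
    assert IMAGE_START_ID not in draws and IMAGE_END_ID not in draws
    print(sorted(set(draws)))
```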

Bugfixes and improvements

Remove mask visualization tool from masking_utils.py (#46066) by @Cyrilvallez in [#46066]
fix: owned_by field in GET /v1/models returns list instead of string (#46006) by @nileshpatil6 in [#46006]
[CB] Remove OpenTelemetry (#45984) by @remi-or in [#45984]
docs(readme): use canonical huggingface.co domain in prose links (#46042) by @kiwigitops in [#46042]
Fix remaining RAG doc examples that crash on current transformers (#46044) by @Sriniketh24 in [#46044]
Init the actual tensor, not a copy (#46030) by @Rocketknight1 in [#46030]
docs: sync legacy ACL anthology URLs and update metrics across i18n READMEs (#46027) by @irfaan101 in [#46027]
[MultimodalLM] add language_model to the get/set_input_embeddings logic (#46029) by @eustlb in [#46029]
[HRM Text] Add integration tests (#46033) by @vasqu in [#46033]
hy_v3: add XPU expectations (#45858) by @kaixuanliu in [#45858]
exaone4_5: add XPU expectations (#45890) by @kaixuanliu in [#45890]
hyperclovax: add XPU Expectations for CI test (#45926) by @kaixuanliu in [#45926]
chore(ci): remove dead env vars from circleci-failure-summary-comment.yml (#45972) by @XciD in [#45972]
[CB] [Major] Add tensor parallelism (#45821) by @remi-or in [#45821]
docs: update models architecture count and sync ACL anthology URLs (#46001) by @irfaan101 in [#46001]
bugfix(ci): avoid E2BIG in pr_slow_ci_suggestion (#45983) by @tarekziade in [#45983]
RFDetr - use correct Roboflow org for release (#45946) by @sbucaille in [#45946]
docs: Fix formatting issues in weightconverter.md (#45988) by @ArjunSrivastava1 in [#45988]
Fix colqwen2 test (#45981) by @IlyasMoutawwakil in [#45981]
Fix M-RoPE device mismatch in Qwen3VL family under FSDP2 CPU offload (#45861) by @jamesbraza in [#45861]
[docs] chat template prefill (#45947) by @stevhliu in [#45947]
[docs] decode fast path (#45899) by @stevhliu in [#45899]
fix: restore _attn_implementation and fix request offset in generate_batch() (#45943) by @sergiopaniego in [#45943]
Expose per_layer_inputs for every Gemma4 variants (#45927) by @Cyrilvallez in [#45927]
chore: update benchmark_v2.yml (#45966) by @hf-security-analysis[bot] in [#45966]
fix(ci): set persist-credentials: false on actions/checkout and close remaining template injection findings (#45964) by @XciD in [#45964]
chore(ci): set default workflow permissions to contents: read (#45961) by @XciD in [#45961]
fix(ci): remove template injection on pull_request_target workflows (#45956) by @XciD in [#45956]
chore(ci): pin all GitHub Actions and reusable workflows by SHA (#45955) by @XciD in [#45955]
[docs] ALMModelTest (#45900) by @stevhliu in [#45900]
Enhance apply_chat_template to support custom field prefilling (reasoning_content, thinking, etc.) (#45896) by @Mamiglia in [#45896]
BUGFIX: Support hubert models that don't have conv_pos_batch_norm configured (#45921) by @igordertigor in [#45921]
Revert 45777 (#45942) by @Rocketknight1 in [#45942]
pass the otel secrets (#45933) by @tarekziade in [#45933]
Add initial torch_tpu backend support (#45918) by @tengomucho in [#45918]
[CB] Hide activation footprint by using the CUDA graph pool (#45911) by @remi-or in [#45911]
Require input_ids for repetition penalty (#45389) by @ruben-aghayan in [#45389]
Fix undefined 'input' variable (#45895) by @fullyz in [#45895]
Fix post processing RF-DETR (#46041) by @yonigozlan (direct commit on v5.9.0)
[loading] Free up tensors faster inside ConversionOps (#46110) by @Cyrilvallez (direct commit on v5.9.0)
Add new cohere2_moe model (#46115) by @Cyrilvallez (direct commit on v5.9.0)
Fix cohere2 tp_plan for release by @Cyrilvallez (direct commit on v5.9.0)
Release v5.9.0 by @Cyrilvallez (direct commit on v5.9.0)
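The requirement introduced in #45389 (input_ids for repetition penalty) makes sense given how the penalty works. It rescales the logits of tokens that already appear in the sequence, so it needs those token IDs to act on. The sketch below shows the common CTRL-style rule: positive logits are divided by the penalty and negative ones multiplied by it. It is an illustration in pure Python, not a copy of the library's RepetitionPenaltyLogitsProcessor. The error raised for missing IDs is this sketch's own choice.

```python
def apply_repetition_penalty(logits, input_ids, penalty):
    """CTRL-style repetition penalty over a single sequence.

    Every token ID already present in `input_ids` has its logit pushed down:
    positive logits are divided by `penalty`, negative ones multiplied by it.
    """
    if input_ids is None:
        # Without the previous token IDs there is nothing to penalize.
        raise ValueError("repetition penalty needs input_ids to know which tokens already appeared")
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    out = list(logits)
    for token_id in set(input_ids):
        score = out[token_id]
        out[token_id] = score * penalty if score < 0 else score / penalty
    return out


if __name__ == "__main__":
    logits = [2.0, -1.0, 0.5, 3.0]
    print(apply_repetition_penalty(logits, input_ids=[0, 1, 1], penalty=2.0))
    # -> [1.0, -2.0, 0.5, 3.0]
```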

Significant community contributions

The following contributors have made significant changes to the library since the last release:

@lmaksym
- Parakeet tdt (#44171)
@eustlb
- user friendly error when loading audio from video (#45221)
- [MultimodalLM] add language_model to the get/set_input_embeddings logic (#46029)
@remi-or
- [CB] Remove OpenTelemetry (#45984)
- [CB] [Major] Add tensor parallelism (#45821)
- [CB] Hide activation footprint by using the CUDA graph pool (#45911)
@abcd1927
- Add hrm text (#46025)
