Transformers

v5.9.0 Breaking

This release includes 1 breaking change for platform teams planning a safe upgrade.

Published 14d ago | LLM Frameworks
✓ No known CVEs patched

Topics

audio machine-learning deepseek gemma glm llm model-hub natural-language-processing nlp pretrained-models python pytorch pytorch-transformers qwen speech-recognition transformer vlm

Affected surfaces

breaking_upgrade

Summary

AI summary

Broad release covering new model additions (Cohere2Moe, Parakeet tdt, HRM-Text), bug fixes and improvements, and CI changes.

Changes in this release

Breaking High

`text_embeds` input for SAM3, EdgeTAM, and SAM3-Lite-Text now expects full text embeddings instead of pooler outputs.

Source: granite4.1:30b@2026-05-20-audit

Confidence: low

Feature Medium

Added Cohere2Moe Mixture-of-Experts language model with hybrid attention pattern.

Source: granite4.1:8b-q6_K@2026-05-20

Confidence: high

Feature Low

Added Parakeet tdt model.

Source: granite4.1:30b@2026-05-20-audit

Confidence: low

Feature Low

Added HRM-Text autoregressive language model variant of Hierarchical Reasoning Model.

Source: granite4.1:30b@2026-05-20-audit

Confidence: low

Feature Low

Added AudioFlamingoNext model checkpoints and improved audio/vision encoder compilability via standalone pure functions.

Source: granite4.1:30b@2026-05-20-audit

Confidence: low

Feature Low

Added documentation for audio/video processors.

Source: granite4.1:30b@2026-05-20-audit

Confidence: low

Feature Low

Added tensor parallelism support (CB).

Source: granite4.1:30b@2026-05-20-audit

Confidence: low

Feature Low

Added initial torch_tpu backend support.

Source: granite4.1:30b@2026-05-20-audit

Confidence: low

Bugfix Medium

Fixed Gemma4 generation handling of `inputs_embeds` and `per_layer_inputs`.

Source: granite4.1:30b@2026-05-20-audit

Confidence: low

Bugfix Medium

Resolved AttributeError in RAG's `generate()` caused by missing config fields.

Source: granite4.1:30b@2026-05-20-audit

Confidence: low

Bugfix Medium

Fixed the `owned_by` field of the GET /v1/models endpoint, which returned a list instead of a string.

Source: granite4.1:30b@2026-05-20-audit

Confidence: low
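
Clients written against the OpenAI-style model list format expect `owned_by` to be a string. The following is a minimal sketch of a client-side check, assuming the usual `{"object": "list", "data": [...]}` response shape. The helper name `find_bad_owned_by` and both payloads are hypothetical and were not captured from a real server.

```python
def find_bad_owned_by(payload):
    """Return the ids of models whose `owned_by` is not a plain string."""
    bad = []
    for model in payload.get("data", []):
        if not isinstance(model.get("owned_by"), str):
            bad.append(model.get("id"))
    return bad


# Hypothetical payloads for illustration only.
list_valued = {
    "object": "list",
    "data": [{"id": "demo-model", "object": "model", "owned_by": ["demo-org"]}],
}
string_valued = {
    "object": "list",
    "data": [{"id": "demo-model", "object": "model", "owned_by": "demo-org"}],
}

print(find_bad_owned_by(list_valued))    # ['demo-model']
print(find_bad_owned_by(string_valued))  # []
```

A check like this can run in an upgrade smoke test to confirm that downstream parsers receive the type they expect.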

Bugfix Low

Fixed memory leaks caused by LRU decorators in vision models.

Source: granite4.1:30b@2026-05-20-audit

Confidence: low
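
For readers unfamiliar with this class of leak: `functools.lru_cache` applied to a method stores `self` in every cache key, so the class-level cache keeps instances alive after the caller drops them. The sketch below reproduces the effect with toy classes and shows one common remedy, caching a module-level pure function instead. The class and function names are hypothetical, and this is not necessarily how the library fixed it.

```python
import functools
import gc
import weakref


class LeakyEncoder:
    # The cache lives on the class and keys on (self, size),
    # so every instance that calls grid() stays referenced.
    @functools.lru_cache(maxsize=None)
    def grid(self, size):
        return tuple(range(size))


@functools.lru_cache(maxsize=32)
def compute_grid(size):
    # Pure function: the cache key holds only `size`, never an instance.
    return tuple(range(size))


class SafeEncoder:
    def grid(self, size):
        return compute_grid(size)


def is_collected(cls):
    """Create an instance, use it, drop it, and report whether it was freed."""
    obj = cls()
    obj.grid(4)
    ref = weakref.ref(obj)
    del obj
    gc.collect()
    return ref() is None


print(is_collected(LeakyEncoder))  # False
print(is_collected(SafeEncoder))   # True
```

With large models the retained object is the whole module and its weights, which is why this pattern matters more than the small cache entries suggest.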

Bugfix Low

Improved error messaging when loading audio from video files.

Source: granite4.1:30b@2026-05-20-audit

Confidence: low

Bugfix Low

Blocked special image tokens during sampling to fix flaky VLM generation tests.

Source: granite4.1:30b@2026-05-20-audit

Confidence: low

Bugfix Low

Improved handling of Hubert models lacking `conv_pos_batch_norm`.

Source: granite4.1:30b@2026-05-20-audit

Confidence: low
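
Configs saved before a field existed usually fail with an `AttributeError` on direct attribute access. A minimal sketch of the defensive pattern, using `SimpleNamespace` as a stand-in for a config object. The helper name is hypothetical, and treating a missing field as `False` is an assumption made for this illustration.

```python
from types import SimpleNamespace


def uses_conv_pos_batch_norm(config):
    # Assumption: a config without the field behaves as if the feature is off.
    return bool(getattr(config, "conv_pos_batch_norm", False))


# Hypothetical configs: one predates the field, one sets it explicitly.
old_config = SimpleNamespace(hidden_size=768)
new_config = SimpleNamespace(hidden_size=768, conv_pos_batch_norm=True)

print(uses_conv_pos_batch_norm(old_config))  # False
print(uses_conv_pos_batch_norm(new_config))  # True
```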

Refactor Low

Removed mask visualization tool from `masking_utils.py`.

Source: granite4.1:30b@2026-05-20-audit

Confidence: low

Refactor Low

Removed OpenTelemetry integration (CB).

Source: granite4.1:30b@2026-05-20-audit

Confidence: low

Refactor Low

Reverted change from PR 45777.

Source: granite4.1:30b@2026-05-20-audit

Confidence: low

Full changelog

Release v5.9.0

New Model additions

Cohere2Moe

Command A+ is a Mixture-of-Experts (MoE) language model from Cohere that features a hybrid attention pattern combining sliding window and full attention layers. The model incorporates both shared and routed experts and supports a very large context window for processing extensive text sequences.

Links: Documentation

  • Add new cohere2_moe model (#46115) by @Cyrilvallez in [#46115]
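
To make the hybrid attention pattern concrete, here is a small sketch of the two mask types and of a layer schedule that interleaves them. The window size and the `full_every` ratio are hypothetical parameters chosen for illustration, and the layer-type labels are illustrative. None of these values are taken from the Cohere2Moe configuration.

```python
def causal_mask(seq_len, window=None):
    """mask[i][j] is True when query position i may attend to key position j.

    window=None gives full causal attention; an integer limits how far
    back each query can look (sliding window attention).
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            visible = j <= i
            if window is not None:
                visible = visible and (i - j) < window
            row.append(visible)
        mask.append(row)
    return mask


def layer_types(num_layers, full_every):
    """Hypothetical schedule: every `full_every`-th layer uses full attention."""
    return [
        "full_attention" if (idx + 1) % full_every == 0 else "sliding_attention"
        for idx in range(num_layers)
    ]


print(causal_mask(4, window=2)[3])  # [False, False, True, True]
print(causal_mask(4)[3])            # [True, True, True, True]
print(layer_types(4, full_every=4))
```

Sliding-window layers keep per-token attention cost bounded by the window, while the occasional full-attention layer lets information travel across the whole context.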

Parakeet tdt (#44171)

  • Parakeet tdt (#44171) by @lmaksym

HRM-Text

HRM-Text is an improved autoregressive language-modeling variant of the Hierarchical Reasoning Model (HRM) that uses a hierarchical recurrent forward pass with two transformer stacks - one for slow, abstract planning (H) and one for fast, detailed computation (L) - reused inside a nested recurrence. It features PrefixLM attention where instruction tokens attend bidirectionally while response tokens attend causally, per-head sigmoid output gates, and parameterless RMSNorm. The model is designed as a base language model without instruction tuning or chat templates.
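
The PrefixLM attention rule described above can be sketched as a boolean mask. This is a generic illustration of PrefixLM masking, not code from the HRM-Text implementation, and the function name `prefix_lm_mask` is hypothetical.

```python
def prefix_lm_mask(prefix_len, total_len):
    """mask[i][j] is True when position i may attend to position j.

    Positions below prefix_len (the instruction) see the whole prefix in
    both directions; later positions (the response) see the prefix plus
    earlier response tokens only.
    """
    return [
        [(j <= i) or (j < prefix_len) for j in range(total_len)]
        for i in range(total_len)
    ]


for row in prefix_lm_mask(prefix_len=2, total_len=4):
    print("".join("1" if allowed else "0" for allowed in row))
# 1100
# 1100
# 1110
# 1111
```

The first two rows are identical because both instruction tokens see the full instruction, while the last two rows follow the usual causal staircase.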

Links: Documentation | Paper

  • Add hrm text (#46025) by @abcd1927 in [#46025]
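
"Parameterless RMSNorm" in the HRM-Text description means root-mean-square normalization without a learned scale vector. A minimal sketch of that operation on a single vector follows. The `eps` default is an assumption for numerical safety, not a value taken from the model.

```python
import math


def rms_norm(x, eps=1e-6):
    """Divide x by its root mean square; no learned weight is applied."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


out = rms_norm([3.0, 4.0], eps=0.0)
rms_after = math.sqrt(sum(v * v for v in out) / len(out))
print(round(rms_after, 6))  # 1.0
```

Standard RMSNorm would multiply the result by a trainable per-channel weight; dropping that weight is what makes this variant parameterless.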

Breaking changes

The `text_embeds` input for SAM3, EdgeTAM, and SAM3-Lite-Text models now expects full text embeddings instead of just pooler outputs, aligning with other models in the library. Users must update their inputs accordingly.

  • 🚨Fix memory leaks caused by lru decorators in vision models (#45922) by @yonigozlan
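
When auditing call sites for this change, a shape check can flag inputs that still look pooled. The sketch below works on nested lists and assumes that pooler outputs are 2-D `(batch, hidden)` while full text embeddings are 3-D `(batch, seq_len, hidden)`. That shape convention is an assumption based on common encoder outputs and is not confirmed by the release notes. The helper names are hypothetical.

```python
def nesting_depth(value):
    """Count list/tuple nesting levels, following the first element."""
    depth = 0
    while isinstance(value, (list, tuple)):
        depth += 1
        if not value:
            break
        value = value[0]
    return depth


def check_text_embeds(text_embeds):
    """Raise if text_embeds does not look like (batch, seq_len, hidden)."""
    depth = nesting_depth(text_embeds)
    if depth == 2:
        raise ValueError(
            "text_embeds looks pooled (batch, hidden); "
            "pass per-token embeddings (batch, seq_len, hidden)"
        )
    if depth != 3:
        raise ValueError(f"expected 3 nested dimensions, got {depth}")
    return text_embeds


pooled = [[0.1, 0.2]]              # hypothetical pooler output
full = [[[0.1, 0.2], [0.3, 0.4]]]  # hypothetical per-token embeddings
check_text_embeds(full)            # passes
```

With tensors, the equivalent check is a comparison on the number of dimensions before the value is passed to the model.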

Audio

Audio support was expanded with the addition of AudioFlamingoNext model checkpoints and improved compilability of audio/vision encoders via standalone pure functions. Additional improvements include better error messaging when loading audio from video files and new documentation for audio/video processors.

  • user friendly error when loading audio from video (#45221) by @eustlb in [#45221]
  • [docs] adding audio/video processors (#45795) by @stevhliu in [#45795]
  • Support Audio Flamingo Next checkpoints (#44830) by @lashahub in [#44830]
  • Extract dynamic vision/audio tensors into standalone pure functions (#45396) by @IlyasMoutawwakil in [#45396]

Generation

Fixed generation issues including inputs_embeds and per_layer_inputs handling for Gemma4, an AttributeError in RAG's generate() caused by missing config fields, and flaky VLM generation tests by blocking special image tokens during sampling.

  • Fix Gemma4 generation from inputs_embeds and per_layer_inputs (#46049) by @Cyrilvallez in [#46049]
  • Fix AttributeError in RAG generate() for missing config fields (#46035) by @Sriniketh24 in [#46035]
  • Block image_start/end_token_id in generation test sampling (#45914) by @Rocketknight1 in [#45914]
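
Blocking tokens during sampling usually means setting their logits to negative infinity before the softmax, so they receive zero probability. A minimal pure-Python sketch of the idea follows. The token ids are hypothetical and this is not the test code from #45914.

```python
import math
import random


def block_tokens(logits, blocked_ids):
    """Return a copy of logits with the blocked ids set to -inf."""
    return [-math.inf if i in blocked_ids else v for i, v in enumerate(logits)]


def sample(logits, rng):
    """Sample an index from softmax(logits); assumes at least one finite logit."""
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]  # exp(-inf) is 0.0
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


rng = random.Random(0)
logits = [1.0, 5.0, 0.5, 4.0]
blocked = {1, 3}  # hypothetical ids standing in for image start/end tokens
draws = [sample(block_tokens(logits, blocked), rng) for _ in range(200)]
print(sorted(set(draws)))
```

Even though the blocked ids had the highest logits, only the remaining ids can appear in `draws`.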

Bugfixes and improvements

  • Remove mask visualization tool from masking_utils.py (#46066) by @Cyrilvallez in [#46066]
  • fix: owned_by field in GET /v1/models returns list instead of string (#46006) by @nileshpatil6 in [#46006]
  • [CB] Remove OpenTelemetry (#45984) by @remi-or in [#45984]
  • docs(readme): use canonical huggingface.co domain in prose links (#46042) by @kiwigitops in [#46042]
  • Fix remaining RAG doc examples that crash on current transformers (#46044) by @Sriniketh24 in [#46044]
  • Init the actual tensor, not a copy (#46030) by @Rocketknight1 in [#46030]
  • docs: sync legacy ACL anthology URLs and update metrics across i18n READMEs (#46027) by @irfaan101 in [#46027]
  • [MultimodalLM] add language_model to the get/set_input_embeddings logic (#46029) by @eustlb in [#46029]
  • [HRM Text] Add integration tests (#46033) by @vasqu in [#46033]
  • hy_v3: add XPU expectations (#45858) by @kaixuanliu in [#45858]
  • exaone4_5: add XPU expectations (#45890) by @kaixuanliu in [#45890]
  • hyperclovax: add XPU Expectations for CI test (#45926) by @kaixuanliu in [#45926]
  • chore(ci): remove dead env vars from circleci-failure-summary-comment.yml (#45972) by @XciD in [#45972]
  • [CB] [Major] Add tensor parallelism (#45821) by @remi-or in [#45821]
  • docs: update models architecture count and sync ACL anthology URLs (#46001) by @irfaan101 in [#46001]
  • bugfix(ci): avoid E2BIG in pr_slow_ci_suggestion (#45983) by @tarekziade in [#45983]
  • RFDetr - use correct Roboflow org for release (#45946) by @sbucaille in [#45946]
  • docs: Fix formatting issues in weightconverter.md (#45988) by @ArjunSrivastava1 in [#45988]
  • Fix colqwen2 test (#45981) by @IlyasMoutawwakil in [#45981]
  • Fix M-RoPE device mismatch in Qwen3VL family under FSDP2 CPU offload (#45861) by @jamesbraza in [#45861]
  • [docs] chat template prefill (#45947) by @stevhliu in [#45947]
  • [docs] decode fast path (#45899) by @stevhliu in [#45899]
  • fix: restore _attn_implementation and fix request offset in generate_batch() (#45943) by @sergiopaniego in [#45943]
  • Expose per_layer_inputs for every Gemma4 variants (#45927) by @Cyrilvallez in [#45927]
  • chore: update benchmark_v2.yml (#45966) by @hf-security-analysis[bot] in [#45966]
  • fix(ci): set persist-credentials: false on actions/checkout and close remaining template injection findings (#45964) by @XciD in [#45964]
  • chore(ci): set default workflow permissions to contents: read (#45961) by @XciD in [#45961]
  • fix(ci): remove template injection on pull_request_target workflows (#45956) by @XciD in [#45956]
  • chore(ci): pin all GitHub Actions and reusable workflows by SHA (#45955) by @XciD in [#45955]
  • [docs] ALMModelTest (#45900) by @stevhliu in [#45900]
  • Enhance apply_chat_template to support custom field prefilling (reasoning_content, thinking, etc.) (#45896) by @Mamiglia in [#45896]
  • BUGFIX: Support hubert models that don't have conv_pos_batch_norm configured (#45921) by @igordertigor in [#45921]
  • Revert 45777 (#45942) by @Rocketknight1 in [#45942]
  • pass the otel secrets (#45933) by @tarekziade in [#45933]
  • Add initial torch_tpu backend support (#45918) by @tengomucho in [#45918]
  • [CB] Hide activation footprint by using the CUDA graph pool (#45911) by @remi-or in [#45911]
  • Require input_ids for repetition penalty (#45389) by @ruben-aghayan in [#45389]
  • Fix undefined 'input' variable (#45895) by @fullyz in [#45895]
  • Fix post processing RF-DETR (#46041) by @yonigozlan (direct commit on v5.9.0)
  • [loading] Free up tensors faster inside ConversionOps (#46110) by @Cyrilvallez (direct commit on v5.9.0)
  • Add new cohere2_moe model (#46115) by @Cyrilvallez (direct commit on v5.9.0)
  • Fix cohere2 tp_plan for release by @Cyrilvallez (direct commit on v5.9.0)
  • Release v5.9.0 by @Cyrilvallez (direct commit on v5.9.0)
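
The "Require input_ids for repetition penalty" entry (#45389) makes sense once the penalty rule is written out: it rescales the scores of tokens that already appeared, so it cannot run without the ids seen so far. The sketch below shows the commonly used rule (divide positive scores, multiply negative ones) in plain Python. It is an illustration under that assumption, not the library's implementation, and the function name is hypothetical.

```python
def apply_repetition_penalty(scores, input_ids, penalty):
    """Make every token id present in input_ids less likely.

    Positive scores are divided by the penalty and negative scores are
    multiplied by it, so both move toward lower probability when penalty > 1.
    """
    out = list(scores)
    for token_id in set(input_ids):
        score = out[token_id]
        out[token_id] = score * penalty if score < 0 else score / penalty
    return out


scores = [2.0, -1.0, 0.5]
print(apply_repetition_penalty(scores, input_ids=[0, 1, 1], penalty=2.0))
# [1.0, -2.0, 0.5]
```

Token 2 never appeared in `input_ids`, so its score is left unchanged.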

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @lmaksym
    • Parakeet tdt (#44171)
  • @eustlb
    • user friendly error when loading audio from video (#45221)
    • [MultimodalLM] add language_model to the get/set_input_embeddings logic (#46029)
  • @remi-or
    • [CB] Remove OpenTelemetry (#45984)
    • [CB] [Major] Add tensor parallelism (#45821)
    • [CB] Hide activation footprint by using the CUDA graph pool (#45911)
  • @abcd1927
    • Add hrm text (#46025)
