haystack

v2.29.0 Breaking

This release includes one breaking change that platform teams should account for when planning an upgrade.

Published 2mo LLM Frameworks

✓ No known CVEs patched


Topics

agent agents ai gemini generative-ai gpt-4 information-retrieval large-language-models llm machine-learning nlp orchestration python pytorch question-answering retrieval-augmented-generation semantic-search summarization transformers

ReleasePort's take

Moderate signal

editorial:auto 2mo

Haystack v2.29.0 breaks LLM.run and LLM.run_async, requiring keyword arguments for messages and streaming_callback. The release adds MultiRetriever for parallel retrieval execution and CacheChecker async support.

Why it matters: Applications that call LLM.run or LLM.run_async with positional arguments must switch to keyword arguments before upgrading. There is no automatic conversion path, so test the upgrade in a development environment first.

Summary

AI summary

LLM.run and LLM.run_async now require keyword arguments for messages and streaming_callback.

Changes in this release

Type	Severity	Summary	Source	Confidence	CVE
Breaking	Medium	`LLM.run` and `LLM.run_async` now require keyword arguments for `messages` and `streaming_callback`.	llm_adapter@2026-05-21	low	—
Feature	Medium	MultiRetriever runs multiple text retrievers in parallel and merges results.	llm_adapter@2026-05-21	high	—
Feature	Medium	TextEmbeddingRetriever wraps embedding-based retriever with embedder for MultiRetriever compatibility.	llm_adapter@2026-05-21	high	—
Feature	Medium	`CacheChecker.run_async` enables use in AsyncPipeline without blocking event loop.	llm_adapter@2026-05-21	low	—
Feature	Medium	`MultiRetriever` now supports a `join_mode` parameter.	llm_adapter@2026-05-21	low	—
Feature	Medium	CacheChecker gains a `run_async` method to be used in AsyncPipeline without blocking.	granite4.1:30b@2026-05-23-audit	low	—
Feature	Medium	MultiRetriever adds a `join_mode` parameter supporting "reciprocal_rank_fusion" (default) and "concatenate".	granite4.1:30b@2026-05-23-audit	low	—
Bugfix	Medium	NamedEntityExtractor restores spaCy/Thinc device state correctly after execution.	granite4.1:30b@2026-05-23-audit	low	—
Bugfix	Medium	OpenAIChatGenerator's `tools_strict=True` now recursively applies schema strictness to nested tool parameters.	granite4.1:30b@2026-05-23-audit	low	—
Bugfix	Low	Preserve resumable snapshots when some inputs or outputs are non-serializable, omitting only failing fields.	granite4.1:30b@2026-05-23-audit	low	—
Refactor	Medium	Documented input ordering behavior of auto-promoted lazy variadic sockets in `Pipeline.connect()`.	llm_adapter@2026-05-21	low	—
Refactor	Low	LLM now supports template-variable mode and pass-through mode for prompt handling.	granite4.1:30b@2026-05-23-audit	low	—

Full changelog

⭐️ Highlights

🔍 Combine Retrievers with `MultiRetriever` and `TextEmbeddingRetriever`

Two new retriever components make it easier to build hybrid search pipelines. MultiRetriever runs multiple text retrievers in parallel and merges their results into a single deduplicated list, ranked by reciprocal rank fusion by default. You can selectively enable or disable individual retrievers at runtime using the active_retrievers parameter. This is useful when you want to skip the embedding retriever for short or keyword-only queries, for example.

TextEmbeddingRetriever wraps an embedding-based retriever together with a text embedder into a single component, making it compatible with MultiRetriever by implementing the TextRetriever protocol. Here's how to combine BM25 and embedding retrieval in a single component:

```
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.retrievers import MultiRetriever, TextEmbeddingRetriever
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever, InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Assumed setup: the original snippet uses `doc_store` without defining it.
# Write your documents (with embeddings) to the store before querying.
doc_store = InMemoryDocumentStore()

# Each retriever is registered under a name that can be referenced at runtime.
retriever = MultiRetriever(
    retrievers={
        "bm25": InMemoryBM25Retriever(document_store=doc_store),
        "embedding": TextEmbeddingRetriever(
            retriever=InMemoryEmbeddingRetriever(document_store=doc_store),
            text_embedder=SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"),
        ),
    },
    top_k=3,
)

# Run all retrievers
result = retriever.run(query="green energy sources")

# Run only the BM25 retriever
result = retriever.run(query="green energy sources", active_retrievers=["bm25"])
```
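To make the default merge strategy more concrete, here is a minimal, library-independent sketch of reciprocal rank fusion over lists of document IDs. It is not Haystack's implementation: the function name, the smoothing constant `k=60`, and the tie-breaking rule are assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60, top_k=None):
    """Merge several ranked lists of document IDs into one deduplicated ranking.

    Each document scores 1 / (k + rank) in every list it appears in (rank starts
    at 1), and the scores are summed across lists. `k=60` is an assumed constant.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)

    # Highest score first; ties are broken by document ID to stay deterministic.
    merged = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return merged if top_k is None else merged[:top_k]


# Hypothetical result lists from two retrievers.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
embedding_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_hits, embedding_hits], top_k=3)
print(fused)
```

Documents that rank well in several lists accumulate more score than documents that appear in only one, which is why the fused list is both deduplicated and reordered.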

⬆️ Upgrade Notes

`LLM.run` and `LLM.run_async` no longer accept `messages` and `streaming_callback` as positional arguments; they must now be passed as keyword arguments. Update any direct calls accordingly:
```
# Before
llm.run([message], my_callback)

# After
llm.run(messages=[message], streaming_callback=my_callback)
```
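For readers who want to see what a keyword-only signature does at call time, the sketch below uses a hypothetical stand-in class, `DemoLLM`, rather than Haystack's `LLM`. It assumes the restriction is enforced with a bare `*` in the signature; the release notes do not say how Haystack enforces it or which exception is raised.

```python
class DemoLLM:
    """Hypothetical stand-in used only to illustrate keyword-only parameters."""

    def run(self, *, messages=None, streaming_callback=None):
        # Everything after the bare `*` can only be passed by keyword.
        return {
            "n_messages": len(messages or []),
            "streaming": streaming_callback is not None,
        }


llm = DemoLLM()

# Keyword call: accepted.
ok = llm.run(messages=["hello"], streaming_callback=print)

# Positional call: Python rejects it before the method body runs.
try:
    llm.run(["hello"], print)
    positional_rejected = False
except TypeError:
    positional_rejected = True

print(ok, positional_rejected)
```

A quick way to find affected call sites before upgrading is to search the codebase for `.run(` and `.run_async(` calls on LLM components that pass arguments without names.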

🚀 New Features

Add run_async to CacheChecker, enabling it to be used in AsyncPipeline without blocking the event loop.
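The sketch below illustrates why an async entry point matters inside an event loop. `DemoCacheChecker` is a hypothetical stand-in, not Haystack's `CacheChecker`, and offloading the blocking lookup with `asyncio.to_thread` (Python 3.9+) is an assumption; the release notes do not describe how `run_async` is implemented.

```python
import asyncio
import time


class DemoCacheChecker:
    """Hypothetical stand-in that splits items into cache hits and misses."""

    def __init__(self, cached_ids):
        self.cached_ids = set(cached_ids)

    def run(self, items):
        time.sleep(0.05)  # simulate a blocking lookup against a store
        hits = [item for item in items if item in self.cached_ids]
        misses = [item for item in items if item not in self.cached_ids]
        return {"hits": hits, "misses": misses}

    async def run_async(self, items):
        # Assumed approach: run the blocking call in a worker thread so the
        # event loop stays free for other tasks.
        return await asyncio.to_thread(self.run, items)


async def main():
    checker = DemoCacheChecker(["a", "b"])
    ticks = 0

    async def heartbeat():
        # Counts how often the event loop gets to run while the lookup is pending.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    beat = asyncio.create_task(heartbeat())
    result = await checker.run_async(["a", "c"])
    beat.cancel()
    return result, ticks


result, ticks = asyncio.run(main())
print(result, ticks)
```

If `main` called the synchronous `run` directly, the heartbeat task would not get a chance to run until the lookup finished.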

⚡️ Enhancement Notes

Document the input ordering behavior of auto-promoted lazy variadic sockets in Pipeline.connect(). When multiple senders are connected to the same list-typed receiver socket, ordering depends on the pipeline class. With Pipeline, items are ordered alphabetically by sender component name (because Pipeline.run() schedules components in alphabetical order for deterministic execution), not by the order of connect() calls. With AsyncPipeline, no ordering is guaranteed, since components in different branches may run in parallel. The docstrings now point users to a dedicated joiner component when they need explicit ordering.
Add join_mode parameter to the experimental MultiRetriever component, supporting "reciprocal_rank_fusion" (default) and "concatenate". Reciprocal Rank Fusion merges the ranked result lists from all retrievers into a single deduplicated list ordered by RRF score. The underlying RRF logic is extracted into a shared utility _reciprocal_rank_fusion in haystack.utils.misc, which is now also used by DocumentJoiner.
LLM now supports two usage modes:
1. Template-variable mode: provide a `user_prompt` with Jinja2 variables (e.g. `{{ query }}`). Those variables become pipeline inputs and `messages` is optional. The rendered `user_prompt` is always appended after any messages provided at runtime.
2. Pass-through mode: omit `user_prompt` or provide one with no template variables. `messages` becomes a required input, allowing a fully constructed list of ChatMessages to be passed from upstream.
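The choice between the two LLM usage modes above can be sketched as a small decision function. This is an illustration only: `describe_llm_inputs` is a hypothetical helper, and the regular expression is a deliberate simplification that only recognizes plain `{{ name }}` placeholders, not the full Jinja2 syntax that Haystack would parse.

```python
import re

# Simplified placeholder detection: matches the name in `{{ name }}` only.
_VARIABLE = re.compile(r"\{\{\s*([A-Za-z_]\w*)")


def describe_llm_inputs(user_prompt=None):
    """Report which usage mode a given user_prompt would select."""
    variables = sorted(set(_VARIABLE.findall(user_prompt or "")))
    if variables:
        # Template-variable mode: variables become inputs, messages is optional.
        return {"mode": "template-variable", "variables": variables, "messages_required": False}
    # Pass-through mode: no prompt or no variables, so messages is required.
    return {"mode": "pass-through", "variables": [], "messages_required": True}


templated = describe_llm_inputs("Answer the question: {{ query }}")
plain = describe_llm_inputs("Summarize the conversation.")
omitted = describe_llm_inputs()

print(templated)
print(plain)
print(omitted)
```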

🐛 Bug Fixes

Fixed a bug in NamedEntityExtractor where the spaCy/Thinc device state was not correctly restored after execution, potentially affecting the device configuration of other spaCy components in the same process.
Preserve resumable snapshots when some inputs or outputs are non-serializable. Haystack now omits only the failing top-level fields (for example non-serializable callbacks or runtime objects) instead of replacing the whole payload with an empty dictionary. This applies both to agent sub-component inputs (chat_generator and tool_invoker) and to pipeline-level inputs, original_input_data, and pipeline_outputs captured by _create_pipeline_snapshot. When every field fails to serialize, the snapshot still stores a structurally valid empty payload ({"serialization_schema": {"type": "object", "properties": {}}, "serialized_data": {}}) so that resuming the snapshot does not raise DeserializationError — for example when resuming from a ToolBreakpoint where the sub-component's inputs are not strictly required.
Fixed tools_strict=True in OpenAIChatGenerator to recursively apply additionalProperties: false and required to all nested objects in tool parameter schemas. Previously only the top-level object was transformed, causing OpenAI's strict mode to reject tools with nested parameters.
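The recursive behavior described in the `tools_strict=True` fix can be sketched in plain Python as follows. This is not the code from `OpenAIChatGenerator`: the helper names are hypothetical, and listing every property under `required` is an assumption about what the transformation emits.

```python
import copy


def make_strict(schema):
    """Return a copy of a JSON schema with strictness applied to every nested object."""
    strict = copy.deepcopy(schema)  # leave the caller's schema untouched
    _strictify(strict)
    return strict


def _strictify(node):
    if isinstance(node, dict):
        if node.get("type") == "object" and "properties" in node:
            node["additionalProperties"] = False
            # Assumption: every declared property is marked as required.
            node["required"] = list(node["properties"])
        # Recurse into all values so nested objects and array items are covered.
        for value in node.values():
            _strictify(value)
    elif isinstance(node, list):
        for item in node:
            _strictify(item)


# Hypothetical tool parameter schema with one nested object.
parameters = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "options": {
            "type": "object",
            "properties": {"units": {"type": "string"}},
        },
    },
}

strict = make_strict(parameters)
print(strict)
```

A transformation that only touched the top-level object would leave `options` without `additionalProperties: false`, which matches the failure mode the fix describes.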

💙 Big thank you to everyone who contributed to this release!

@Aftabbs, @albertodiazdurana, @anakin87, @ArkaD171717, @bilgeyucel, @bogdankostic, @davidsbatista, @FuturMix, @julian-risch, @kacperlukawski, @ritikraj2425, @saivedant169, @shaun0927, @sjrl, @SyedShahmeerAli12

Breaking Changes

`LLM.run` and `LLM.run_async` no longer accept `messages` and `streaming_callback` as positional arguments — they must be passed as keyword arguments.


About haystack

Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.


Earlier breaking changes

v2.30.0 LLM.run and LLM.run_async no longer accept positional messages or streaming_callback arguments.