haystack

v2.29.0 Breaking

This release includes 1 breaking change that platform teams should review before upgrading.

Published 22d LLM Frameworks
✓ No known CVEs patched

Topics

agent agents ai gemini generative-ai gpt-4 information-retrieval large-language-models llm machine-learning nlp orchestration python pytorch question-answering retrieval-augmented-generation semantic-search summarization transformers

ReleasePort's take

Moderate signal
editorial:auto 13d

Haystack v2.29.0 changes how LLM.run and LLM.run_async are called: messages and streaming_callback must now be passed as keyword arguments. The release also adds MultiRetriever, which runs several retrievers in parallel, and a run_async method on CacheChecker.

Why it matters: Code that calls LLM.run or LLM.run_async with positional arguments must switch to keyword arguments before upgrading. There is no automatic conversion path, so update the call sites and test them in a development environment first.

Summary

AI summary

LLM.run and LLM.run_async now require keyword arguments for messages and streaming_callback.

Changes in this release

Breaking Medium

`LLM.run` and `LLM.run_async` now require keyword arguments for `messages` and `streaming_callback`.

Source: llm_adapter@2026-05-21

Confidence: low

Feature Medium

MultiRetriever runs multiple text retrievers in parallel and merges results.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

TextEmbeddingRetriever wraps embedding-based retriever with embedder for MultiRetriever compatibility.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

`CacheChecker.run_async` enables use in AsyncPipeline without blocking event loop.

Source: llm_adapter@2026-05-21

Confidence: low

Feature Medium

`MultiRetriever` now supports `join_mode` parameter with `

Source: llm_adapter@2026-05-21

Confidence: low

Feature Medium

CacheChecker gains a `run_async` method to be used in AsyncPipeline without blocking.

Source: granite4.1:30b@2026-05-23-audit

Confidence: low

Feature Medium

MultiRetriever adds a `join_mode` parameter supporting "reciprocal_rank_fusion" (default) and "concatenate".

Source: granite4.1:30b@2026-05-23-audit

Confidence: low

Bugfix Medium

NamedEntityExtractor restores spaCy/Thinc device state correctly after execution.

Source: granite4.1:30b@2026-05-23-audit

Confidence: low

Bugfix Medium

OpenAIChatGenerator's `tools_strict=True` now recursively applies schema strictness to nested tool parameters.

Source: granite4.1:30b@2026-05-23-audit

Confidence: low

Bugfix Low

Preserve resumable snapshots when some inputs or outputs are non-serializable, omitting only failing fields.

Source: granite4.1:30b@2026-05-23-audit

Confidence: low

Refactor Medium

Documented input ordering behavior of auto-promoted lazy variadic sockets in `Pipeline.connect()`.

Source: llm_adapter@2026-05-21

Confidence: low

Refactor Low

LLM now supports template-variable mode and pass-through mode for prompt handling.

Source: granite4.1:30b@2026-05-23-audit

Confidence: low

Full changelog

⭐️ Highlights

🔍 Combine Retrievers with MultiRetriever and TextEmbeddingRetriever

Two new retriever components make it easier to build hybrid search pipelines. MultiRetriever runs multiple text retrievers in parallel and merges their results into a single deduplicated list, ranked by reciprocal rank fusion by default. You can selectively enable or disable individual retrievers at runtime using the active_retrievers parameter. This is useful when you want to skip the embedding retriever for short or keyword-only queries, for example.

TextEmbeddingRetriever wraps an embedding-based retriever together with a text embedder into a single component, making it compatible with MultiRetriever by implementing the TextRetriever protocol. Here's how to combine BM25 and embedding retrieval in a single component:

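from haystack.components.retrievers import MultiRetriever, TextEmbeddingRetriever
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever, InMemoryEmbeddingRetriever
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Assumed setup (not in the original snippet): an in-memory store that the retrievers share.
doc_store = InMemoryDocumentStore()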

retriever = MultiRetriever(
    retrievers={
        "bm25": InMemoryBM25Retriever(document_store=doc_store),
        "embedding": TextEmbeddingRetriever(
            retriever=InMemoryEmbeddingRetriever(document_store=doc_store),
            text_embedder=SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"),
        ),
    },
    top_k=3,
)

# Run all retrievers
result = retriever.run(query="green energy sources")

# Run only the BM25 retriever
result = retriever.run(query="green energy sources", active_retrievers=["bm25"])

⬆️ Upgrade Notes

  • LLM.run and LLM.run_async no longer accept messages and streaming_callback as positional arguments — they must now be passed as keyword arguments. Update any direct calls accordingly:

    # Before
    llm.run([message], my_callback)
    
    # After
    llm.run(messages=[message], streaming_callback=my_callback)
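
For readers unfamiliar with keyword-only parameters, the following is a minimal sketch of how such a restriction typically behaves in Python. StandInLLM is a hypothetical stand-in, not Haystack's LLM component, and the assumption that the real methods raise TypeError on positional calls is ours, not something the release notes state.

```python
class StandInLLM:
    """Hypothetical stand-in used only to illustrate keyword-only parameters."""

    def run(self, *, messages=None, streaming_callback=None):
        # The bare `*` makes every parameter after it keyword-only.
        return {"messages": messages, "streaming_callback": streaming_callback}


llm = StandInLLM()

# Keyword arguments are accepted.
result = llm.run(messages=["hello"], streaming_callback=print)

# Positional arguments are rejected by Python itself with a TypeError.
try:
    llm.run(["hello"], print)
    positional_error = None
except TypeError as exc:
    positional_error = exc
```

If the real components behave the same way, positional call sites fail at call time rather than silently misbinding arguments, so a search for direct llm.run( and llm.run_async( calls before upgrading should surface every site that needs changing.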
    

🚀 New Features

  • Add run_async to CacheChecker, enabling it to be used in AsyncPipeline without blocking the event loop.
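
To illustrate what "without blocking the event loop" means in practice, here is a rough sketch using a hypothetical StandInChecker class. It is not Haystack's CacheChecker, and offloading the synchronous call with asyncio.to_thread is only one possible approach; the actual implementation may instead await an async document store call.

```python
import asyncio
import time


class StandInChecker:
    """Hypothetical component with a blocking run and a non-blocking run_async."""

    def __init__(self, cached):
        self.cached = set(cached)

    def run(self, items):
        time.sleep(0.05)  # simulate a blocking store lookup
        hits = [item for item in items if item in self.cached]
        misses = [item for item in items if item not in self.cached]
        return {"hits": hits, "misses": misses}

    async def run_async(self, items):
        # Run the blocking lookup in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.run, items)


async def main():
    checker = StandInChecker(cached=["a.txt"])
    ticks = 0

    async def heartbeat():
        # Counts how often the event loop gets a turn while the lookup runs.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    beat = asyncio.create_task(heartbeat())
    result = await checker.run_async(["a.txt", "b.txt"])
    beat.cancel()
    await asyncio.gather(beat, return_exceptions=True)
    return result, ticks


result, ticks = asyncio.run(main())
```

The heartbeat task keeps ticking while the lookup is in flight, which is the property an AsyncPipeline relies on to run other branches concurrently.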

⚡️ Enhancement Notes

  • Document the input ordering behavior of auto-promoted lazy variadic sockets in Pipeline.connect(). When multiple senders are connected to the same list-typed receiver socket, ordering depends on the pipeline class. With Pipeline, items are ordered alphabetically by sender component name (because Pipeline.run() schedules components in alphabetical order for deterministic execution), not by the order of connect() calls. With AsyncPipeline, no ordering is guaranteed, since components in different branches may run in parallel. The docstrings now point users to a dedicated joiner component when they need explicit ordering.
  • Add join_mode parameter to the experimental MultiRetriever component, supporting "reciprocal_rank_fusion" (default) and "concatenate". Reciprocal Rank Fusion merges the ranked result lists from all retrievers into a single deduplicated list ordered by RRF score. The underlying RRF logic is extracted into a shared utility _reciprocal_rank_fusion in haystack.utils.misc, which is now also used by DocumentJoiner.
  • LLM now supports two usage modes:
    1. Template-variable mode: provide a user_prompt with Jinja2 variables (e.g. {{ query }}).
      Those variables become pipeline inputs and messages is optional. The rendered user_prompt
      is always appended after any messages provided at runtime.
    2. Pass-through mode: omit user_prompt or provide one with no template variables. messages
      becomes a required input, allowing a fully-constructed list of ChatMessages to be passed from upstream.
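
The reciprocal rank fusion used by join_mode above can be sketched in its textbook form as follows. This is a minimal illustration, not Haystack's _reciprocal_rank_fusion utility: the function name, the plain string document IDs, and the constant k=60 are assumptions, and the library may weight or normalize scores differently.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists of document IDs into one deduplicated ranking.

    Each document scores 1 / (k + rank) for every list it appears in;
    the scores are summed and documents are sorted by total score.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # sorted() is stable, so ties keep first-seen order.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


bm25_ranking = ["doc_a", "doc_b", "doc_c"]
embedding_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
# fused == ["doc_b", "doc_a", "doc_d", "doc_c"]
```

Documents returned by both retrievers (doc_a and doc_b here) rise to the top because their per-list scores add up, whereas a "concatenate" style join would simply place the lists one after another.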

🐛 Bug Fixes

  • Fixed a bug in NamedEntityExtractor where the spaCy/Thinc device state was not correctly restored after execution, potentially affecting the device configuration of other spaCy components in the same process.
  • Preserve resumable snapshots when some inputs or outputs are non-serializable. Haystack now omits only the failing top-level fields (for example non-serializable callbacks or runtime objects) instead of replacing the whole payload with an empty dictionary. This applies both to agent sub-component inputs (chat_generator and tool_invoker) and to pipeline-level inputs, original_input_data, and pipeline_outputs captured by _create_pipeline_snapshot. When every field fails to serialize, the snapshot still stores a structurally valid empty payload ({"serialization_schema": {"type": "object", "properties": {}}, "serialized_data": {}}) so that resuming the snapshot does not raise DeserializationError — for example when resuming from a ToolBreakpoint where the sub-component's inputs are not strictly required.
  • Fixed tools_strict=True in OpenAIChatGenerator to recursively apply additionalProperties: false and required to all nested objects in tool parameter schemas. Previously only the top-level object was transformed, causing OpenAI's strict mode to reject tools with nested parameters.
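
The recursive transformation described in the tools_strict fix can be sketched roughly as below. make_strict is a hypothetical helper written for illustration, not the function Haystack uses, and it handles only nested properties and array items.

```python
from typing import Any


def make_strict(schema: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a JSON schema with strict-mode constraints on every object."""
    result = dict(schema)
    if result.get("type") == "object":
        # Recurse into each property before constraining this level.
        properties = {
            name: make_strict(sub_schema)
            for name, sub_schema in result.get("properties", {}).items()
        }
        result["properties"] = properties
        result["additionalProperties"] = False
        result["required"] = list(properties)
    if isinstance(result.get("items"), dict):
        # Arrays may contain objects that need the same treatment.
        result["items"] = make_strict(result["items"])
    return result


tool_parameters = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {"street": {"type": "string"}},
        },
    },
}

strict_parameters = make_strict(tool_parameters)
```

Applying the constraints only at the top level would leave the nested address object without additionalProperties: false and required, which is the case the release notes say OpenAI's strict mode rejected.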

💙 Big thank you to everyone who contributed to this release!

@Aftabbs, @albertodiazdurana, @anakin87, @ArkaD171717, @bilgeyucel, @bogdankostic, @davidsbatista, @FuturMix, @julian-risch, @kacperlukawski, @ritikraj2425, @saivedant169, @shaun0927, @sjrl, @SyedShahmeerAli12

Breaking Changes

  • `LLM.run` and `LLM.run_async` no longer accept `messages` and `streaming_callback` as positional arguments — they must be passed as keyword arguments.

About haystack

Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.

Related context

Earlier breaking changes

  • v2.30.0 LLM.run and LLM.run_async no longer accept positional messages or streaming_callback arguments.
