haystack

v2.30.0 Breaking

This release includes 2 breaking changes that platform teams should review before upgrading.

Published 1mo LLM Frameworks

✓ No known CVEs patched

Topics

agent agents ai gemini generative-ai gpt-4 information-retrieval large-language-models llm machine-learning nlp orchestration python pytorch question-answering retrieval-augmented-generation semantic-search summarization transformers

Affected surfaces

breaking_upgrade

ReleasePort's take

Light signal

editorial:auto 1mo

LLM.run and LLM.run_async now require keyword arguments; positional messages or streaming_callback are no longer accepted.

Why it matters: This breaking change (severity 70) requires callers to update every invocation of LLM.run and LLM.run_async before upgrading; calls that still pass these arguments positionally will fail at runtime on v2.30.0.

Summary

AI summary

A mixed release covering 🐛 Bug Fixes, 🚀 New Features, and ⚡️ Enhancement Notes.

Changes in this release

Type	Severity	Summary	Source	Confidence	CVE
Breaking	High	LLM.run and LLM.run_async no longer accept positional messages or streaming_callback arguments.	llm_adapter@2026-06-03	high	-
Feature	Medium	All Haystack ChatGenerator components now accept a plain string for the messages parameter.	llm_adapter@2026-06-03	high	-
Feature	Medium	Introduced the `PythonCodeSplitter` component for syntax‑aware Python source splitting.	llm_adapter@2026-06-03	high	-
Feature	Low	`TextEmbeddingRetriever`, `MultiQueryEmbeddingRetriever`, and `MultiQueryTextRetriever` now support async execution via run_async.	llm_adapter@2026-06-03	low	-
Feature	Low	TextEmbeddingRetriever, MultiQueryEmbeddingRetriever, and MultiQueryTextRetriever gained run_async support for native coroutine use in AsyncPipeline.	granite4.1:30b@2026-06-03-audit	low	-
Feature	Low	LLM now supports template‑variable mode (Jinja2 variables) and pass‑through mode for messages, enabling flexible prompt construction.	granite4.1:30b@2026-06-03-audit	low	-
Feature	Low	Pipeline.draw() and Pipeline.show() validate Mermaid server responses (magic‑byte signature and Content-Type) before writing files, raising PipelineDrawingError on mismatches.	granite4.1:30b@2026-06-03-audit	low	-
Bugfix	Medium	Agent now exits correctly when multiple tool calls include the configured exit‑condition tool regardless of order.	llm_adapter@2026-06-03	high	-
Bugfix	Medium	`LLMMetadataExtractor.run_async` now respects max_workers concurrency limit.	llm_adapter@2026-06-03	high	-
Bugfix	Low	`Document.from_dict()` no longer mutates the input dictionary during deserialization.	llm_adapter@2026-06-03	high	-
Bugfix	Low	`AnswerBuilder.run()` no longer mutates the meta dict of input Document objects.	llm_adapter@2026-06-03	high	-
Bugfix	Low	`DocumentLanguageClassifier` no longer crashes when `Document.content` is None.	llm_adapter@2026-06-03	low	-
Bugfix	Low	`DocumentJoiner` in concatenate mode now treats documents with a score of exactly 0.0 as scored, not unscored.	llm_adapter@2026-06-03	low	-
Bugfix	Low	DocumentLanguageClassifier now handles Document.content=None gracefully without crashing, logging a warning instead.	granite4.1:30b@2026-06-03-audit	low	-
Bugfix	Low	DocumentJoiner in concatenate mode treats a score of exactly 0.0 as valid, preventing unintended duplicate removal.	granite4.1:30b@2026-06-03-audit	low	-
Refactor	Low	ToolsType type hint updated to accept any class inheriting from Tool or Toolset in sequences like list or tuple.	granite4.1:30b@2026-06-03-audit	low	-

Full changelog

⭐️ Highlights

🐍 Syntax-aware Python code splitting with `PythonCodeSplitter`

The new PythonCodeSplitter is a syntax-aware splitter for Python source files, built for code-RAG and code-search pipelines where naive line-based splitting tends to cut through functions and lose structural context. It parses sources with the ast module and greedily merges units, such as module docstring, import blocks, top-level functions, class headers, methods, and nested classes, into chunks of roughly max_effective_lines, keeping whole functions and methods together. For functions that exceed oversized_factor * max_effective_lines, it falls back to a line-based secondary split with overlap.

Two options make the resulting chunks more useful downstream: strip_docstrings=True moves docstrings into chunk metadata, and preserve_class_definition=True prepends the enclosing class signature to chunks whose members live in a later chunk. Each chunk also carries rich metadata including start_line, end_line, unit_kinds, include_classes, decorators, docstrings, source_id, and split_id.

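```
from haystack import Document
from haystack.components.preprocessors import PythonCodeSplitter

# Placeholder input: any Document whose content is Python source code.
doc = Document(content="def greet(name):\n    return f'Hello, {name}'\n")

splitter = PythonCodeSplitter(
    max_effective_lines=80,
    strip_docstrings=True,
    preserve_class_definition=True,
)
result = splitter.run(documents=[doc])
```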
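
The greedy, unit-preserving idea described above can be sketched with the standard library alone. This is a simplified illustration and not the component's implementation: `top_level_units` and `greedy_chunks` are hypothetical names, only top-level statements are treated as units, and raw line spans stand in for whatever "effective lines" means in the real component.

```python
import ast


def top_level_units(source: str) -> list[tuple[int, int, str]]:
    """Return (start_line, end_line, kind) for each top-level statement."""
    units = []
    for node in ast.parse(source).body:
        # Decorators sit above the def/class line, so start from the first one.
        decorator_lines = [d.lineno for d in getattr(node, "decorator_list", [])]
        start = min([node.lineno] + decorator_lines)
        units.append((start, node.end_lineno, type(node).__name__))
    return units


def greedy_chunks(source: str, max_lines: int) -> list[tuple[int, int]]:
    """Merge adjacent units into (start_line, end_line) chunks of at most
    max_lines, never cutting through a unit. An oversized unit becomes its
    own chunk here; no secondary line-based split is attempted."""
    chunks = []
    current = None
    for start, end, _kind in top_level_units(source):
        if current is None:
            current = (start, end)
        elif end - current[0] + 1 <= max_lines:
            current = (current[0], end)  # the unit still fits, extend the chunk
        else:
            chunks.append(current)
            current = (start, end)
    if current is not None:
        chunks.append(current)
    return chunks


SAMPLE = '''import os

def a():
    return 1

def b():
    x = 2
    return x

class C:
    def m(self):
        return 3
'''

print(greedy_chunks(SAMPLE, max_lines=6))
```

With a budget of 6 lines, the import and the first function share a chunk, while the second function and the class each get their own, because adding them would exceed the budget.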

💬 Pass a plain string to any `ChatGenerator`

All Haystack ChatGenerator components now accept a plain string for the messages parameter in addition to a list of ChatMessage objects. The string is automatically wrapped in a ChatMessage with the user role. This makes switching from a Generator to a ChatGenerator a one-line change. The change applies to AzureOpenAIChatGenerator, AzureOpenAIResponsesChatGenerator, FallbackChatGenerator, HuggingFaceAPIChatGenerator, HuggingFaceLocalChatGenerator, OpenAIChatGenerator, and OpenAIResponsesChatGenerator, and will soon be rolled out to the ChatGenerators in Haystack Core Integrations.

```
from haystack.components.generators.chat import OpenAIChatGenerator

generator = OpenAIChatGenerator()

# passing a string is equivalent to passing [ChatMessage.from_user("...")]
response = generator.run("What's Natural Language Processing?")
print(response["replies"][0].text)
```
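
The wrapping behavior can be pictured as a small normalization step. The sketch below is an assumption about the shape of that step, not Haystack's code: `Message` is a hypothetical stand-in for `ChatMessage`, and `normalize_messages` is an illustrative name.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Message:
    """Hypothetical stand-in for haystack's ChatMessage."""
    role: str
    text: str


def normalize_messages(messages: Union[str, list[Message]]) -> list[Message]:
    """Wrap a plain string in a single user message; pass lists through."""
    if isinstance(messages, str):
        return [Message(role="user", text=messages)]
    return list(messages)


from_string = normalize_messages("What's Natural Language Processing?")
from_list = normalize_messages([Message(role="user", text="Hi")])
print(from_string, from_list)
```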

⬆️ Upgrade Notes

DALLEImageGenerator has been updated to account for OpenAI's retirement of the DALL-E models. The default model is now gpt-image-2 (previously dall-e-3). To migrate:
- Update model value: besides gpt-image-2, gpt-image-1 and gpt-image-1-mini are also supported.
- Update quality value: the new accepted values are auto, high, medium, or low (previously standard or hd).
- Update size value: the new accepted values are 1024x1024, 1024x1536, 1536x1024, or auto. gpt-image-2 also supports arbitrary sizes.
- The response_format parameter is now ignored. The component always returns base64-encoded JSON.
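
When auditing existing configurations, a small check against the values listed above may help. The helper below is a hypothetical sketch (`find_migration_issues` is not part of Haystack) that only encodes the accepted values stated in these notes.

```python
SUPPORTED_MODELS = {"gpt-image-2", "gpt-image-1", "gpt-image-1-mini"}
ACCEPTED_QUALITY = {"auto", "high", "medium", "low"}
FIXED_SIZES = {"1024x1024", "1024x1536", "1536x1024", "auto"}


def find_migration_issues(model: str, quality: str, size: str) -> list[str]:
    """List settings that no longer match the values in the upgrade notes."""
    issues = []
    if model not in SUPPORTED_MODELS:
        issues.append(f"model {model!r} is not one of {sorted(SUPPORTED_MODELS)}")
    if quality not in ACCEPTED_QUALITY:
        issues.append(f"quality {quality!r} is not one of {sorted(ACCEPTED_QUALITY)}")
    # gpt-image-2 also supports arbitrary sizes, so only check the other models.
    if model != "gpt-image-2" and size not in FIXED_SIZES:
        issues.append(f"size {size!r} is not one of {sorted(FIXED_SIZES)}")
    return issues


old_settings = find_migration_issues("dall-e-3", "hd", "1792x1024")
new_settings = find_migration_issues("gpt-image-2", "high", "1024x1536")
print(old_settings)
print(new_settings)
```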

LLM.run and LLM.run_async no longer accept messages and streaming_callback as positional arguments; they must now be passed as keyword arguments. Update any direct calls accordingly:
```
# Before
llm.run([message], my_callback)

# After
llm.run(messages=[message], streaming_callback=my_callback)
```
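
To see how a keyword-only signature behaves when called the old way, here is a minimal sketch using a stand-in class. `KeywordOnlyLLM` is hypothetical and its signature is an assumption about the general shape of the change; it is not Haystack's `LLM`.

```python
from typing import Callable, Optional


class KeywordOnlyLLM:
    """Hypothetical stand-in; the bare * makes every parameter keyword-only."""

    def run(
        self,
        *,
        messages: Optional[list] = None,
        streaming_callback: Optional[Callable] = None,
    ) -> dict:
        return {
            "n_messages": len(messages or []),
            "streaming": streaming_callback is not None,
        }


llm = KeywordOnlyLLM()

# Keyword call: works.
result = llm.run(messages=["hello"], streaming_callback=print)

# Positional call: Python rejects it before the method body runs.
try:
    llm.run(["hello"], print)
    positional_error = None
except TypeError as exc:
    positional_error = exc

print(result)
print(type(positional_error).__name__)
```

Because the failure is a `TypeError` at call time, a test suite that exercises each call site should surface any invocation that still needs updating.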

🚀 New Features

Introduced the PythonCodeSplitter component, a syntax-aware splitter for Python source files:
- Parses sources with the ast module and merges units (module docstring, import blocks, top-level functions, class headers, methods, nested classes, and remaining statements) greedily into chunks of roughly max_effective_lines.
- Keeps whole functions and methods together; falls back to a line-based secondary split (using DocumentSplitter) with overlap only for functions whose effective length exceeds oversized_factor * max_effective_lines.
- Optionally strips docstrings into chunk metadata via strip_docstrings=True, and prepends the enclosing class signature to chunks whose members live in a later chunk via preserve_class_definition=True.
- Emits per-chunk metadata including start_line, end_line, unit_kinds, include_classes, decorators, docstrings, source_id, and split_id.
All Haystack ChatGenerator components now also accept a plain string for the messages parameter in addition to a list of ChatMessage objects. The string is automatically converted into a list containing a ChatMessage with the user role. This is done to simplify switching from Generators to ChatGenerators; Generators might be removed in Haystack 3.0.

This applies to AzureOpenAIChatGenerator, AzureOpenAIResponsesChatGenerator, FallbackChatGenerator, HuggingFaceAPIChatGenerator, HuggingFaceLocalChatGenerator, OpenAIChatGenerator, and OpenAIResponsesChatGenerator.

The same change will soon be applied to the ChatGenerators available in Haystack Core Integrations.

Example:
```
from haystack.components.generators.chat import OpenAIChatGenerator

generator = OpenAIChatGenerator()

# passing a string is equivalent to passing [ChatMessage.from_user("...")]
response = generator.run("What's Natural Language Processing?")
print(response["replies"][0].text)
```

⚡️ Enhancement Notes

Added run_async to TextEmbeddingRetriever, MultiQueryEmbeddingRetriever, and MultiQueryTextRetriever. These components now execute natively as coroutines in AsyncPipeline, delegating to each wrapped component's run_async when available and falling back to a thread executor otherwise.
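
The delegate-or-fall-back pattern described in this note can be sketched as follows. This is an illustration under assumptions, not the retrievers' code: `SyncOnly`, `AsyncCapable`, and `run_wrapped` are hypothetical names.

```python
import asyncio


class SyncOnly:
    """Hypothetical component exposing only a blocking run()."""

    def run(self, query: str) -> dict:
        return {"result": f"sync:{query}"}


class AsyncCapable:
    """Hypothetical component exposing a native coroutine as well."""

    def run(self, query: str) -> dict:
        return {"result": f"sync:{query}"}

    async def run_async(self, query: str) -> dict:
        return {"result": f"async:{query}"}


async def run_wrapped(component, **kwargs) -> dict:
    """Prefer the component's run_async; otherwise run run() in a thread."""
    run_async = getattr(component, "run_async", None)
    if run_async is not None:
        return await run_async(**kwargs)
    loop = asyncio.get_running_loop()
    # The default executor is a thread pool, so the event loop is not blocked.
    return await loop.run_in_executor(None, lambda: component.run(**kwargs))


async def main() -> list:
    return await asyncio.gather(
        run_wrapped(SyncOnly(), query="q"),
        run_wrapped(AsyncCapable(), query="q"),
    )


results = asyncio.run(main())
print(results)
```
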
Fix grammar in the AzureOpenAIGenerator and AzureOpenAIChatGenerator docstring code examples ("<this a model name..." → "<this is a model name...") so that copy-pasted snippets read correctly.
LLM now supports two usage modes:
1. Template-variable mode: provide a user_prompt with Jinja2 variables (e.g. {{ query }}). Those variables become pipeline inputs and messages is optional. The rendered user_prompt is always appended after any messages provided at runtime.
2. Pass-through mode: omit user_prompt or provide one with no template variables. messages becomes a required input, allowing a fully-constructed list of ChatMessages to be passed from upstream.
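
The distinction between the two modes comes down to whether user_prompt contains template variables. The sketch below illustrates that decision with a regular expression as a deliberately simplified stand-in for real Jinja2 parsing; `template_variables` and `usage_mode` are hypothetical helpers, not part of Haystack.

```python
import re
from typing import Optional

# Simplified stand-in for Jinja2 parsing: matches plain {{ name }} placeholders only.
_VARIABLE = re.compile(r"\{\{\s*([A-Za-z_]\w*)\s*\}\}")


def template_variables(user_prompt: Optional[str]) -> list[str]:
    """Return the placeholder names found in the prompt, in order."""
    return _VARIABLE.findall(user_prompt or "")


def usage_mode(user_prompt: Optional[str]) -> str:
    """Template variables present: template-variable mode; else pass-through."""
    return "template-variable" if template_variables(user_prompt) else "pass-through"


print(usage_mode("Answer the question: {{ query }}"))
print(usage_mode("Summarize the conversation."))
print(usage_mode(None))
```
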
Update ToolsType to improve type checking for the tools parameter. Any class that inherits from either Tool or Toolset is now accepted in any sequence (list, tuple, etc).
Pipeline.draw() and Pipeline.show() now validate the Mermaid server response before writing it to disk. The response body is checked against the expected output format (PNG, JPEG, WebP, SVG, or PDF) via its magic-byte signature, and the Content-Type header is checked as well. If the response is empty or does not match the requested format, a PipelineDrawingError is raised and no file is written. This prevents a misconfigured or untrusted server_url from causing arbitrary content (for example an HTML error page) to be saved verbatim to the output path.
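
A magic-byte check of this kind can be sketched as below. The signatures are the standard ones for each format, but the function itself is a hypothetical illustration (`matches_format` is not Haystack's API), the SVG test is a rough heuristic, and the Content-Type check is omitted.

```python
def matches_format(body: bytes, fmt: str) -> bool:
    """Return True if the response body starts like the requested format."""
    if not body:
        return False
    fmt = fmt.lower()
    if fmt == "png":
        return body.startswith(b"\x89PNG\r\n\x1a\n")
    if fmt in ("jpg", "jpeg"):
        return body.startswith(b"\xff\xd8\xff")
    if fmt == "webp":
        # RIFF container: bytes 4..8 hold the size, bytes 8..12 the WEBP tag.
        return body[:4] == b"RIFF" and body[8:12] == b"WEBP"
    if fmt == "pdf":
        return body.startswith(b"%PDF-")
    if fmt == "svg":
        # SVG is text, so this is a heuristic rather than a true signature.
        head = body.lstrip()[:256].lower()
        return head.startswith(b"<svg") or (
            head.startswith(b"<?xml") and b"<svg" in head
        )
    return False


png_ok = matches_format(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8, "png")
html_as_png = matches_format(b"<html><body>502 Bad Gateway</body></html>", "png")
print(png_ok, html_as_png)
```

An HTML error page fails every branch, which is the case the note describes: nothing would be written to the output path.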

🐛 Bug Fixes

Prevent Document.from_dict() from mutating the input dictionary during deserialization.
Prevent DocumentLanguageClassifier from crashing when Document.content=None by marking such documents as unmatched and logging a warning.
Fixed a bug where Agent would not exit when the model emitted multiple tool calls in a single turn and the configured exit-condition tool was not the first one in the list. Previously, only the first tool call in each assistant message was checked against exit_conditions, so a reply like [search, finish] (with exit_conditions=["finish"]) would silently fail to stop the loop and keep iterating until max_agent_steps was reached. Since parallel tool calls are now the norm for frontier models, this could quietly turn a single successful turn into dozens of wasted LLM calls. The Agent now inspects every tool call in the message, so the exit condition is honored regardless of ordering.
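
The difference between the old and new checks can be shown with tool names alone. This is a simplified sketch: the function names are hypothetical and real tool calls carry more than a name.

```python
def exits_first_only(tool_names: list[str], exit_conditions: list[str]) -> bool:
    """Old behavior as described: only the first tool call is checked."""
    return bool(tool_names) and tool_names[0] in exit_conditions


def exits_any(tool_names: list[str], exit_conditions: list[str]) -> bool:
    """New behavior as described: every tool call in the message is checked."""
    return any(name in exit_conditions for name in tool_names)


reply = ["search", "finish"]
print(exits_first_only(reply, ["finish"]))  # the loop would keep iterating
print(exits_any(reply, ["finish"]))         # the exit condition is honored
```
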
Fix AnswerBuilder.run() mutating the meta dict of input Document objects. source_index (and referenced when reference_pattern is set) are now only added to the document copies inside GeneratedAnswer.documents, not to the originals.
Fixed DocumentJoiner in concatenate mode so that documents with a score of exactly 0.0 are no longer treated as unscored during deduplication. Previously a truthiness check coerced score=0.0 to -inf, which could cause a worse, negatively-scored duplicate to be kept instead of the 0.0-scored document. The merge mode was updated to the same explicit is not None check for consistency; its observable behavior is unchanged.
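
The pitfall is that 0.0 is falsy in Python. A minimal sketch of the two checks, using bare scores in place of documents (the function names are hypothetical):

```python
from math import inf
from typing import Optional


def key_truthy(score: Optional[float]) -> float:
    """Buggy variant: 0.0 is falsy, so it is coerced to -inf like None."""
    return score if score else -inf


def key_explicit(score: Optional[float]) -> float:
    """Fixed variant: only a missing score is treated as -inf."""
    return score if score is not None else -inf


# Scores of two duplicates of the same document.
duplicate_scores = [0.0, -0.5]

kept_by_truthy = max(duplicate_scores, key=key_truthy)
kept_by_explicit = max(duplicate_scores, key=key_explicit)
print(kept_by_truthy, kept_by_explicit)
```

With the truthiness check the worse, negatively scored duplicate wins; with the explicit `is not None` check the 0.0-scored one is kept.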
Fixed in-place mutation of ExtractedAnswer.meta in ExtractiveReader._add_answer_page_number when the answer's meta was None. Now uses dataclasses.replace to avoid triggering the dataclass mutation warning.
Fixed ExtractiveReader raising ValueError when the number of valid answer spans for a sequence was smaller than answers_per_seq (for example with short documents or when answers_per_seq exceeded the number of upper-triangular, non-masked (start, end) token pairs). _postprocess now filters the per-sequence probabilities by the same validity mask it already applied to the start/end token indices, so the three structures always have matching lengths.
HierarchicalDocumentSplitter no longer mutates the metadata of the input Document. _add_meta_data now returns a new Document with a copied meta dict via dataclasses.replace instead of writing __block_size, __parent_id, __children_ids and __level onto the caller's Document.
Fixed a bug in LLMMetadataExtractor.run_async where the asyncio.Semaphore intended to bound concurrent LLM calls to max_workers was acquired once around the outer gather(...) call instead of inside each task. As a result, max_workers had no effect in run_async and all LLM requests for a batch were issued simultaneously. The semaphore is now acquired per task, so max_workers correctly caps in-flight requests.
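
The effect of where the semaphore is acquired can be reproduced with plain asyncio. The sketch below is an illustration of the two placements, not the extractor's code; `asyncio.sleep` stands in for an LLM request and both function names are hypothetical.

```python
import asyncio


async def peak_with_outer_semaphore(n_tasks: int, max_workers: int) -> int:
    """Buggy placement: one acquisition wraps the whole gather call."""
    semaphore = asyncio.Semaphore(max_workers)
    in_flight = 0
    peak = 0

    async def call_llm() -> None:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stand-in for an LLM request
        in_flight -= 1

    async with semaphore:
        await asyncio.gather(*(call_llm() for _ in range(n_tasks)))
    return peak


async def peak_with_per_task_semaphore(n_tasks: int, max_workers: int) -> int:
    """Fixed placement: each task acquires the semaphore itself."""
    semaphore = asyncio.Semaphore(max_workers)
    in_flight = 0
    peak = 0

    async def call_llm() -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for an LLM request
            in_flight -= 1

    await asyncio.gather(*(call_llm() for _ in range(n_tasks)))
    return peak


outer_peak = asyncio.run(peak_with_outer_semaphore(n_tasks=10, max_workers=3))
per_task_peak = asyncio.run(peak_with_per_task_semaphore(n_tasks=10, max_workers=3))
print(outer_peak, per_task_peak)
```

The first variant lets all ten requests run at once, while the second never has more than three in flight.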
expand_page_range() previously raised an unhelpful ValueError: too many values to unpack when a page range string contained more than one hyphen (e.g. "10-20-30"). The parser now validates the format and raises a clear ValueError with an explanatory message for invalid inputs.
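
The unhelpful error comes from tuple unpacking a split with more than two parts. The sketch below contrasts that with an explicit format check; `parse_page_range` is a hypothetical helper for a single "start-end" string and its error message is illustrative, not the one Haystack emits.

```python
def parse_page_range(text: str) -> list[int]:
    """Expand 'start-end' into a list of pages, validating the format first."""
    parts = text.split("-")
    if len(parts) != 2 or not all(part.strip().isdigit() for part in parts):
        raise ValueError(
            f"Invalid page range {text!r}: expected 'start-end', for example '10-20'."
        )
    start, end = (int(part) for part in parts)
    return list(range(start, end + 1))


# Naive unpacking fails with a message that says nothing about page ranges.
try:
    start, end = "10-20-30".split("-")
    naive_message = ""
except ValueError as exc:
    naive_message = str(exc)

# The validated parser reports what was wrong with the input.
try:
    parse_page_range("10-20-30")
    clear_message = ""
except ValueError as exc:
    clear_message = str(exc)

print(naive_message)
print(clear_message)
print(parse_page_range("10-12"))
```
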
LLMMetadataExtractor now raises a clear ValueError when the prompt contains no template variables. Previously this case raised an unhelpful IndexError: list index out of range. The error message now consistently explains that the prompt must contain exactly one variable called document.
Fixed HuggingFaceAPIDocumentEmbedder serialization to preserve the configured concurrency_limit.
Fix TransformersZeroShotDocumentClassifier.to_dict to include the classification_field and multi_label init parameters, which were previously dropped on serialization and reset to defaults after from_dict.

💙 Big thank you to everyone who contributed to this release!

@Aarkin7, @anakin87, @bogdankostic, @davidsbatista, @etairl, @HamidOna, @julian-risch, @kota-wilson, @maxdswain, @MechaCritter, @pragnyanramtha, @rautaditya2606, @sachinn854, @sjrl

Breaking Changes

`LLM.run` and `LLM.run_async` no longer accept `messages` and `streaming_callback` as positional arguments; they must be passed as keyword arguments.
`DALLEImageGenerator` default model changed from `dall-e-3` to `gpt-image-2`; quality options updated (auto, high, medium, low) and size options updated (1024x1024, 1024x1536, 1536x1024, auto); `response_format` parameter is ignored.

About haystack

Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.

Related context

Earlier breaking changes

v2.29.0 `LLM.run` and `LLM.run_async` now require keyword arguments for `messages` and `streaming_callback`.