Release history
langroid releases
Harness LLMs with Multi-Agent Programming
Path‑traversal + SQL file read fixes
- Adds Python 3.13 support; CI now tests against Python 3.11 and 3.13.
Full changelog
Python 3.13 support and Task.init() memory leak fix
Python 3.13 support
PR [#1011](https://github.com/langroid/langroid/pull/1011) bumps the supported Python range to >=3.10,<3.14 and adds Python 3.13 to the CI matrix alongside 3.11, so Langroid is now tested on both versions on every push.
- `pyproject.toml`: `requires-python = "<3.14,>=3.10"`
- `.github/workflows/validate.yml`: lint/test job now runs against `["3.11", "3.13"]`
Fix ObjectRegistry memory leak in Task.init()
PR [#1022](https://github.com/langroid/langroid/pull/1022) (originally [#1021](https://github.com/langroid/langroid/pull/1021)) fixes a memory leak in Task.init(). The method built a temporary ChatDocument solely to feed log_message(Entity.SYSTEM, ...), but ChatDocument.__init__ auto-registers every instance in the class-level ObjectRegistry, so each new Task left behind a system-message doc that was never freed. In long-running services that spawn many tasks (e.g. servers), this accumulated significantly over time.
The fix:
- explicitly removes the temporary doc via `ChatDocument.delete_id(...)` after logging
- renames the local to `system_message_temp_doc` to make its transient nature obvious
- adds a regression test (`test_task_init_no_registry_leak`) mirroring the PR #939 leak-check pattern, asserting that repeated `Task(...).init()` calls leave zero leaked `ChatDocument`s in the registry
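The leak pattern and its fix can be illustrated with a toy registry. This is a minimal sketch, not Langroid's code: `ToyRegistry`, `ToyDoc`, `init_leaky`, and `init_fixed` are hypothetical stand-ins, and only the idea (an auto-registering object must be explicitly removed once it is no longer needed) comes from the notes above.

```python
from typing import Dict, List
from uuid import uuid4


class ToyRegistry:
    """Hypothetical stand-in for a class-level object registry."""

    objects: Dict[str, "ToyDoc"] = {}


class ToyDoc:
    """Hypothetical stand-in for a document that registers itself on creation."""

    def __init__(self, content: str) -> None:
        self.id = str(uuid4())
        self.content = content
        # Auto-registration: the registry now holds a strong reference.
        ToyRegistry.objects[self.id] = self

    @staticmethod
    def delete_id(doc_id: str) -> None:
        # Remove the registry's reference so the doc can be garbage-collected.
        ToyRegistry.objects.pop(doc_id, None)


log: List[str] = []  # stand-in for a real logger


def init_leaky(system_message: str) -> None:
    temp_doc = ToyDoc(system_message)
    log.append(temp_doc.content)
    # temp_doc goes out of scope here, but the registry still references it.


def init_fixed(system_message: str) -> None:
    system_message_temp_doc = ToyDoc(system_message)
    log.append(system_message_temp_doc.content)
    # Explicitly drop the temporary doc from the registry after logging.
    ToyDoc.delete_id(system_message_temp_doc.id)
```

Calling `init_leaky` repeatedly grows `ToyRegistry.objects` by one entry per call, while `init_fixed` leaves it unchanged, which is the same property the regression test checks for.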
Thanks to @alexagr for the catch and the original fix.
Fixes MiniMax context window size to 204,800 tokens (API limit), preventing unnecessary prompt truncation.
- MiniMax LLM provider with 7 models and up to 1M context
- OpenAI-compatible API with minimax/ prefix
Fixes Gemini model name suffix handling for experimental, latest, and preview variants.
Fixed context-length preflight checks for Gemini model aliases and file attachments.
- Thread-safe cache operations protected by RLock in multi-threaded environments
- LRU tracking with monotonic timestamps to identify and manage stale entries
- prune_cache(max_age_seconds) function to evict stale client entries
Full changelog
Thread-safe client cache with LRU eviction
PR #993 enhances the client cache in langroid/language_models/client_cache.py with thread-safety and LRU (Least Recently Used) eviction.
What's New
- Thread safety: All cache operations are now protected by a `threading.RLock()` to prevent race conditions in multi-threaded environments
- LRU tracking: Cache entries store a last-used monotonic timestamp, refreshed on each access
- `prune_cache(max_age_seconds)`: New function to evict stale client entries older than the specified age
- Helper functions: `_get_cached_client()` and `_store_client()` encapsulate cache access with consistent locking and timestamp management
Applies to all cached client getters: get_openai_client(), get_async_openai_client(), get_groq_client(), get_async_groq_client(), get_cerebras_client(), and get_async_cerebras_client().
- Seltz web search provider integration
- SeltzSearchTool for agent usage
- Setup and integration documentation
- Async HTTP client factory support
- Tuple pattern for paired sync and async clients
Fix empty tool arguments serialization (`None` → `"{}"`) that caused `INVALID_ARGUMENT` errors with Gemini models on VertexAI (#988)
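As an illustration of the kind of serialization this fix describes, the sketch below maps missing arguments to an empty JSON object. `serialize_tool_arguments` is a hypothetical helper written for this example; it is not taken from the Langroid source.

```python
import json
from typing import Any, Dict, Optional


def serialize_tool_arguments(arguments: Optional[Dict[str, Any]]) -> str:
    """Serialize tool-call arguments, treating None as an empty JSON object."""
    # json.dumps(None) would yield "null"; an empty dict yields "{}".
    return json.dumps(arguments if arguments is not None else {})
```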
Handle null deltas in OpenAIGPT streaming: https://github.com/langroid/langroid/pull/987
Add model information for gpt-5.1-chat and gpt-5.2-chat models.
- Configurable context overflow strategy with 'truncate' (default) and 'drop_turns' options
- Cleaner OpenAI API error logging without full tracebacks for server-side errors
Full changelog
Configurable context overflow strategy (#967, #974)
Added a `context_overflow_strategy` option to `ChatAgentConfig` for handling message history that exceeds the model's context length. Two strategies are available:
- `"truncate"` (default): Truncates content of early messages while preserving all messages in the sequence. Maintains backward compatibility and the alternating message structure required by LLM APIs.
- `"drop_turns"`: Drops complete conversation turns (a USER message and all responses until the next USER message). More aggressive but cleaner; particularly useful for voice agents with limited-context models (e.g., `llama-3-8b` with 8192 tokens), where individual messages are already short and truncation is ineffective.
```python
import langroid as lr

config = lr.ChatAgentConfig(
    context_overflow_strategy="drop_turns",  # or "truncate" (default)
    # ... other config options
)
```
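To make the difference between the two strategies concrete, here is a toy sketch that operates on plain dict messages. It is not Langroid's implementation: the function names are hypothetical, token counts are approximated by whitespace-separated words, and the first message is assumed to be the system prompt.

```python
from typing import Dict, List

Message = Dict[str, str]


def n_tokens(messages: List[Message]) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return sum(len(m["content"].split()) for m in messages)


def drop_turns(messages: List[Message], max_tokens: int) -> List[Message]:
    """Drop whole turns, oldest first, until the history fits."""
    system, rest = messages[:1], list(messages[1:])
    while rest and n_tokens(system + rest) > max_tokens:
        # The next turn starts at the next "user" message after index 0.
        nxt = next((i for i in range(1, len(rest)) if rest[i]["role"] == "user"), None)
        if nxt is None:
            break  # only the latest turn remains; keep it
        rest = rest[nxt:]
    return system + rest


def truncate(messages: List[Message], max_tokens: int) -> List[Message]:
    """Shorten early messages but keep every message in the sequence."""
    out = [dict(m) for m in messages]  # copy so the input is not mutated
    for m in out[1:-1]:  # skip the system prompt and the latest message
        excess = n_tokens(out) - max_tokens
        if excess <= 0:
            break
        words = m["content"].split()
        m["content"] = " ".join(words[: max(1, len(words) - excess)])
    return out
```

With short messages, `truncate` can only shave a few words per message, which is why dropping whole turns tends to free context more effectively in that setting.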
Cleaner OpenAI API error logging (#975)
OpenAI API errors (authentication, bad request, rate limits, etc.) are now logged without a full Python traceback, since these errors originate server-side and a local stack trace adds no diagnostic value. Network-level errors (APIConnectionError, APITimeoutError) still include the full traceback to aid in diagnosing local issues.
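The logging policy described above can be sketched with the standard `logging` module. The exception classes and `log_api_error` below are hypothetical stand-ins for illustration; they are not the OpenAI SDK's exception types or Langroid's actual handler.

```python
import logging

logger = logging.getLogger("llm_errors_sketch")


class ServerSideError(Exception):
    """Hypothetical stand-in for API errors such as authentication or rate limits."""


class NetworkError(Exception):
    """Hypothetical stand-in for connection or timeout errors."""


def log_api_error(err: Exception) -> None:
    if isinstance(err, NetworkError):
        # Local or network problems: include the traceback to aid diagnosis.
        logger.error("LLM call failed: %s", err, exc_info=err)
    else:
        # Server-side errors: the message alone is enough; skip the traceback.
        logger.error("LLM API error: %s", err)
```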
Minor fixes
- Fixed defensive check for empty/missing `choices` in OpenAI API response parsing (#975)
- Formatting cleanup in inline reasoning tests (#977)
- Updated `llms*.txt` documentation files
- Inline reasoning support in streaming responses for models using embedded thinking/reasoning delimiters
- Added `extra_content` field to `OpenAIToolCall` for storing additional metadata from tool calls
- New `_split_inline_reasoning()` method with configurable delimiters and state tracking across streaming chunks
Full changelog
Support inline reasoning in streaming responses (#973)
Added support for handling inline reasoning content in streaming LLM responses, for models that embed thinking/reasoning inside the main content stream (e.g., using <think>...</think> delimiters) rather than in a separate reasoning field.
Key Changes
- Added `extra_content` field to `OpenAIToolCall`: New optional field to store additional metadata from tool calls, with updated `from_dict()` and `api_dict()` methods
- Implemented inline reasoning splitting: New `_split_inline_reasoning()` static method that separates reasoning tokens from text tokens based on configurable delimiters, tracking state across streaming chunks
- Enhanced streaming event processing: Updated `_process_stream_event()` and `_process_stream_event_async()` to route split tokens to correct streamers (TEXT vs REASONING) with proper state tracking
- Updated Ruff: Bumped pre-commit ruff version from v0.14.14 to v0.15.0
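The idea of splitting inline reasoning while carrying state across chunks can be sketched as below. This is an illustrative function, not the actual `_split_inline_reasoning()` method: its name, signature, and return shape are assumptions, and it further assumes that a delimiter is never split across two chunks.

```python
from typing import List, Tuple


def split_inline_reasoning(
    chunk: str,
    in_reasoning: bool,
    start: str = "<think>",
    end: str = "</think>",
) -> Tuple[str, str, bool]:
    """Split one streamed chunk into (text, reasoning, new_in_reasoning)."""
    text_parts: List[str] = []
    reasoning_parts: List[str] = []
    rest = chunk
    while rest:
        # Look for the delimiter that would end the current mode.
        delim = end if in_reasoning else start
        before, found, rest = rest.partition(delim)
        (reasoning_parts if in_reasoning else text_parts).append(before)
        if found:
            in_reasoning = not in_reasoning
    return "".join(text_parts), "".join(reasoning_parts), in_reasoning
```

The caller keeps the returned flag and passes it in with the next chunk, so a reasoning block that spans several chunks is routed consistently.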
- CVE-2025-46724: RCE bypass prevented via dunder attribute blocking in AST validator
- Reasoning parameter in `show_llm_response` and `finish_llm_stream` callbacks to expose LLM chain-of-thought
- Automatic reasoning display in Chainlit UIs with '💭 Reasoning' label
Full changelog
Add reasoning parameter to LLM response callbacks (#965)
This release adds support for passing chain-of-thought reasoning from LLMs (like DeepSeek R1, Claude extended thinking) to UI callbacks.
Features
- Reasoning in callbacks: `show_llm_response` and `finish_llm_stream` callbacks now receive a `reasoning` parameter containing the LLM's chain-of-thought reasoning (when available)
- Chainlit integration: Reasoning is automatically displayed as a nested message with a "💭 Reasoning" label in Chainlit UIs
- Backward compatible: Existing custom callbacks without the `reasoning` parameter continue to work; signature inspection is used to pass reasoning only if the callback supports it
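Signature inspection of this kind can be done with the standard `inspect` module. The sketch below is one way to do it, not Langroid's code: `call_with_optional_reasoning` is a hypothetical helper, and real callbacks may take additional parameters not shown here.

```python
import inspect
from typing import Any, Callable


def call_with_optional_reasoning(
    callback: Callable[..., Any], content: str, reasoning: str
) -> Any:
    """Pass reasoning only when the callback's signature can accept it."""
    params = inspect.signature(callback).parameters
    accepts_reasoning = "reasoning" in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if accepts_reasoning:
        return callback(content=content, reasoning=reasoning)
    # Older callbacks without the parameter are called as before.
    return callback(content=content)
```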
Documentation
- Added "Displaying Reasoning in UI Callbacks" section to reasoning-content.md with examples for custom callback implementations
Thanks to @alexagr for the initial PR adding the reasoning parameter!
- Message routing configuration documented
- OpenAIAssistant routing behavior fix
- Text-based routing test coverage
- Preserve thought tags in message history for inline reasoning models
- Added message_with_reasoning and content_with_reasoning fields
- Vertex AI support for Gemini models
- New GEMINI_API_BASE environment variable
User-provided API parameters like reasoning_effort are no longer silently filtered.
Full changelog
Fixed
- Respect user-provided API parameters: Removed overly aggressive parameter filtering that silently dropped params like `reasoning_effort` for models added to `MODEL_INFO`. The library now trusts user configuration and lets the API validate parameter support. (#956, thanks @alexagr)
- Callbacks `show_llm_response` and `finish_llm_stream` now have separate `content` (text messages) and `tools_content` (serialized tool calls) parameters
Full changelog
What's Changed
Improvements
Callback API Enhancement: Separate content and tools_content (#952, #945)
The show_llm_response and finish_llm_stream callbacks now include separate content and tools_content parameters:
- `content`: Always contains the text message generated by the model
- `tools_content`: Contains serialized functions/tools if present, empty string otherwise
Why this matters: Previously, content mixed text messages with JSON-serialized tool calls. This caused issues for applications using callbacks for purposes like text-to-speech, where tool call JSON was incorrectly processed as regular text.
Backward compatible: The tools_content parameter defaults to an empty string, so existing callback implementations continue to work.
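A text-to-speech style callback that benefits from the split might look like the sketch below. `speak_llm_response` and the `spoken` list are hypothetical, and the real callbacks may take further parameters beyond the two shown.

```python
from typing import List

spoken: List[str] = []  # stand-in for a text-to-speech queue


def speak_llm_response(content: str, tools_content: str = "") -> None:
    """Hypothetical callback: speak only the text, never the tool-call JSON."""
    if content:
        spoken.append(content)
    # tools_content is deliberately ignored here; it could be logged instead.
```

Because `tools_content` has a default, the callback works whether or not the caller supplies it.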
Full Changelog: https://github.com/langroid/langroid/compare/0.59.25...0.59.26
- GPT-5.2 and GPT-5.2-Pro support
- Gemini 3 Flash and Gemini 3 Pro support
- Claude 4 family including 4.5 variants
- Azure OpenAI API v1 support with chat_model_orig parameter
- Fixed tool message caching in multi-agent scenarios
- Fixed ObjectRegistry memory leak
Minor fixes and improvements.
Full changelog
What's Changed
- Fix: Replace deprecated `datetime.utcnow()` with `datetime.now(timezone.utc)` to remove DeprecationWarnings in Python 3.12+ (#937)
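For reference, the replacement looks like this in isolation. Note that the new call returns a timezone-aware datetime, whereas `datetime.utcnow()` returned a naive one, so code that compares the result with naive datetimes may need adjusting.

```python
from datetime import datetime, timedelta, timezone

# Timezone-aware current time in UTC (replaces the deprecated datetime.utcnow()).
now_utc = datetime.now(timezone.utc)

# The result carries explicit UTC tzinfo with a zero offset.
is_aware_utc = now_utc.tzinfo is timezone.utc and now_utc.utcoffset() == timedelta(0)
```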