This release includes 1 breaking change for platform teams planning a safe upgrade.
✓ No known CVEs patched in this version
Summary
AI summary: LangGraph / LangChain integration adds an async coalescing chat model and an explicit fork primitive, with a dtype-matching requirement.
Full changelog
Highlights
LangGraph / LangChain integration. Install with pip install thaw-vllm[langgraph] and get two entry points:
- ChatThaw(BaseChatModel): a LangChain drop-in chat model backed by a vLLM parent + ForkPool. Single and concurrent ainvoke calls land in an async coalescer and run through a single batched vLLM.generate call (continuous batching).
- fork_fanout(llm, prefix, suffix_lists): the explicit fork primitive. Given shared prefix messages and a list of divergent suffix message lists, it snapshots the parent's cached KV once and fans out to the pool. Amortized cost is sub-second after pool warm-up.
from thaw_vllm.langgraph import ChatThaw, fork_fanout
llm = ChatThaw(model="meta-llama/Llama-3.1-8B-Instruct", workers=2)
# Run inside an async function; prefix_messages and suffix_a..suffix_d are message lists you supply
texts = await fork_fanout(llm, prefix_messages, [suffix_a, suffix_b, suffix_c, suffix_d])
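As a rough mental model of why fanning out from one snapshot is cheaper, here is a toy in plain Python that only counts work and has nothing to do with vLLM's actual KV cache. The shared prefix is "prefilled" once into a snapshot, and each branch extends a copy of it with its own suffix; the naive alternative re-processes the prefix for every branch. `Snapshot`, `WorkCounter`, `prefill`, `toy_fork_fanout` and `naive_fanout` are hypothetical names for illustration, not thaw_vllm APIs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Snapshot:
    # Toy stand-in for cached KV state: just the messages already "prefilled".
    messages: Tuple[str, ...] = ()


@dataclass
class WorkCounter:
    units: int = 0


def prefill(state: Snapshot, messages: List[str], work: WorkCounter) -> Snapshot:
    """Return a new snapshot extended by `messages`; the input is never mutated."""
    work.units += len(messages)  # one unit of work per message processed
    return Snapshot(state.messages + tuple(messages))


def toy_fork_fanout(prefix: List[str], suffix_lists: List[List[str]]) -> Tuple[List[Snapshot], int]:
    work = WorkCounter()
    base = prefill(Snapshot(), prefix, work)  # shared prefix processed exactly once
    # Each branch starts from the same base snapshot and adds only its own suffix.
    branches = [prefill(base, suffix, work) for suffix in suffix_lists]
    return branches, work.units


def naive_fanout(prefix: List[str], suffix_lists: List[List[str]]) -> Tuple[List[Snapshot], int]:
    work = WorkCounter()
    # Every branch re-processes the full prefix plus its own suffix.
    branches = [prefill(Snapshot(), prefix + suffix, work) for suffix in suffix_lists]
    return branches, work.units
```

With a 3-message prefix and four 1-message suffixes, the toy fork path counts 7 units of work against 16 for the naive path, while both produce identical branch contents. The gap grows with prefix length and branch count, which is the situation fork_fanout targets.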
Validation (H100 + Llama-3.1-8B, 2026-04-21)
3 rounds × 4 branches, LangGraph StateGraph with fork_fanout inside a single reviewer node. Wall time: 64.55s (round 0, pool init) → 1.43s → 1.43s. All branches produce coherent, perspective-specific reviews.
Receipt: site/receipts/2026-04-21_h100_pr_review_langgraph.json
Reproducer: python demos/pr_review_langgraph.py --mode thaw --dtype float16
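The wall times above can be restated per branch with a small calculation that uses only the reported numbers. Round 0 includes pool initialisation, so it is separated out; the per-branch figure simply divides a warm round's wall time by four branches, an amortized share rather than a measured per-request latency. `summarize_rounds` is a hypothetical helper, not part of the demo.

```python
from typing import Dict, List


def summarize_rounds(round_seconds: List[float], branches: int) -> Dict[str, float]:
    """Separate the pool-init round and express warm rounds per branch."""
    warm = round_seconds[1:]  # round 0 includes pool initialisation
    warm_round = sum(warm) / len(warm)
    return {
        "warm_round_s": warm_round,
        "warm_per_branch_s": warm_round / branches,
        "mean_round_incl_init_s": sum(round_seconds) / len(round_seconds),
    }


# Wall times from the H100 + Llama-3.1-8B validation run (3 rounds x 4 branches).
stats = summarize_rounds([64.55, 1.43, 1.43], branches=4)
```

This gives about 0.36 s per branch in each warm round, while averaging the init round in yields roughly 22.5 s per round over this short 3-round run, so the one-time pool start-up dominates until many rounds have been served.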
Breaking / important
- dtype must match between parent and pool workers.
demos/pr_review_langgraph.py now takes --dtype (default float16) and applies it to both. If you build a custom ChatThaw or call ForkPool directly, pass the same dtype to extra_llm_kwargs and extra_pool_kwargs. A mismatch corrupts snapshotted KV cache blocks and produces garbage output from round 1 onward.
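One way to avoid the mismatch in your own wiring is to derive both keyword dictionaries from a single dtype value and fail fast on conflicts. The sketch below is a hypothetical helper, not part of thaw_vllm, and it assumes both extra_llm_kwargs and extra_pool_kwargs carry the setting under a `dtype` key, mirroring vLLM's own `LLM(dtype=...)` argument; check the real ChatThaw and ForkPool signatures before relying on that.

```python
from typing import Any, Dict, Optional, Tuple


def with_shared_dtype(
    llm_kwargs: Optional[Dict[str, Any]],
    pool_kwargs: Optional[Dict[str, Any]],
    dtype: str = "float16",
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return copies of both kwarg dicts pinned to one dtype, or raise on conflict.

    Assumption: both dicts use the key "dtype" (hypothetical for ForkPool).
    """
    llm = dict(llm_kwargs or {})    # copy so the caller's dicts are not mutated
    pool = dict(pool_kwargs or {})
    for name, kwargs in (("extra_llm_kwargs", llm), ("extra_pool_kwargs", pool)):
        existing = kwargs.get("dtype")
        if existing is not None and existing != dtype:
            # A parent/pool mismatch would corrupt snapshotted KV blocks; refuse early.
            raise ValueError(f"{name} has dtype={existing!r}, expected {dtype!r}")
        kwargs["dtype"] = dtype
    return llm, pool
```

For example, `with_shared_dtype({"gpu_memory_utilization": 0.5}, None)` returns two dicts that both set dtype to "float16", ready to pass as the extra_llm_kwargs and extra_pool_kwargs mentioned above. Raising instead of silently overwriting keeps an accidental bfloat16 on one side from going unnoticed.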
Changelog
- thaw_vllm.langgraph.ChatThaw: LangChain BaseChatModel (async coalescer, batched singles path)
- thaw_vllm.langgraph.fork_fanout: explicit fork primitive
- thaw_vllm.langgraph.ForkCoalescer: framework-agnostic coalescer (can be reused outside LangChain)
- demos/pr_review_langgraph.py: mode-aware reference demo (--mode thaw vs --mode baseline)
- ChatThaw.enable_auto_fork=False by default: the auto-routed fork path is opt-in behind this flag
- --dtype demo flag, receipts, test coverage (coalescer + ChatThaw + demo integration)
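ChatThaw's coalescer and the standalone ForkCoalescer are described here only at a high level, so the sketch below shows the general micro-batching pattern in plain asyncio rather than thaw_vllm's implementation. `MicroBatcher`, `submit` and `window_s` are hypothetical names; `batch_fn` stands in for whatever issues the single batched generate call (a synchronous vLLM `LLM.generate` would, for instance, need wrapping with `asyncio.to_thread`).

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Set, Tuple


class MicroBatcher:
    """Gather concurrent submit() calls for a short window, then run one batched call."""

    def __init__(
        self,
        batch_fn: Callable[[List[str]], Awaitable[List[str]]],
        window_s: float = 0.005,
    ) -> None:
        self._batch_fn = batch_fn  # receives all prompts from one window at once
        self._window_s = window_s  # how long to wait for other callers to join
        self._pending: List[Tuple[str, asyncio.Future]] = []
        self._flush_task: Optional[asyncio.Task] = None
        self._background: Set[asyncio.Task] = set()  # strong refs so tasks are not GC'd

    async def submit(self, prompt: str) -> str:
        fut = asyncio.get_running_loop().create_future()
        self._pending.append((prompt, fut))
        if self._flush_task is None:
            # First caller in this window schedules the flush.
            task = asyncio.create_task(self._flush_after_window())
            self._background.add(task)
            task.add_done_callback(self._background.discard)
            self._flush_task = task
        return await fut

    async def _flush_after_window(self) -> None:
        await asyncio.sleep(self._window_s)
        batch, self._pending = self._pending, []
        self._flush_task = None  # callers arriving from now on start a new window
        try:
            results = await self._batch_fn([prompt for prompt, _ in batch])
        except Exception as exc:  # propagate one failure to every waiting caller
            for _, fut in batch:
                if not fut.done():
                    fut.set_exception(exc)
            return
        for (_, fut), result in zip(batch, results):
            if not fut.done():
                fut.set_result(result)
```

Three concurrent `submit()` calls made through `asyncio.gather` reach `batch_fn` as one three-prompt batch, and each caller receives its own result in order. The fixed window trades a few milliseconds of latency for larger batches; a production coalescer would likely also cap batch size.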
Co-Authored-By: Claude Opus 4.7 [email protected]
Breaking Changes
- dtype must match between parent model and pool workers; mismatch corrupts KV cache