Thaw

v0.4.0 Breaking

This release includes 1 breaking change for platform teams planning a safe upgrade.

Published 3mo AI Agents & Assistants

✓ No known CVEs patched

Topics

agents inference kv-cache llm reinforcement-learning sglang vllm

Summary

AI summary

The LangGraph / LangChain integration adds an async coalescing chat model and an explicit fork primitive. It also introduces a requirement that the parent model and the pool workers use the same dtype.

Full changelog

Highlights

LangGraph / LangChain integration. Install with pip install thaw-vllm[langgraph] and get two entry points:

ChatThaw(BaseChatModel) — a LangChain drop-in chat model backed by a vLLM parent + ForkPool. Single and concurrent ainvoke calls land in an async coalescer and run through a single batched vLLM.generate (continuous batching).
fork_fanout(llm, prefix, suffix_lists) — the explicit fork primitive. Given shared prefix messages and a list of divergent suffix message lists, snapshots the parent's cached KV once and fans out to the pool. Sub-second amortized cost after pool warm-up.

from thaw_vllm.langgraph import ChatThaw, fork_fanout

llm = ChatThaw(model="meta-llama/Llama-3.1-8B-Instruct", workers=2)
texts = await fork_fanout(llm, prefix_messages, [suffix_a, suffix_b, suffix_c, suffix_d])
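
The coalescing behavior described for ChatThaw can be pictured with a small asyncio sketch. This is not Thaw's ForkCoalescer: TinyCoalescer, fake_generate, and the 10 ms collection window are hypothetical names and values chosen for illustration, and the batch function only stands in for a batched generate call.

```python
import asyncio
from typing import Callable, List, Optional


class TinyCoalescer:
    """Illustrative stand-in only; not thaw_vllm.langgraph.ForkCoalescer."""

    def __init__(self, batch_fn: Callable[[List[str]], List[str]],
                 window_s: float = 0.01) -> None:
        self._batch_fn = batch_fn    # one call that handles many prompts
        self._window_s = window_s    # how long to wait for more callers (assumed)
        self._pending: list = []     # (prompt, future) pairs awaiting a flush
        self._flush_task: Optional[asyncio.Task] = None

    async def submit(self, prompt: str) -> str:
        """Queue one prompt and wait for its result from the shared batch."""
        fut = asyncio.get_running_loop().create_future()
        self._pending.append((prompt, fut))
        if self._flush_task is None:
            # The first caller in a window schedules the flush for everyone.
            self._flush_task = asyncio.create_task(self._flush_later())
        return await fut

    async def _flush_later(self) -> None:
        await asyncio.sleep(self._window_s)
        batch, self._pending = self._pending, []
        self._flush_task = None
        try:
            results = self._batch_fn([prompt for prompt, _ in batch])
        except Exception as exc:
            # Propagate a batch failure to every waiting caller.
            for _, fut in batch:
                if not fut.done():
                    fut.set_exception(exc)
            return
        for (_, fut), out in zip(batch, results):
            if not fut.done():
                fut.set_result(out)


calls: List[List[str]] = []


def fake_generate(prompts: List[str]) -> List[str]:
    """Hypothetical batched backend: records each batch it receives."""
    calls.append(list(prompts))
    return [p.upper() for p in prompts]


async def demo() -> List[str]:
    coalescer = TinyCoalescer(fake_generate)
    # Three concurrent callers end up in a single batched call.
    return await asyncio.gather(
        coalescer.submit("a"), coalescer.submit("b"), coalescer.submit("c")
    )


outputs = asyncio.run(demo())
```

Each caller awaits its own future, so the calling code looks like three independent requests while the backend sees one batch. A real coalescer would also need limits on batch size and a policy for cancellation, which this sketch omits.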

Validation (H100 + Llama-3.1-8B, 2026-04-21)

3 rounds × 4 branches, LangGraph StateGraph with fork_fanout inside a single reviewer node. Wall time: 64.55s (round 0, pool init) → 1.43s → 1.43s. All branches produce coherent, perspective-specific reviews.

Receipt: site/receipts/2026-04-21_h100_pr_review_langgraph.json
Reproducer: python demos/pr_review_langgraph.py --mode thaw --dtype float16
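
To make the "sub-second amortized cost" claim concrete, the arithmetic below uses only the wall times and branch count reported above. The variable names are illustrative, and the per-branch figure is simply round wall time divided by the number of branches, which is an amortized cost rather than a measured per-branch latency.

```python
# Figures taken from the validation run reported above.
round_wall_s = [64.55, 1.43, 1.43]   # round 0 includes pool init
branches_per_round = 4

# Amortized cost per branch in the warm rounds (rounds 1 and 2).
warm_per_branch_s = [t / branches_per_round for t in round_wall_s[1:]]

# Total wall time and the average per branch if round 0 is included.
total_s = sum(round_wall_s)
avg_per_branch_all_s = total_s / (len(round_wall_s) * branches_per_round)

# How much slower the cold round is than a warm round.
cold_to_warm_ratio = round_wall_s[0] / round_wall_s[1]

print(f"warm per-branch: {warm_per_branch_s[0]:.4f} s")
print(f"total: {total_s:.2f} s, avg per branch incl. warm-up: {avg_per_branch_all_s:.4f} s")
print(f"cold/warm ratio: {cold_to_warm_ratio:.1f}x")
```

The warm rounds work out to roughly 0.36 s per branch, while the average over all 12 branches is dominated by the one-time pool initialization in round 0.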

Breaking / important

dtype must match between parent and pool workers. demos/pr_review_langgraph.py now takes --dtype (default float16) and applies it to both. If you build a custom ChatThaw or call ForkPool directly, pass the same dtype to extra_llm_kwargs and extra_pool_kwargs — a mismatch corrupts snapshotted KV cache blocks and produces garbage on rounds 1+.
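
One way to guard against a mismatch is to build both keyword dictionaries from a single dtype value. The helper below is a hypothetical sketch, not part of thaw_vllm: the name matched_dtype_kwargs is invented here, and it assumes that both extra_llm_kwargs and extra_pool_kwargs accept a "dtype" key.

```python
from typing import Any, Dict, Optional, Tuple


def matched_dtype_kwargs(
    dtype: str = "float16",
    llm_extra: Optional[Dict[str, Any]] = None,
    pool_extra: Optional[Dict[str, Any]] = None,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (llm_kwargs, pool_kwargs) that carry the same dtype.

    Hypothetical helper: assumes both dicts take a "dtype" key.
    Raises ValueError if either input already names a different dtype.
    """
    llm_kwargs = dict(llm_extra or {})    # copy so caller dicts are untouched
    pool_kwargs = dict(pool_extra or {})
    for name, kwargs in (("extra_llm_kwargs", llm_kwargs),
                         ("extra_pool_kwargs", pool_kwargs)):
        existing = kwargs.get("dtype", dtype)
        if existing != dtype:
            raise ValueError(
                f"{name} sets dtype={existing!r}, expected {dtype!r}"
            )
        kwargs["dtype"] = dtype
    return llm_kwargs, pool_kwargs


llm_kwargs, pool_kwargs = matched_dtype_kwargs("float16")

# Assumed usage, based on the parameter names mentioned in the note above:
# llm = ChatThaw(model="meta-llama/Llama-3.1-8B-Instruct", workers=2,
#                extra_llm_kwargs=llm_kwargs, extra_pool_kwargs=pool_kwargs)
```

Failing fast at construction time is cheaper than discovering the problem as garbage output on round 1, since the first round may still look correct.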

Changelog

thaw_vllm.langgraph.ChatThaw — LangChain BaseChatModel (async coalescer, batched singles path)
thaw_vllm.langgraph.fork_fanout — explicit fork primitive
thaw_vllm.langgraph.ForkCoalescer — framework-agnostic coalescer (can be reused outside LangChain)
demos/pr_review_langgraph.py — mode-aware reference demo (--mode thaw vs --mode baseline)
ChatThaw.enable_auto_fork=False by default — auto-route fork path is opt-in behind the flag
--dtype demo flag, receipts, test coverage (coalescer + ChatThaw + demo integration)

Co-Authored-By: Claude Opus 4.7 [email protected]

Breaking Changes

dtype must match between parent model and pool workers; mismatch corrupts KV cache
