
Thaw

v0.4.0 Breaking

This release includes 1 breaking change for platform teams planning a safe upgrade.

✓ No known CVEs patched

Topics

agents inference kv-cache llm reinforcement-learning sglang vllm

Summary

AI summary

The LangGraph / LangChain integration adds an async coalescing chat model (ChatThaw) and an explicit fork primitive (fork_fanout). Parent and pool workers must now use the same dtype.

Full changelog

Highlights

LangGraph / LangChain integration. Install with pip install thaw-vllm[langgraph] and get two entry points:

  • ChatThaw(BaseChatModel) — a LangChain drop-in chat model backed by a vLLM parent + ForkPool. Single and concurrent ainvoke calls land in an async coalescer and run through a single batched vLLM.generate (continuous batching).
  • fork_fanout(llm, prefix, suffix_lists) — the explicit fork primitive. Given shared prefix messages and a list of divergent suffix message lists, snapshots the parent's cached KV once and fans out to the pool. Sub-second amortized cost after pool warm-up.
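
# Run inside an async context (for example a LangGraph node). prefix_messages and
# suffix_a..suffix_d are message lists defined by the caller.
from thaw_vllm.langgraph import ChatThaw, fork_fanout

llm = ChatThaw(model="meta-llama/Llama-3.1-8B-Instruct", workers=2)
texts = await fork_fanout(llm, prefix_messages, [suffix_a, suffix_b, suffix_c, suffix_d])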
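
To make the "async coalescer" idea concrete, here is a minimal sketch of the general technique in plain asyncio. It is not the code behind ChatThaw or ForkCoalescer: the class name TinyCoalescer, the window_s parameter, and the fake_generate stand-in are all hypothetical, and the real coalescer may batch on different rules. The sketch only shows how concurrent callers can each await their own result while a single batched call does the work.

```python
import asyncio
from typing import Callable, List


class TinyCoalescer:
    """Illustrative only; not thaw_vllm.langgraph.ForkCoalescer."""

    def __init__(self, batch_fn: Callable[[List[str]], List[str]], window_s: float = 0.01):
        self._batch_fn = batch_fn    # stands in for one batched generate call
        self._window_s = window_s    # how long to wait for more callers (assumed policy)
        self._pending = []           # (prompt, future) pairs awaiting the next flush
        self._flush_task = None
        self.batch_sizes = []        # recorded so the demo can show the coalescing

    async def submit(self, prompt: str) -> str:
        fut = asyncio.get_running_loop().create_future()
        self._pending.append((prompt, fut))
        if self._flush_task is None:
            # The first caller in a window schedules the flush; later callers just join.
            self._flush_task = asyncio.create_task(self._flush_later())
        return await fut

    async def _flush_later(self) -> None:
        await asyncio.sleep(self._window_s)
        batch, self._pending = self._pending, []
        self._flush_task = None
        self.batch_sizes.append(len(batch))
        try:
            # One call for the whole batch. A real backend would be run off the
            # event loop; this sketch calls it inline for brevity.
            outputs = self._batch_fn([prompt for prompt, _ in batch])
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)
        except Exception as exc:
            for _, fut in batch:
                if not fut.done():
                    fut.set_exception(exc)


def fake_generate(prompts: List[str]) -> List[str]:
    # Hypothetical stand-in for a batched model call.
    return [p.upper() for p in prompts]


async def demo():
    coalescer = TinyCoalescer(fake_generate)
    results = await asyncio.gather(*(coalescer.submit(p) for p in ["a", "b", "c"]))
    return results, coalescer.batch_sizes


results, batch_sizes = asyncio.run(demo())
print(results, batch_sizes)
```

In this sketch the three concurrent submit calls end up in one batch, which is the property the release notes describe for concurrent ainvoke calls.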

Validation (H100 + Llama-3.1-8B, 2026-04-21)

3 rounds × 4 branches, using a LangGraph StateGraph with fork_fanout inside a single reviewer node. Wall time per round: 64.55s (round 0, including pool init), then 1.43s and 1.43s. All branches produce coherent, perspective-specific reviews.

Receipt: site/receipts/2026-04-21_h100_pr_review_langgraph.json
Reproducer: python demos/pr_review_langgraph.py --mode thaw --dtype float16
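
As a rough reading of the wall times above, the arithmetic below derives an amortized per-branch figure for warm rounds. It assumes the four branches share each round's wall time equally and that everything round 0 costs beyond a warm round is one-time pool initialization. Both are simplifying assumptions, not measurements from the receipt.

```python
# Per-round wall times and branch count as reported in the validation notes.
round_wall_s = [64.55, 1.43, 1.43]
branches_per_round = 4

warm_rounds = round_wall_s[1:]                      # rounds after pool warm-up
warm_mean_s = sum(warm_rounds) / len(warm_rounds)   # mean warm-round wall time
per_branch_s = warm_mean_s / branches_per_round     # amortized cost per branch
init_overhead_s = round_wall_s[0] - warm_mean_s     # assumed one-time init cost
total_s = sum(round_wall_s)

print(f"warm round: {warm_mean_s:.2f}s")
print(f"amortized per branch: {per_branch_s:.4f}s")
print(f"round-0 overhead: {init_overhead_s:.2f}s of {total_s:.2f}s total")
```

Under those assumptions each warm branch costs roughly 0.36s, and nearly all of the total run time is the one-time round 0 initialization.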

Breaking / important

  • dtype must match between parent and pool workers. demos/pr_review_langgraph.py now takes --dtype (default float16) and applies it to both. If you build a custom ChatThaw or call ForkPool directly, pass the same dtype to extra_llm_kwargs and extra_pool_kwargs — a mismatch corrupts snapshotted KV cache blocks and produces garbage on rounds 1+.
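
One way to make the dtype rule hard to violate is to build both kwargs dicts from a single value. The helper below is a hypothetical sketch, not part of thaw_vllm: the function name matched_dtype_kwargs is invented here, and it assumes that extra_llm_kwargs and extra_pool_kwargs are plain dicts carrying a "dtype" key. The commented ChatThaw call at the end is likewise an assumed call shape.

```python
from typing import Any, Dict, Optional, Tuple


def matched_dtype_kwargs(
    dtype: str = "float16",
    llm_extra: Optional[Dict[str, Any]] = None,
    pool_extra: Optional[Dict[str, Any]] = None,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (llm_kwargs, pool_kwargs) that are guaranteed to share one dtype."""
    llm_kwargs = dict(llm_extra or {})
    pool_kwargs = dict(pool_extra or {})
    for name, kwargs in (("extra_llm_kwargs", llm_kwargs),
                         ("extra_pool_kwargs", pool_kwargs)):
        existing = kwargs.get("dtype")
        if existing is not None and existing != dtype:
            # Fail early instead of silently producing corrupted KV blocks later.
            raise ValueError(f"{name} has dtype={existing!r}, expected {dtype!r}")
        kwargs["dtype"] = dtype
    return llm_kwargs, pool_kwargs


llm_kwargs, pool_kwargs = matched_dtype_kwargs("float16")
print(llm_kwargs, pool_kwargs)

# Assumed usage; check the ChatThaw signature before relying on it:
# llm = ChatThaw(model="meta-llama/Llama-3.1-8B-Instruct", workers=2,
#                extra_llm_kwargs=llm_kwargs, extra_pool_kwargs=pool_kwargs)
```

Raising on a conflicting pre-set dtype keeps the failure at configuration time rather than as garbage output on later rounds.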

Changelog

  • thaw_vllm.langgraph.ChatThaw — LangChain BaseChatModel (async coalescer, batched singles path)
  • thaw_vllm.langgraph.fork_fanout — explicit fork primitive
  • thaw_vllm.langgraph.ForkCoalescer — framework-agnostic coalescer (can be reused outside LangChain)
  • demos/pr_review_langgraph.py — mode-aware reference demo (--mode thaw vs --mode baseline)
  • ChatThaw.enable_auto_fork=False by default — auto-route fork path is opt-in behind the flag
  • --dtype demo flag, receipts, test coverage (coalescer + ChatThaw + demo integration)

Co-Authored-By: Claude Opus 4.7 [email protected]

Breaking Changes

  • dtype must match between parent model and pool workers; mismatch corrupts KV cache
