This release adds 2 notable features for engineering teams evaluating rollout.
Published 1mo
AI Agents & Assistants
✓ No known CVEs patched
Topics
agents
inference
kv-cache
llm
reinforcement-learning
sglang
vllm
Full changelog
thaw-vllm 0.5.0
Sleep-mode integration for vLLM RFC #34303.
What's new
- thaw_vllm.sleep_mode: new module composing thaw's freeze_model_tp / restore_model_tp with vLLM's native LLM.sleep(level=2) / LLM.wake_up():
  - sleep(llm, path, *, level=2, strict=True): freezes weights, then calls llm.sleep(level) so CuMemAllocator actually frees GPU memory.
  - wake_up(llm, path): calls llm.wake_up() to re-allocate GPU tensors, then restore_model_tp populates them.
  - strict=False runs freeze-only for the durable-checkpoint use case (GPU stays allocated).
- demos/sleep_mode_demo.py: produces a JSON bit-identity receipt with sleep/wake wall-clocks, freeze/restore throughput, and CuMemAllocator memory deltas.
- tests/test_sleep_mode.py: 8 CPU-only unit tests.
- TP>1 restore cascade flip: _worker_restore now tries restore_model_from_ram first (shared page cache → parallel reads per rank) and falls back to restore_model_pipelined. This removes the dual O_DIRECT pread contention that regressed 2×H100 restore throughput.
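The restore cascade described above is an ordered try-then-fall-back pattern. The following is a minimal sketch of that pattern only, not thaw's actual _worker_restore: the helper restore_with_fallback and both stand-in strategies are hypothetical, and it assumes a failed strategy signals failure by raising an exception, which the changelog does not state.

```python
from typing import Callable, Sequence, Tuple


def restore_with_fallback(
    strategies: Sequence[Callable[[str], int]], path: str
) -> Tuple[str, int]:
    """Try each restore strategy in order and return the first success.

    Returns (strategy name, bytes restored). Hypothetical helper: it
    assumes a strategy raises when it cannot serve the snapshot.
    """
    errors = []
    for strategy in strategies:
        try:
            return strategy.__name__, strategy(path)
        except Exception as exc:  # remember why this strategy was skipped
            errors.append(f"{strategy.__name__}: {exc}")
    raise RuntimeError("all restore strategies failed: " + "; ".join(errors))


# Stand-ins for illustration only; these are NOT thaw's real functions.
def fake_restore_from_ram(path: str) -> int:
    raise OSError("snapshot not resident in page cache")


def fake_restore_pipelined(path: str) -> int:
    return 1024  # pretend 1024 bytes were restored


if __name__ == "__main__":
    used, restored = restore_with_fallback(
        [fake_restore_from_ram, fake_restore_pipelined], "/tmp/snapshot"
    )
    print(used, restored)
```

Listing the preferred path first keeps the common case fast, and collecting the per-strategy errors makes it visible why a fallback was taken.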
Receipts — 2×H100 SXM
- Llama-3.1-8B TP=1 (receipt): sleep 3.4s / wake 11.1s, bit-identical, CuMemAllocator freed 45.38 GiB.
- Llama-3.1-70B TP=2 (receipt): sleep 16.1s (9.04 GB/s aggregate) / wake 53.6s (2.78 GB/s aggregate), 141 GB snapshot across 966 regions, bit-identical, CuMemAllocator freed 72.67 GiB per rank = 145 GiB total. vLLM's own wake_up re-alloc takes only 0.33s; thaw's restore accounts for nearly all of the wake wall-clock.
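As a rough cross-check of the 70B receipt, the snapshot size divided by the quoted aggregate throughput gives the implied transfer time. This sketch assumes the quoted rates cover only the data-transfer phase and that 141 GB is a rounded figure, so the results are approximate; the function name transfer_seconds is illustrative.

```python
def transfer_seconds(snapshot_gb: float, throughput_gb_per_s: float) -> float:
    """Implied transfer time for a snapshot at a given aggregate rate."""
    return snapshot_gb / throughput_gb_per_s


SNAPSHOT_GB = 141.0  # rounded snapshot size from the receipt

freeze_s = transfer_seconds(SNAPSHOT_GB, 9.04)   # sleep-side freeze
restore_s = transfer_seconds(SNAPSHOT_GB, 2.78)  # wake-side restore
total_freed_gib = 2 * 72.67                      # two ranks at TP=2

print(f"implied freeze:  {freeze_s:.1f}s of 16.1s sleep")
print(f"implied restore: {restore_s:.1f}s of 53.6s wake")
print(f"freed across ranks: {total_freed_gib:.2f} GiB")
```

Under those assumptions the implied restore time is roughly 50.7s of the 53.6s wake, which is consistent with the restore, not vLLM's 0.33s re-allocation, being where the time goes.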
Install
pip install "thaw-vllm==0.5.0" "thaw-native>=0.3.1"
Requires enable_sleep_mode=True on the LLM() constructor. See demos/sleep_mode_demo.py.
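A minimal sketch of the sleep/wake round trip, assuming the signatures listed under What's new. The helper names build_llm and sleep_wake_cycle are illustrative and not part of thaw; the model name and snapshot path are left to the caller, and imports are deferred so the file can be read without a GPU stack installed.

```python
def build_llm(model: str):
    """Construct a vLLM engine with sleep mode enabled (required by thaw)."""
    from vllm import LLM  # deferred: needs vLLM and a GPU at call time

    return LLM(model=model, enable_sleep_mode=True)


def sleep_wake_cycle(llm, snapshot_path: str) -> None:
    """Freeze weights to snapshot_path, release GPU memory, then restore.

    Assumes thaw_vllm.sleep_mode exposes sleep() and wake_up() with the
    signatures given in this changelog.
    """
    from thaw_vllm.sleep_mode import sleep, wake_up

    # Freeze weights, then llm.sleep(level) lets CuMemAllocator free memory.
    sleep(llm, snapshot_path, level=2, strict=True)

    # ... GPU memory is free for other work here ...

    # Re-allocate GPU tensors, then repopulate them from the snapshot.
    wake_up(llm, snapshot_path)
```

For a measured end-to-end run with bit-identity checks, demos/sleep_mode_demo.py remains the reference.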