
Thaw

v0.5.0 Feature

This release adds 2 notable features for engineering teams evaluating rollout.

✓ No known CVEs patched

Topics

agents inference kv-cache llm reinforcement-learning sglang vllm

Summary

Adds a thaw_vllm.sleep_mode module that pairs thaw's weight freeze/restore with vLLM's native sleep mode, and fixes a TP>1 restore-throughput regression on 2×H100.

Full changelog

thaw-vllm 0.5.0

Sleep-mode integration for vLLM RFC #34303.

What's new

  • thaw_vllm.sleep_mode — new module composing thaw's freeze_model_tp / restore_model_tp with vLLM's native LLM.sleep(level=2) / LLM.wake_up():
    • sleep(llm, path, *, level=2, strict=True) — freezes weights, then calls llm.sleep(level) so CuMemAllocator actually frees GPU memory.
    • wake_up(llm, path) — calls llm.wake_up() to re-allocate GPU tensors, then restore_model_tp populates them.
    • strict=False only freezes the weights, for the durable-checkpoint use case (GPU memory stays allocated).
  • demos/sleep_mode_demo.py — produces a JSON bit-identity receipt with sleep/wake wall-clocks, freeze/restore throughput, and CuMemAllocator memory deltas.
  • tests/test_sleep_mode.py — 8 CPU-only unit tests.
  • TP>1 restore cascade: flip_worker_restore now tries restore_model_from_ram first (shared page cache, so each rank reads in parallel) and falls back to restore_model_pipelined. This removes the contention between concurrent O_DIRECT preads from both ranks, which had regressed 2×H100 restore throughput.
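The restore cascade is a try-first, fall-back-second pattern. Below is a minimal sketch of that control flow. It assumes the fallback fires when the RAM path raises, because these notes do not say what triggers it. The strategy callables are hypothetical zero-argument stand-ins, not the real restore_model_from_ram / restore_model_pipelined signatures.

```python
from typing import Callable, Sequence

# Hypothetical stand-in type: each restore strategy is a zero-arg callable that
# either returns a result or raises. The real thaw functions take model and
# snapshot arguments that these notes do not document.
RestoreFn = Callable[[], str]


def restore_with_fallback(strategies: Sequence[tuple[str, RestoreFn]]) -> tuple[str, str]:
    """Try each strategy in order; return (name, result) of the first that succeeds."""
    errors: list[str] = []
    for name, fn in strategies:
        try:
            return name, fn()
        except Exception as exc:  # record why this path failed, then try the next one
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all restore strategies failed: " + "; ".join(errors))


def from_ram() -> str:
    # Simulated failure of the preferred path (e.g. snapshot not available in RAM).
    raise OSError("snapshot not in page cache")


def pipelined() -> str:
    return "restored via O_DIRECT pipeline"


name, result = restore_with_fallback([
    ("restore_model_from_ram", from_ram),
    ("restore_model_pipelined", pipelined),
])
print(name)  # prints: restore_model_pipelined
```

Ordering the strategies from fastest to most robust keeps the common case cheap while preserving a path that always works.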

Receipts — 2×H100 SXM

  • Llama-3.1-8B TP=1 (receipt): sleep 3.4s / wake 11.1s, bit-identical, CuMemAllocator freed 45.38 GiB.
  • Llama-3.1-70B TP=2 (receipt): sleep 16.1s (9.04 GB/s aggregate) / wake 53.6s (2.78 GB/s aggregate), 141 GB snapshot across 966 regions, bit-identical, CuMemAllocator freed 72.67 GiB per rank = 145 GiB total. vLLM's own wake_up re-allocation takes only 0.33s, so thaw's restore accounts for almost all of the wake wall-clock.
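A little arithmetic cross-checks the 70B receipt. This sketch only rearranges the figures reported above. It assumes the aggregate GB/s figures cover the freeze and restore phases alone, since 141 GB / 16.1 s would be about 8.76 GB/s, slightly below the reported 9.04 GB/s. The inputs are rounded, so the derived times are approximate.

```python
# Figures copied from the Llama-3.1-70B TP=2 receipt.
snapshot_gb = 141.0           # snapshot size, GB
freeze_gbps = 9.04            # aggregate freeze throughput, GB/s
restore_gbps = 2.78           # aggregate restore throughput, GB/s
sleep_s, wake_s = 16.1, 53.6  # end-to-end sleep / wake wall-clocks, s
vllm_realloc_s = 0.33         # vLLM's own wake_up re-allocation, s
freed_per_rank_gib = 72.67    # CuMemAllocator memory freed per rank, GiB
ranks = 2

# Assumption: throughput = snapshot size / time spent in freeze (or restore) only.
freeze_s = snapshot_gb / freeze_gbps         # ≈ 15.6 s of the 16.1 s sleep
restore_s = snapshot_gb / restore_gbps       # ≈ 50.7 s of the 53.6 s wake
total_freed_gib = freed_per_rank_gib * ranks # 145.34 GiB, reported as "145 GiB total"

restore_share = restore_s / wake_s           # fraction of wake spent in thaw's restore
vllm_share = vllm_realloc_s / wake_s         # fraction spent in vLLM's re-allocation

print(f"freeze ≈ {freeze_s:.1f} s, restore ≈ {restore_s:.1f} s")
print(f"restore share of wake ≈ {restore_share:.1%}, vLLM re-alloc ≈ {vllm_share:.1%}")
```

Under that assumption, restore accounts for roughly 95% of the wake wall-clock and vLLM's re-allocation for under 1%.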

Install

pip install "thaw-vllm==0.5.0" "thaw-native>=0.3.1"

Requires enable_sleep_mode=True on the LLM() constructor. See demos/sleep_mode_demo.py.
