Thaw

v0.5.1 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

✓ No known CVEs patched

Topics

agents inference kv-cache llm reinforcement-learning sglang vllm

Summary

AI summary

Updates Caveats, Correctness, and What changed across a mixed release.

Full changelog

perf: parallel CRC32C — restore 2.89× faster

This release lands a two-edit perf pass on crates/thaw-runtime/src/pipeline.rs that parallelizes the CRC computation on the pipelined restore path and decouples the CRC fold from the pread path. Content integrity is still fully verified via CRC; setting THAW_VERIFY=0 still bypasses the check.

Throughput (16 GiB synthetic, 1× H100 SXM, warm page cache)

| Operation | Before | After | Speedup |
|---|---|---|---|
| restore_from_file_pipelined (chunk=64) | 4.80 GB/s | 13.88 GB/s | 2.89× |
| restore_from_file_pipelined (chunk=256) | 4.51 GB/s | 14.56 GB/s | 3.23× |
| restore_from_file_pipelined (chunk=1024) | — | 15.47 GB/s | — |
| freeze_to_file_pipelined (chunk=64) | 3.32 GB/s | 3.88 GB/s | 1.17× (NVMe-bound) |
| restore_from_bytes_pipelined (mmap) | 1.43 GB/s | 1.64 GB/s | 1.15× (PTE-walk-bound) |
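
The Speedup column is the After throughput divided by the Before throughput. A minimal Python check of that arithmetic, using only the figures in the table; the dictionary layout and the helper name are illustrative and not part of Thaw:

```python
# Before/After throughput in GB/s, copied from the table above.
# The chunk=1024 row is omitted because it has no "Before" measurement.
THROUGHPUT = {
    "restore_from_file_pipelined (chunk=64)": (4.80, 13.88),
    "restore_from_file_pipelined (chunk=256)": (4.51, 14.56),
    "freeze_to_file_pipelined (chunk=64)": (3.32, 3.88),
    "restore_from_bytes_pipelined (mmap)": (1.43, 1.64),
}


def speedup(before: float, after: float) -> float:
    """Ratio of new to old throughput, rounded to two decimals."""
    return round(after / before, 2)


SPEEDUPS = {op: speedup(b, a) for op, (b, a) in THROUGHPUT.items()}

if __name__ == "__main__":
    for op, ratio in SPEEDUPS.items():
        print(f"{op}: {ratio}x")
```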

What changed

  • New helper crc32c_append_parallel: splits slices ≥4 MiB into 8 shards, computes crc32c on each via std::thread::scope, then stitches the results with crc32c_combine. A serial fallback for smaller slices avoids spawn overhead.
  • Steady-state loop in restore_pipelined reordered: launch_uploads(prev) fires first; the CRC fold for the previous chunk then runs on a worker thread in parallel with read_chunk(curr) on the main thread. The critical path becomes max(CRC, pread) instead of CRC + pread.
  • Call-site swap: accumulate_freeze_crcs, fold_and_verify_chunk_crc, and verify_plan_crcs_from_bytes all use the parallel CRC, so freeze, pipelined restore, and in-memory restore all benefit.
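
The shard-then-combine idea behind crc32c_append_parallel can be sketched in pure Python. This is an illustration only, not the Rust helper: the names crc32c_append_parallel and crc32c_combine are taken from the notes above, but the signatures, the table-driven crc32c, and the zlib-style GF(2) combine are assumptions made for the example. Python threads will not reproduce the speedup here (the bytewise loop holds the GIL); the point is that per-shard CRCs stitch back to the same value as one serial pass.

```python
from concurrent.futures import ThreadPoolExecutor

POLY = 0x82F63B78                  # CRC-32C (Castagnoli), reflected form
SHARD_THRESHOLD = 4 * 1024 * 1024  # 4 MiB, per the release notes
NUM_SHARDS = 8                     # per the release notes


def _make_table():
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


_TABLE = _make_table()


def crc32c(data: bytes, crc: int = 0) -> int:
    """Serial CRC-32C; pass a previous result as `crc` to continue it."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def _times(mat, vec):
    """Multiply a 32x32 GF(2) matrix (list of columns) by a 32-bit vector."""
    acc, i = 0, 0
    while vec:
        if vec & 1:
            acc ^= mat[i]
        vec >>= 1
        i += 1
    return acc


def _square(mat):
    return [_times(mat, col) for col in mat]


def crc32c_combine(crc1: int, crc2: int, len2: int) -> int:
    """CRC of A||B given crc(A), crc(B) and len(B) in bytes."""
    op = [POLY] + [1 << n for n in range(31)]  # operator for one zero bit
    for _ in range(3):                         # 1 -> 2 -> 4 -> 8 bits (one byte)
        op = _square(op)
    while len2:                                # square-and-multiply over len2
        if len2 & 1:
            crc1 = _times(op, crc1)
        op = _square(op)
        len2 >>= 1
    return crc1 ^ crc2


def crc32c_append_parallel(crc, data, threshold=SHARD_THRESHOLD, shards=NUM_SHARDS):
    """Extend running `crc` over `data`, sharding slices of at least `threshold` bytes."""
    if len(data) < threshold:
        return crc32c(data, crc)               # serial fallback for small slices
    step = -(-len(data) // shards)             # ceiling division
    pieces = [data[i:i + step] for i in range(0, len(data), step)]
    with ThreadPoolExecutor(max_workers=shards) as pool:
        partials = list(pool.map(crc32c, pieces))
    for piece, piece_crc in zip(pieces, partials):
        crc = crc32c_combine(crc, piece_crc, len(piece))
    return crc
```

Calling crc32c_append_parallel(0, data) on a large buffer should return the same value as crc32c(data); passing a lower threshold forces the sharded path on small inputs when experimenting.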

No public API change.
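
The reordered steady-state loop in restore_pipelined can be pictured with a small Python sketch: the checksum fold for the previous chunk runs on a worker thread while the main thread reads the current chunk, so each iteration costs roughly max(CRC, read) rather than their sum. The function name checksum_pipelined is hypothetical, zlib.crc32 stands in for CRC32C, and the GPU upload step (launch_uploads) is omitted entirely.

```python
import io
import zlib
from concurrent.futures import ThreadPoolExecutor


def checksum_pipelined(stream, chunk_size: int) -> int:
    """Fold a running checksum over `stream`, overlapping the fold for the
    previous chunk with the read of the current chunk."""
    crc = 0
    with ThreadPoolExecutor(max_workers=1) as worker:
        prev = stream.read(chunk_size)
        while prev:
            # Fold for the previous chunk runs on the worker thread...
            fold = worker.submit(zlib.crc32, prev, crc)
            # ...while the main thread reads the current chunk.
            curr = stream.read(chunk_size)
            # Join before the next fold so the running value stays ordered.
            crc = fold.result()
            prev = curr
    return crc


if __name__ == "__main__":
    payload = bytes(range(256)) * 1000
    print(checksum_pipelined(io.BytesIO(payload), 4096) == zlib.crc32(payload))
```

This sketch submits one task per chunk to a single long-lived worker; it says nothing about how the Rust runtime spawns or reuses its threads.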

Correctness

  • All 76 thaw-runtime unit tests pass, including:
    • restore_detects_single_byte_payload_corruption
    • pipelined_from_bytes_detects_payload_corruption
    • restore_round_trips_through_freeze
  • Manual bit-identity round-trip on 2 GiB of f32 random data: pass.
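
The corruption tests named above depend on the checksum changing whenever a payload byte changes. A minimal sketch of that property, using zlib.crc32 as a stand-in for CRC32C; the helper names are hypothetical and this is not the thaw-runtime test code:

```python
import zlib


def checksum(payload: bytes) -> int:
    """Stand-in checksum (CRC-32 from zlib, not CRC32C)."""
    return zlib.crc32(payload)


def flip_byte(payload: bytes, index: int) -> bytes:
    """Return a copy of `payload` with every bit of one byte inverted."""
    corrupted = bytearray(payload)
    corrupted[index] ^= 0xFF
    return bytes(corrupted)


def verify(payload: bytes, expected_crc: int) -> bool:
    """True when the payload still matches the recorded checksum."""
    return checksum(payload) == expected_crc


if __name__ == "__main__":
    original = bytes(range(256)) * 16
    recorded = checksum(original)
    print(verify(original, recorded))                   # intact payload
    print(verify(flip_byte(original, 1000), recorded))  # corrupted payload
```

A 32-bit CRC detects every error burst of 32 bits or fewer, so a change confined to a single byte always alters the checksum.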

Caveats

  • Warm-path measurement. This pod has 2 TB RAM; the 16 GiB file lives in page cache after freeze. Cold O_DIRECT throughput on this pod caps at ~2.5 GB/s single-stream / ~8.4 GB/s parallel 4× — pod-specific, not code-bound.
  • Freeze bump is small because NVMe write is the ceiling on this pod (dd caps at 3.1 GB/s).

Install

pip install --upgrade thaw-vllm thaw-native
  • thaw-native 0.3.2 (Rust runtime — the perf change lives here)
  • thaw-vllm 0.5.1 (Python integration — pin bump + version sync)

Reproduce

python3 benchmarks/micro_pipeline.py --size-gb 16 --chunk-mb 256 --repeats 3 --path /tmp/micro.thaw

Still on the table

  • Persistent CRC worker (one spawn per restore instead of one per chunk).
  • Multi-queue pread ring (unlocks parallel NVMe reads on cold path).
  • AVX-512/PCLMUL CRC32C (there are crates that double single-core CRC throughput).
