Thaw

v0.5.1 Feature

This release is a performance pass on the pipelined restore path for engineering teams evaluating rollout. There is no public API change.

Published 3mo AI Agents & Assistants

✓ No known CVEs patched

Topics

agents inference kv-cache llm reinforcement-learning sglang vllm

Summary

Performance release: parallelizes CRC32C verification in the pipelined restore path, with notes on what changed, correctness, and caveats.

Full changelog

perf: parallel CRC32C — restore 2.89× faster

This release lands a two-edit perf pass on crates/thaw-runtime/src/pipeline.rs that makes the pipelined restore path CRC-parallel and decouples the CRC fold from the pread path. CRC content integrity is still fully verified; setting THAW_VERIFY=0 still bypasses verification.

Throughput (16 GiB synthetic, 1× H100 SXM, warm page cache)

| Operation | Before | After | Speedup |
|---|---|---|---|
| restore_from_file_pipelined (chunk=64) | 4.80 GB/s | 13.88 GB/s | 2.89× |
| restore_from_file_pipelined (chunk=256) | 4.51 GB/s | 14.56 GB/s | 3.23× |
| restore_from_file_pipelined (chunk=1024) | — | 15.47 GB/s | — |
| freeze_to_file_pipelined (chunk=64) | 3.32 GB/s | 3.88 GB/s | 1.17× (NVMe-bound) |
| restore_from_bytes_pipelined (mmap) | 1.43 GB/s | 1.64 GB/s | 1.15× (PTE-walk-bound) |

What changed

- New helper crc32c_append_parallel: splits slices ≥4 MiB into 8 shards, computes crc32c on each via std::thread::scope, then stitches the results with crc32c_combine. A serial fallback for smaller slices avoids spawn overhead.
- Steady-state loop in restore_pipelined reordered: launch_uploads(prev) fires first; the CRC fold for the previous chunk then runs on a worker thread in parallel with read_chunk(curr) on the main thread. The critical path becomes max(CRC, pread) instead of CRC + pread.
- Call-site swap: accumulate_freeze_crcs, fold_and_verify_chunk_crc, and verify_plan_crcs_from_bytes all use the parallel CRC, so freeze, pipelined restore, and in-memory restore all benefit.
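
The shard-and-stitch idea behind the new helper can be sketched as follows. This is a minimal illustration using only the standard library, not the code in pipeline.rs: the names crc32c_sw, combine_sw, and crc32c_sharded are hypothetical, the bitwise CRC is a slow stand-in for whatever CRC32C routine the runtime actually calls, and the combine step follows the well-known zlib-style GF(2) matrix method with the Castagnoli polynomial swapped in.

```rust
use std::thread;

const POLY: u32 = 0x82F6_3B78; // CRC32C (Castagnoli) polynomial, reflected form

/// Bitwise CRC32C (slow stand-in). Pass 0 to start, or a previous result to continue.
fn crc32c_sw(crc: u32, data: &[u8]) -> u32 {
    let mut c = !crc;
    for &b in data {
        c ^= b as u32;
        for _ in 0..8 {
            c = if c & 1 != 0 { (c >> 1) ^ POLY } else { c >> 1 };
        }
    }
    !c
}

/// Multiply a 32x32 GF(2) matrix by a 32-bit vector.
fn gf2_times(mat: &[u32; 32], mut vec: u32) -> u32 {
    let mut sum = 0;
    let mut i = 0;
    while vec != 0 {
        if vec & 1 != 0 {
            sum ^= mat[i];
        }
        vec >>= 1;
        i += 1;
    }
    sum
}

/// dst = src * src over GF(2).
fn gf2_square(dst: &mut [u32; 32], src: &[u32; 32]) {
    for n in 0..32 {
        dst[n] = gf2_times(src, src[n]);
    }
}

/// Given crc(A), crc(B), and len(B) in bytes, return crc(A || B) without touching the data.
fn combine_sw(mut crc1: u32, crc2: u32, mut len2: u64) -> u32 {
    if len2 == 0 {
        return crc1;
    }
    let mut even = [0u32; 32];
    let mut odd = [0u32; 32];
    odd[0] = POLY; // operator for one zero bit
    let mut row = 1u32;
    for n in 1..32 {
        odd[n] = row;
        row <<= 1;
    }
    gf2_square(&mut even, &odd); // two zero bits
    gf2_square(&mut odd, &even); // four zero bits
    loop {
        gf2_square(&mut even, &odd); // first pass: one zero byte
        if len2 & 1 != 0 {
            crc1 = gf2_times(&even, crc1);
        }
        len2 >>= 1;
        if len2 == 0 {
            break;
        }
        gf2_square(&mut odd, &even);
        if len2 & 1 != 0 {
            crc1 = gf2_times(&odd, crc1);
        }
        len2 >>= 1;
        if len2 == 0 {
            break;
        }
    }
    crc1 ^ crc2
}

/// Split `data` into up to `shards` pieces, CRC each on its own scoped thread,
/// then stitch the per-shard CRCs together in order. Small inputs stay serial.
fn crc32c_sharded(data: &[u8], shards: usize, min_len: usize) -> u32 {
    if shards < 2 || data.len() < min_len || data.len() < shards {
        return crc32c_sw(0, data);
    }
    let shard_len = (data.len() + shards - 1) / shards;
    let parts: Vec<(u32, u64)> = thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(shard_len)
            .map(|c| s.spawn(move || (crc32c_sw(0, c), c.len() as u64)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    parts
        .into_iter()
        .fold(0, |acc, (crc, len)| combine_sw(acc, crc, len))
}
```

With the parameters quoted in the list above, a call would look like crc32c_sharded(buf, 8, 4 << 20). The result is the same value a single serial pass produces, which is why the integrity check itself is unchanged by sharding.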

No public API change.
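
The reordered steady-state loop can be pictured with the sketch below, which overlaps the integrity fold for the previous chunk with the read of the current one. It is an illustration under assumptions, not the restore_pipelined code: restore_overlapped, read_full_chunk, and fold_checksum are hypothetical names, the checksum is a trivial stand-in for the CRC32C fold, any std::io::Read source stands in for pread, and the GPU upload step is omitted.

```rust
use std::io::{self, Read};
use std::thread;

/// Stand-in checksum (hypothetical). A real pipeline would fold CRC32C here.
fn fold_checksum(acc: u32, chunk: &[u8]) -> u32 {
    chunk.iter().fold(acc, |a, &b| a.rotate_left(5) ^ b as u32)
}

/// Read up to `chunk_len` bytes; a shorter (or empty) result means end of input.
fn read_full_chunk<R: Read>(src: &mut R, chunk_len: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; chunk_len];
    let mut filled = 0;
    while filled < chunk_len {
        let n = src.read(&mut buf[filled..])?;
        if n == 0 {
            break;
        }
        filled += n;
    }
    buf.truncate(filled);
    Ok(buf)
}

/// Fold the checksum of the previous chunk on a worker thread while the
/// calling thread reads the next chunk. Returns (checksum, bytes read).
fn restore_overlapped<R: Read>(src: &mut R, chunk_len: usize) -> io::Result<(u32, usize)> {
    let mut acc = 0u32;
    let mut total = 0usize;
    let mut prev = read_full_chunk(src, chunk_len)?;
    while !prev.is_empty() {
        total += prev.len();
        let (folded, next) = thread::scope(|s| {
            // One worker spawn per chunk: checksum of `prev` runs here...
            let worker = s.spawn(|| fold_checksum(acc, &prev));
            // ...while the read of the next chunk proceeds on this thread.
            let next = read_full_chunk(src, chunk_len);
            (worker.join().unwrap(), next)
        });
        acc = folded;
        prev = next?;
    }
    Ok((acc, total))
}
```

Each iteration costs roughly the longer of the two overlapped steps rather than their sum, which is the max(CRC, pread) behavior described above. The per-chunk spawn in this sketch is also where a persistent worker would save overhead.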

Correctness

All 76 thaw-runtime unit tests pass, including:
- restore_detects_single_byte_payload_corruption
- pipelined_from_bytes_detects_payload_corruption
- restore_round_trips_through_freeze
Manual bit-identity round-trip on 2 GiB of f32 random data: pass.

Caveats

- Warm-path measurement: this pod has 2 TB RAM, so the 16 GiB file stays in page cache after freeze. Cold O_DIRECT throughput on this pod caps at ~2.5 GB/s single-stream / ~8.4 GB/s parallel 4×; that ceiling is pod-specific, not code-bound.
- The freeze bump is small because NVMe write is the ceiling on this pod (dd caps at 3.1 GB/s).

Install

pip install --upgrade thaw-vllm thaw-native

thaw-native 0.3.2 (Rust runtime — the perf change lives here)
thaw-vllm 0.5.1 (Python integration — pin bump + version sync)

Reproduce

python3 benchmarks/micro_pipeline.py --size-gb 16 --chunk-mb 256 --repeats 3 --path /tmp/micro.thaw

Still on the table

- Persistent CRC worker (one spawn per restore instead of one per chunk).
- Multi-queue pread ring (unlocks parallel NVMe reads on the cold path).
- AVX-512/PCLMUL CRC32C (crates exist that double single-core CRC throughput).
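
As a rough picture of the first item, a persistent worker could be a single thread fed through a bounded channel, spawned once per restore. The sketch below is one possible shape under assumptions, not a planned design: checksum_with_persistent_worker and rolling_fold are hypothetical names, the checksum is a stand-in for CRC32C, and chunks are moved into the worker, whereas a real restore would also need to hand each buffer to the uploader.

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in checksum (hypothetical). A real pipeline would fold CRC32C here.
fn rolling_fold(acc: u32, chunk: &[u8]) -> u32 {
    chunk.iter().fold(acc, |a, &b| a.rotate_left(5) ^ b as u32)
}

/// Spawn one worker for the whole restore and stream chunks to it in order.
fn checksum_with_persistent_worker<I>(chunks: I) -> u32
where
    I: IntoIterator<Item = Vec<u8>>,
{
    // Capacity 1: the producer can run at most one chunk ahead of the worker.
    let (tx, rx) = mpsc::sync_channel::<Vec<u8>>(1);
    let worker = thread::spawn(move || {
        let mut acc = 0u32;
        for chunk in rx {
            acc = rolling_fold(acc, &chunk);
        }
        acc
    });
    for chunk in chunks {
        tx.send(chunk).expect("worker exited early");
    }
    drop(tx); // closing the channel lets the worker loop finish
    worker.join().expect("worker panicked")
}
```

The bounded channel keeps memory use flat: the reader blocks once it is one chunk ahead, so at most a couple of chunk buffers are alive at a time.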