This release adds 3 notable features for engineering teams evaluating rollout.
✓ No known CVEs patched in this version
perf: parallel CRC32C — restore 2.89× faster
This release lands a two-edit perf pass on `crates/thaw-runtime/src/pipeline.rs` that makes the pipelined restore path CRC-parallel and decouples the CRC fold from the pread path. CRC content integrity is still fully verified; `THAW_VERIFY=0` still bypasses verification.
Throughput (16 GiB synthetic, 1× H100 SXM, warm page cache)
| Operation | Before | After | Speedup |
|---|---|---|---|
| restore_from_file_pipelined (chunk=64) | 4.80 GB/s | 13.88 GB/s | 2.89× |
| restore_from_file_pipelined (chunk=256) | 4.51 GB/s | 14.56 GB/s | 3.23× |
| restore_from_file_pipelined (chunk=1024) | — | 15.47 GB/s | — |
| freeze_to_file_pipelined (chunk=64) | 3.32 GB/s | 3.88 GB/s | 1.17× (NVMe-bound) |
| restore_from_bytes_pipelined (mmap) | 1.43 GB/s | 1.64 GB/s | 1.15× (PTE-walk-bound) |
What changed
- New helper `crc32c_append_parallel`: splits slices ≥4 MiB into 8 shards, computes `crc32c` on each via `std::thread::scope`, then stitches the results with `crc32c_combine`. A serial fallback for smaller slices avoids thread-spawn overhead.
- Steady-state loop in `restore_pipelined` reordered: `launch_uploads(prev)` fires first; the CRC fold for the previous chunk then runs on a worker thread in parallel with `read_chunk(curr)` on the main thread. The critical path becomes `max(CRC, pread)` instead of `CRC + pread`.
- Call-site swap: `accumulate_freeze_crcs`, `fold_and_verify_chunk_crc`, and `verify_plan_crcs_from_bytes` all use the parallel CRC, so freeze, pipelined restore, and in-memory restore all benefit.
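The changelog does not include the helper's source, so here is a std-only sketch of the shard-and-stitch idea. It assumes a `crc32c_append(crc, data)` / `crc32c_combine(crc1, crc2, len2)` calling convention; the bodies below are stand-ins (a slow bitwise CRC32C and a zlib-style GF(2) combine), not the runtime's code. Each shard is checksummed independently from a zero seed, and `crc32c_combine` advances the running CRC past `len2` zero bytes before XOR-ing in the next shard's CRC, which yields the same value as a serial pass.

```rust
use std::thread;

/// Reflected CRC32C (Castagnoli) polynomial.
const POLY: u32 = 0x82F6_3B78;
/// Slices at or above this size take the parallel path (4 MiB, per the changelog).
const PARALLEL_THRESHOLD: usize = 4 << 20;
/// Number of shards on the parallel path (8, per the changelog).
const SHARDS: usize = 8;

/// Bitwise CRC32C stand-in: extends `crc` (the CRC of any prior data) with `data`.
/// `crc32c_append(0, data)` is the plain CRC32C of `data`.
pub fn crc32c_append(crc: u32, data: &[u8]) -> u32 {
    let mut reg = !crc;
    for &byte in data {
        reg ^= u32::from(byte);
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (reg & 1).wrapping_neg();
            reg = (reg >> 1) ^ (POLY & mask);
        }
    }
    !reg
}

/// Multiplies a 32x32 GF(2) matrix (one u32 column per input bit) by a vector.
fn gf2_times(mat: &[u32; 32], mut vec: u32) -> u32 {
    let mut sum = 0;
    let mut i = 0;
    while vec != 0 {
        if vec & 1 != 0 {
            sum ^= mat[i];
        }
        vec >>= 1;
        i += 1;
    }
    sum
}

/// Squares an operator: "shift by k zero bits" becomes "shift by 2k zero bits".
fn gf2_square(mat: &[u32; 32]) -> [u32; 32] {
    std::array::from_fn(|n| gf2_times(mat, mat[n]))
}

/// zlib-style combine: CRC of A||B from crc(A), crc(B) and len(B) in bytes.
pub fn crc32c_combine(mut crc1: u32, crc2: u32, mut len2: u64) -> u32 {
    if len2 == 0 {
        return crc1;
    }
    // Operator for one zero bit: bit 0 maps to POLY, bit n maps to bit n-1.
    let mut odd: [u32; 32] = std::array::from_fn(|n| if n == 0 { POLY } else { 1 << (n - 1) });
    let mut even = gf2_square(&odd); // two zero bits
    odd = gf2_square(&even); // four zero bits
    // Walk the bits of len2; the first squaring in the loop is one zero byte.
    loop {
        even = gf2_square(&odd);
        if len2 & 1 != 0 {
            crc1 = gf2_times(&even, crc1);
        }
        len2 >>= 1;
        if len2 == 0 {
            break;
        }
        odd = gf2_square(&even);
        if len2 & 1 != 0 {
            crc1 = gf2_times(&odd, crc1);
        }
        len2 >>= 1;
        if len2 == 0 {
            break;
        }
    }
    crc1 ^ crc2
}

/// Shard-and-stitch sketch: returns the same value as `crc32c_append(crc, data)`.
pub fn crc32c_append_parallel(crc: u32, data: &[u8]) -> u32 {
    if data.len() < PARALLEL_THRESHOLD {
        // Serial fallback: thread spawns would cost more than they save.
        return crc32c_append(crc, data);
    }
    let shard_len = data.len().div_ceil(SHARDS);
    // Each shard is checksummed from a zero seed on its own scoped thread.
    let shard_crcs: Vec<(u32, usize)> = thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(shard_len)
            .map(|shard| s.spawn(move || (crc32c_append(0, shard), shard.len())))
            .collect();
        handles.into_iter().map(|h| h.join().expect("CRC shard panicked")).collect()
    });
    // Stitch in order: advance the running CRC past each shard, then XOR it in.
    shard_crcs
        .into_iter()
        .fold(crc, |acc, (shard_crc, len)| crc32c_combine(acc, shard_crc, len as u64))
}
```

The 4 MiB threshold and 8-shard count are the values the changelog states; the combine step costs O(log len) small matrix products per shard, which is negligible next to hashing megabytes.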
No public API change.
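The reordered steady-state loop can be sketched with one scoped worker per chunk. `restore_pipelined_sketch` is a hypothetical name, and `read_chunk` and `launch_uploads` here are stand-ins (an in-memory slice copy and a no-op) for the runtime's pread and GPU-upload steps, whose real signatures the changelog does not show. Because the worker folds chunk *i* while the main thread reads chunk *i+1*, each iteration costs roughly `max(CRC, pread)` rather than `CRC + pread`, and the fold order, and therefore the final CRC, is unchanged.

```rust
use std::thread;

/// Reflected CRC32C (Castagnoli) polynomial.
const POLY: u32 = 0x82F6_3B78;

/// Bitwise CRC32C stand-in; extends a running CRC with `data`.
fn crc32c_append(crc: u32, data: &[u8]) -> u32 {
    let mut reg = !crc;
    for &byte in data {
        reg ^= u32::from(byte);
        for _ in 0..8 {
            let mask = (reg & 1).wrapping_neg();
            reg = (reg >> 1) ^ (POLY & mask);
        }
    }
    !reg
}

/// Stand-in for the pread step: copies chunk `idx` out of an in-memory source.
fn read_chunk(source: &[u8], idx: usize, chunk_len: usize) -> Vec<u8> {
    let start = idx * chunk_len;
    let end = (start + chunk_len).min(source.len());
    source[start..end].to_vec()
}

/// Stand-in for the GPU upload step; a no-op in this sketch.
fn launch_uploads(_chunk: &[u8]) {}

/// Hypothetical reordered loop: uploads for `prev` fire first, then the CRC
/// fold of `prev` (worker thread) overlaps the read of `curr` (main thread).
pub fn restore_pipelined_sketch(source: &[u8], chunk_len: usize) -> u32 {
    assert!(chunk_len > 0, "chunk_len must be non-zero");
    let n_chunks = source.len().div_ceil(chunk_len);
    if n_chunks == 0 {
        return 0; // CRC32C of empty input
    }
    let mut crc = 0u32;
    let mut prev = read_chunk(source, 0, chunk_len);
    for idx in 1..n_chunks {
        launch_uploads(&prev);
        let (folded, curr) = thread::scope(|s| {
            // Worker: fold the previous chunk into the running CRC.
            let worker = s.spawn(|| crc32c_append(crc, &prev));
            // Main thread: read the current chunk at the same time.
            let curr = read_chunk(source, idx, chunk_len);
            (worker.join().expect("CRC worker panicked"), curr)
        });
        crc = folded;
        prev = curr;
    }
    // Drain: upload and fold the final chunk.
    launch_uploads(&prev);
    crc32c_append(crc, &prev)
}
```

Each iteration in this sketch spawns one worker, which is the per-chunk spawn cost that the persistent-worker item under "Still on the table" targets.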
Correctness
- All 76 `thaw-runtime` unit tests pass, including:
  - `restore_detects_single_byte_payload_corruption`
  - `pipelined_from_bytes_detects_payload_corruption`
  - `restore_round_trips_through_freeze`
- Manual bit-identity round-trip on 2 GiB of f32 random data: pass.
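A single-byte corruption check is deterministic rather than probabilistic: a 32-bit CRC whose generator has a nonzero constant term detects every error burst of 32 bits or fewer, and changing one byte is a burst of at most 8 bits. The sketch below illustrates that property with a hypothetical `verify_payload` helper and a bitwise CRC32C stand-in; it is not the runtime's `fold_and_verify_chunk_crc`, whose signature the changelog does not show.

```rust
/// Reflected CRC32C (Castagnoli) polynomial.
const POLY: u32 = 0x82F6_3B78;

/// Bitwise CRC32C stand-in.
fn crc32c(data: &[u8]) -> u32 {
    let mut reg = !0u32;
    for &byte in data {
        reg ^= u32::from(byte);
        for _ in 0..8 {
            let mask = (reg & 1).wrapping_neg();
            reg = (reg >> 1) ^ (POLY & mask);
        }
    }
    !reg
}

/// Expected vs. recomputed CRC when verification fails.
#[derive(Debug, PartialEq, Eq)]
pub struct CrcMismatch {
    pub expected: u32,
    pub actual: u32,
}

/// Hypothetical verifier: compares a payload's CRC32C with the value recorded at freeze time.
pub fn verify_payload(payload: &[u8], expected: u32) -> Result<(), CrcMismatch> {
    let actual = crc32c(payload);
    if actual == expected {
        Ok(())
    } else {
        Err(CrcMismatch { expected, actual })
    }
}
```

Flipping any bits within one byte, at any offset, makes `verify_payload` return `Err`, which is the property the corruption-detection tests listed above exercise.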
Caveats
- Warm-path measurement. This pod has 2 TB of RAM, so the 16 GiB file stays in page cache after freeze. Cold `O_DIRECT` throughput on this pod caps at ~2.5 GB/s single-stream and ~8.4 GB/s with 4 parallel streams; these limits are pod-specific, not code-bound.
- The freeze speedup is small because NVMe write bandwidth is the ceiling on this pod (`dd` caps at 3.1 GB/s).
Install
```bash
pip install --upgrade thaw-vllm thaw-native
```
- `thaw-native` 0.3.2 (Rust runtime; the perf change lives here)
- `thaw-vllm` 0.5.1 (Python integration; pin bump and version sync)
Reproduce
```bash
python3 benchmarks/micro_pipeline.py --size-gb 16 --chunk-mb 256 --repeats 3 --path /tmp/micro.thaw
```
Still on the table
- Persistent CRC worker (one spawn per restore instead of one per chunk).
- Multi-queue pread ring (unlocks parallel NVMe reads on cold path).
- AVX-512/PCLMUL CRC32C (via crates that double single-core CRC throughput).
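The persistent-worker item could look roughly like this: one thread spawned per restore, fed chunks in order over a bounded channel, folding them into a running CRC. This is a hypothetical design sketch; `spawn_crc_worker` and the channel bound of 2 are assumptions, and the bitwise `crc32c_append` is a stand-in for a real CRC32C implementation.

```rust
use std::sync::mpsc;
use std::thread;

/// Reflected CRC32C (Castagnoli) polynomial.
const POLY: u32 = 0x82F6_3B78;

/// Bitwise CRC32C stand-in; extends a running CRC with `data`.
fn crc32c_append(crc: u32, data: &[u8]) -> u32 {
    let mut reg = !crc;
    for &byte in data {
        reg ^= u32::from(byte);
        for _ in 0..8 {
            let mask = (reg & 1).wrapping_neg();
            reg = (reg >> 1) ^ (POLY & mask);
        }
    }
    !reg
}

/// Hypothetical persistent worker: spawned once per restore, it folds chunks
/// into a running CRC32C in the order they are sent.
pub fn spawn_crc_worker() -> (mpsc::SyncSender<Vec<u8>>, thread::JoinHandle<u32>) {
    // Bound of 2 (an assumption) limits how many chunk buffers are in flight.
    let (tx, rx) = mpsc::sync_channel::<Vec<u8>>(2);
    let handle = thread::spawn(move || {
        // The receive loop ends once every sender has been dropped.
        rx.iter().fold(0u32, |crc, chunk| crc32c_append(crc, &chunk))
    });
    (tx, handle)
}
```

Dropping the sender ends the worker's loop, and `join` returns the final CRC. A real implementation would more likely hand over reusable buffers than allocate a `Vec` per chunk.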