Thaw

v0.1.2 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

Published 3mo AI Agents & Assistants

View tool

✓ No known CVEs patched

Read the diff → Tool health → What is this tool? →

✓ No known CVEs patched in this version

Topics

agents inference kv-cache llm reinforcement-learning sglang

+1 more

vllm

Summary

AI summary

Updates What's New, SSE, and safetensors across a mixed release.

Full changelog

What's New

`thaw serve` — Pre-warmed Engine Pool

PgBouncer for GPU inference. Keep vLLM engines pre-initialized with dummy weights, then DMA-swap model
snapshots on demand (~1s instead of 20s cold start).

OpenAI-compatible API (/v1/completions, /v1/chat/completions)
Model affinity — zero swap cost when the requested model is already loaded
Hot model registration via admin API (/admin/pool, /admin/snapshots)
Streaming support (SSE)

Pre-built native wheels

thaw-native is now published to PyPI with CUDA 12.4 baked in. No more Rust toolchain on your GPU box:
pip install thaw-vllm[all]

Pure-Python restore fallback

restore_model_from_ram now copies region-by-region into existing GPU tensors when the Rust module
isn't available. No extra GPU memory allocation (fixes OOM on the previous fallback path).

Benchmarks (Llama-3.1-8B, A40)

| Metric | Value |
|--------|-------|
| Full cold start (safetensors) | 45.7s |
| thaw serve ready | 14.2s |
| DMA restore throughput | 11.6 GB/s |
| Weight restore time | 7.7s |

View diff on GitHub

Weekly OSS security release digest.

The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.

No spam, unsubscribe anytime.

Share this release

Share on X Share on Bluesky

Track Thaw

Get notified when new releases ship.

About Thaw

All releases →

Thaw

Summary

What's New

`thaw serve` — Pre-warmed Engine Pool

Pre-built native wheels

Pure-Python restore fallback

Benchmarks (Llama-3.1-8B, A40)

Related context

Related tools

Thaw

Summary

What's New

thaw serve — Pre-warmed Engine Pool

Pre-built native wheels

Pure-Python restore fallback

Benchmarks (Llama-3.1-8B, A40)

Related context

Related tools

`thaw serve` — Pre-warmed Engine Pool