Hollow

v0.9.0 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

Published 3mo ago | AI Agents & Assistants

✓ No known CVEs patched

Summary

This release introduces VRAM-aware model routing and a priority task queue with preemption.

Full changelog

agents/model_manager.py: ModelManager polls Ollama's /api/ps for loaded-model state and recommends models with cache affinity. If the preferred model is already in VRAM, requests are routed to it at zero eviction cost. When more VRAM is needed, it evicts the least recently used (LRU) non-pinned models.
Eviction policies: lru (default), pinned (never evict), background (evict first).
Events emitted: model.loaded, model.evicted, vram.pressure (>90% VRAM used).
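
The routing and eviction rules above can be sketched roughly as follows. This is a minimal illustration, not Hollow's actual ModelManager: the names `LoadedModel`, `recommend`, `pick_evictions`, and `vram_pressure` are hypothetical, and the ordering "background first, then least recently used" is an assumption drawn from the policy descriptions.

```python
from dataclasses import dataclass


@dataclass
class LoadedModel:
    """Hypothetical record for a model currently resident in VRAM."""
    name: str
    size_mb: int
    last_used: float      # timestamp of the most recent request
    policy: str = "lru"   # "lru" (default), "pinned", or "background"


def recommend(candidates: list[str], loaded: dict[str, LoadedModel]) -> str:
    """Cache affinity: return the first candidate already in VRAM.

    If none is loaded, fall back to the most preferred candidate,
    which will have to be loaded (and may trigger eviction).
    """
    for name in candidates:
        if name in loaded:
            return name
    return candidates[0]


def pick_evictions(loaded: dict[str, LoadedModel],
                   needed_mb: int, free_mb: int) -> list[str]:
    """Choose models to evict until `needed_mb` fits in free VRAM.

    Pinned models are never considered. Background models go first,
    then the remaining models in least-recently-used order.
    """
    evictable = sorted(
        (m for m in loaded.values() if m.policy != "pinned"),
        key=lambda m: (m.policy != "background", m.last_used),
    )
    victims = []
    for model in evictable:
        if free_mb >= needed_mb:
            break
        victims.append(model.name)
        free_mb += model.size_mb
    return victims


def vram_pressure(used_mb: int, total_mb: int) -> bool:
    """True when more than 90% of VRAM is in use."""
    return total_mb > 0 and used_mb / total_mb > 0.90
```

In this sketch a pinned model is never returned by `pick_evictions`, so a request that cannot fit even after evicting everything else simply gets back every evictable model; how the real manager handles that case is not stated in the changelog.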

ThreadPoolExecutor replaced with PriorityTaskQueue (min-heap).
Priority levels: 0=URGENT, 1=NORMAL (default), 2=BACKGROUND.
URGENT preemption: if all workers are busy with BACKGROUND tasks, the oldest BACKGROUND task is checkpointed and re-queued to free a worker for the URGENT task. Emits task.checkpointed.
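
To make the preemption rule concrete, here is a single-threaded simulation of a min-heap priority queue with a fixed number of worker slots. It is a sketch under stated assumptions: the class body, the `Task` dataclass, and the `running`, `events`, and `finish` members are hypothetical, and "checkpointing" is reduced to pushing the task back onto the heap. Real worker threads and task state capture are out of scope.

```python
import heapq
import itertools
from dataclasses import dataclass, field

URGENT, NORMAL, BACKGROUND = 0, 1, 2


@dataclass(order=True)
class Task:
    priority: int
    seq: int                        # tie-breaker: FIFO within a priority
    name: str = field(compare=False)


class PriorityTaskQueue:
    """Toy model of the scheduler: a heap of pending tasks plus worker slots."""

    def __init__(self, workers: int):
        self.workers = workers
        self._heap: list[Task] = []
        self._seq = itertools.count()
        self.running: list[Task] = []   # kept in start order, oldest first
        self.events: list[tuple[str, str]] = []

    def submit(self, name: str, priority: int = NORMAL) -> Task:
        task = Task(priority, next(self._seq), name)
        saturated = len(self.running) >= self.workers
        if (priority == URGENT and saturated
                and all(t.priority == BACKGROUND for t in self.running)):
            # Preempt the oldest BACKGROUND task and re-queue it.
            victim = self.running.pop(0)
            heapq.heappush(self._heap, victim)
            self.events.append(("task.checkpointed", victim.name))
        heapq.heappush(self._heap, task)
        self._dispatch()
        return task

    def finish(self, task: Task) -> None:
        """Mark a running task as done and start the next pending one."""
        self.running.remove(task)
        self._dispatch()

    def _dispatch(self) -> None:
        # Lowest priority number wins; seq preserves submission order.
        while self._heap and len(self.running) < self.workers:
            self.running.append(heapq.heappop(self._heap))
```

Because the re-queued task keeps its original sequence number, it resumes ahead of BACKGROUND tasks submitted later, which seems the least surprising behavior for a checkpointed task.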

GET /model_status — VRAM state: loaded models, VRAM used/available/total (MB), pressure flag, queue depth by priority. No Ollama required (returns empty state when Ollama offline).
POST /tasks/submit — new priority field (0/1/2).
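
The changelog does not give the /model_status response schema, so the following only illustrates how such a payload might be assembled, including the empty state returned when Ollama is offline. Every key name in the returned dictionary is an assumption, as is the function `build_model_status`. The only external detail relied on is that Ollama's /api/ps response lists models with a `name` and a `size_vram` in bytes.

```python
from typing import Optional

MB = 1024 * 1024


def build_model_status(ps: Optional[dict], total_mb: int,
                       queue_depth: dict[int, int]) -> dict:
    """Assemble a status payload from an Ollama /api/ps response.

    `ps` is the parsed JSON from /api/ps, or None when Ollama is
    unreachable. `total_mb` must come from elsewhere (for example a
    GPU query), since /api/ps does not report total VRAM.
    """
    models = []
    if ps is not None:
        for entry in ps.get("models", []):
            models.append({
                "name": entry["name"],
                "vram_mb": entry.get("size_vram", 0) // MB,
            })
    used_mb = sum(m["vram_mb"] for m in models)
    return {
        "loaded_models": models,
        "vram_used_mb": used_mb,
        "vram_available_mb": max(total_mb - used_mb, 0),
        "vram_total_mb": total_mb,
        # Same threshold as the vram.pressure event: more than 90% used.
        "pressure": total_mb > 0 and used_mb / total_mb > 0.90,
        "queue_depth": {
            "urgent": queue_depth.get(0, 0),
            "normal": queue_depth.get(1, 0),
            "background": queue_depth.get(2, 0),
        },
    }
```

Calling `build_model_status(None, 0, {})` yields an empty model list, zeroed VRAM figures, and a false pressure flag, which is one plausible reading of "returns empty state when Ollama offline".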

MCP tool model_status: exposes VRAM state to Claude Code or any other MCP agent (51 tools total).

tests/integration/test_vram_scheduler.py — 6 tests covering affinity, URGENT preemption, VRAM eviction, throughput regression, model_status schema, and vram.pressure event. Ollama-dependent tests auto-skipped in CI.
