This release adds 3 notable features for engineering teams evaluating rollout.
Published 2mo
AI Agents & Assistants
✓ No known CVEs patched
Summary
AI summary: VRAM-aware model routing and a priority task queue with preemption were introduced.
Full changelog
What's New
VRAM-Aware Model Routing
- agents/model_manager.py: ModelManager polls Ollama's /api/ps for loaded model state and recommends models with cache affinity. If the preferred model is already in VRAM, requests are routed to it (zero eviction cost). When VRAM is needed, the least recently used non-pinned model is evicted.
- Eviction policies: lru (default), pinned (never evict), background (evict first).
- Events emitted: model.loaded, model.evicted, vram.pressure (>90% VRAM used).
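The routing rules above can be sketched in a few lines. This is a minimal in-memory illustration, not the ModelManager implementation: the class and method names (VramRouter, route), the logical clock, and the choice to evict background models before lru models are all assumptions made for the example. Only the policy names, event names, and the 90% threshold come from the notes.

```python
from dataclasses import dataclass, field
from typing import Dict, List

PRESSURE_THRESHOLD = 0.90  # vram.pressure fires above 90% VRAM used (from the notes)

# Assumed eviction order: "background" first, then "lru"; "pinned" is never evicted.
POLICY_RANK = {"background": 0, "lru": 1}


@dataclass
class LoadedModel:
    name: str
    vram_mb: int
    policy: str = "lru"
    last_used: int = 0  # logical clock tick of the most recent request


@dataclass
class VramRouter:  # hypothetical name, not the real ModelManager
    total_mb: int
    loaded: Dict[str, LoadedModel] = field(default_factory=dict)
    events: List[str] = field(default_factory=list)
    _clock: int = 0

    def used_mb(self) -> int:
        return sum(m.vram_mb for m in self.loaded.values())

    def route(self, preferred: str, vram_mb: int, policy: str = "lru") -> str:
        self._clock += 1
        if preferred in self.loaded:
            # Cache affinity: the model is already resident, so nothing is evicted.
            self.loaded[preferred].last_used = self._clock
            return preferred
        self._evict_until_fits(vram_mb)
        self.loaded[preferred] = LoadedModel(preferred, vram_mb, policy, self._clock)
        self.events.append(f"model.loaded:{preferred}")
        if self.used_mb() / self.total_mb > PRESSURE_THRESHOLD:
            self.events.append("vram.pressure")
        return preferred

    def _evict_until_fits(self, needed_mb: int) -> None:
        # Non-pinned models only, background first, then least recently used.
        candidates = sorted(
            (m for m in self.loaded.values() if m.policy != "pinned"),
            key=lambda m: (POLICY_RANK[m.policy], m.last_used),
        )
        for victim in candidates:
            if self.used_mb() + needed_mb <= self.total_mb:
                break
            del self.loaded[victim.name]
            self.events.append(f"model.evicted:{victim.name}")
        if self.used_mb() + needed_mb > self.total_mb:
            raise MemoryError("not enough VRAM even after evicting non-pinned models")
```

A second request for a model that is already loaded returns immediately and emits no events, which is the zero eviction cost path the notes describe.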
Priority Task Queue
- ThreadPoolExecutor replaced with PriorityTaskQueue (min-heap).
- Priority levels: 0=URGENT, 1=NORMAL (default), 2=BACKGROUND.
- URGENT preemption: if all workers are busy with BACKGROUND tasks, the oldest BACKGROUND task is checkpointed (re-queued) to free a worker for the URGENT task. Emits task.checkpointed.
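To make the preemption rule concrete, here is a small single-threaded sketch built on Python's heapq. It is not the project's PriorityTaskQueue: the class name, the submit/dispatch/finish methods, and the bookkeeping of running tasks are assumptions for illustration, and "checkpointing" is reduced to re-queuing the task id. "Oldest" is interpreted here as the task that started running first.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple

URGENT, NORMAL, BACKGROUND = 0, 1, 2  # priority levels from the notes


class PriorityTaskQueueSketch:  # hypothetical stand-in for PriorityTaskQueue
    def __init__(self, workers: int) -> None:
        self.workers = workers
        self._heap: List[Tuple[int, int, str]] = []  # (priority, seq, task_id)
        self._seq = itertools.count()  # tie-breaker: FIFO within a priority
        self.running: Dict[str, Tuple[int, int]] = {}  # task_id -> (priority, start_seq)
        self.events: List[str] = []

    def submit(self, task_id: str, priority: int = NORMAL) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), task_id))
        if priority == URGENT:
            self._maybe_preempt()

    def _maybe_preempt(self) -> None:
        if len(self.running) < self.workers:
            return  # a worker is already free
        if any(p != BACKGROUND for p, _ in self.running.values()):
            return  # the notes only describe preemption when all workers run BACKGROUND
        oldest = min(self.running, key=lambda t: self.running[t][1])
        del self.running[oldest]
        # "Checkpoint" here just means re-queue at BACKGROUND priority.
        heapq.heappush(self._heap, (BACKGROUND, next(self._seq), oldest))
        self.events.append(f"task.checkpointed:{oldest}")

    def dispatch(self) -> Optional[str]:
        """Start the highest-priority queued task if a worker is free."""
        if not self._heap or len(self.running) >= self.workers:
            return None
        priority, _, task_id = heapq.heappop(self._heap)
        self.running[task_id] = (priority, next(self._seq))
        return task_id

    def finish(self, task_id: str) -> None:
        self.running.pop(task_id, None)
```

Because the heap is keyed on (priority, sequence number), a lower number always wins and tasks of equal priority run in submission order.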
New API
- GET /model_status: VRAM state, including loaded models, VRAM used/available/total (MB), pressure flag, and queue depth by priority. No Ollama required (returns empty state when Ollama is offline).
- POST /tasks/submit: new priority field (0/1/2).
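A client might call the two endpoints roughly as follows, using only the standard library. The base URL and the other payload keys are placeholders, since the notes give neither a host nor a request schema; only the paths and the priority field (0/1/2) come from the release.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical; the notes do not state host or port


def build_submit_request(payload: dict, priority: int = 1) -> urllib.request.Request:
    """Build (but do not send) a POST /tasks/submit request with a priority."""
    if priority not in (0, 1, 2):
        raise ValueError("priority must be 0 (URGENT), 1 (NORMAL) or 2 (BACKGROUND)")
    body = json.dumps({**payload, "priority": priority}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/tasks/submit",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_model_status(timeout: float = 5.0) -> dict:
    """GET /model_status and decode the JSON body (field names are not assumed)."""
    with urllib.request.urlopen(f"{BASE_URL}/model_status", timeout=timeout) as resp:
        return json.load(resp)
```

Passing the built request to urllib.request.urlopen sends it. The payload keys other than priority are up to the server's actual schema, which this sketch does not try to guess.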
New MCP Tool
- model_status: exposes VRAM state to Claude Code / any MCP agent. 51 tools total.
Integration Tests
- tests/integration/test_vram_scheduler.py: 6 tests covering affinity, URGENT preemption, VRAM eviction, throughput regression, model_status schema, and the vram.pressure event. Ollama-dependent tests are auto-skipped in CI.
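The notes do not say how the Ollama-dependent tests are skipped. One common approach, sketched here with the standard library only, is to probe Ollama's default port (11434) and skip when nothing answers. The helper name, the decorator name, and the test class are hypothetical, and the real suite may use a different mechanism.

```python
import socket
import unittest

OLLAMA_HOST, OLLAMA_PORT = "127.0.0.1", 11434  # Ollama's default listen address


def ollama_reachable(host: str = OLLAMA_HOST, port: int = OLLAMA_PORT,
                     timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to the given host and port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Evaluated once at import time; tests carrying this decorator are skipped
# on machines (such as CI runners) where Ollama is not running.
requires_ollama = unittest.skipUnless(ollama_reachable(), "Ollama is not reachable")


class VramSchedulerSmokeTest(unittest.TestCase):  # hypothetical test class
    @requires_ollama
    def test_needs_live_ollama(self) -> None:
        # A real test would query the running server here.
        self.assertTrue(ollama_reachable())
```

A TCP probe only shows that something is listening on the port, so a stricter check could request /api/ps and validate the response.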
Earlier breaking changes
- v5.7.32 Web dashboard removed; operator panel is canonical UI