This release adds two notable features for engineering teams evaluating a rollout.
Published 16 days ago
Containers & Orchestration
✓ No known CVEs patched in this version
Topics
ai
apple-silicon
autoscaling
edge-computing
gguf
gpu
self-hosted
inference
kubernetes
llama-cpp
llm
local-llm
metal
mlx
multi-gpu
nvidia
tgi
vllm
Summary
AI summary: Added an mlx-server runtime to metal-agent and introduced a scale subresource.
Full changelog
0.7.9 (2026-05-18)
Features
Bug Fixes
- clear stale conditions when a model reaches Ready without a download (#476) (06325b0)
- inference PodMonitor selector matched no pods (#481) (31ee4d6)
- mark Metal local-path models Ready instead of stuck Copying (#472) (c513c84)
- metal-path InferenceService status and memory pre-flight (#488) (98ef2c4)
- point metal-agent mlx-server install hint at the Homebrew formula (#477) (74b3333)
- prevent concurrent runtime respawn in metal-agent (#469) (f34640b)
- stop the operator fighting the HPA over Deployment replicas (#485) (8fc70e2)
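An operator "fights" an HPA when every reconcile writes its own `replicas` value back onto the Deployment, undoing what the autoscaler just set. One common remedy is to keep the live replica count whenever autoscaling is enabled. The sketch below assumes that approach and uses a hypothetical `deploymentSpec` stand-in and `desiredReplicas` helper; the actual change in #485 may be implemented differently.

```go
package main

import "fmt"

// deploymentSpec is a hypothetical stand-in for the replicas field of
// appsv1.DeploymentSpec; a nil Replicas means the field is unset.
type deploymentSpec struct {
	Replicas *int32
}

// desiredReplicas picks the replica count the operator writes on reconcile.
// With autoscaling on, the live value (last set by the HPA) is preserved so
// the operator does not reset it on every loop. With autoscaling off, or if
// the Deployment has no replica count yet, the custom resource's value wins.
func desiredReplicas(fromCR *int32, live deploymentSpec, autoscaling bool) *int32 {
	if autoscaling && live.Replicas != nil {
		return live.Replicas
	}
	return fromCR
}

func int32Ptr(v int32) *int32 { return &v }

func main() {
	live := deploymentSpec{Replicas: int32Ptr(4)}            // HPA scaled out to 4
	fmt.Println(*desiredReplicas(int32Ptr(1), live, true))  // HPA's choice kept
	fmt.Println(*desiredReplicas(int32Ptr(1), live, false)) // operator manages replicas
}
```

A related option is to leave `replicas` out of the object the operator applies whenever an HPA owns scaling, so the operator never claims that field at all.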
Documentation
About LLMKube
Kubernetes operator for llama.cpp-native LLM inference with GPU scheduling, Apple Silicon Metal support, and OpenAI-compatible API.