LLMKube

v0.7.9 Feature

This release adds two notable features: an mlx-server runtime for metal-agent and a scale subresource.

Published 2mo Containers & Orchestration

✓ No known CVEs patched in this version

Topics

ai apple-silicon autoscaling edge-computing gguf gpu self-hosted inference kubernetes llama-cpp llm local-llm metal mlx multi-gpu nvidia tgi vllm

Summary

AI summary

Added an mlx-server runtime to metal-agent and introduced a scale subresource.

Full changelog

clear stale conditions when a model reaches Ready without a download (#476) (06325b0)
inference PodMonitor selector matched no pods (#481) (31ee4d6)
mark Metal local-path models Ready instead of stuck Copying (#472) (c513c84)
metal-path InferenceService status and memory pre-flight (#488) (98ef2c4)
point metal-agent mlx-server install hint at the Homebrew formula (#477) (74b3333)
prevent concurrent runtime respawn in metal-agent (#469) (f34640b)
stop the operator fighting the HPA over Deployment replicas (#485) (8fc70e2)

add MAINTAINERS file and recommend private vulnerability reporting (#479) (aaccb4d)
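
The replica fix (#485) addresses a common controller pitfall: if an operator rewrites a Deployment's replica count on every reconcile, it undoes whatever a HorizontalPodAutoscaler just decided. The changelog does not describe how the fix is implemented, so the following is only a minimal Python sketch of the general pattern, not LLMKube's actual code. The function name `desired_replicas` and its parameters are hypothetical.

```python
from typing import Optional


def desired_replicas(
    spec_replicas: Optional[int],
    live_replicas: Optional[int],
    autoscaler_attached: bool,
) -> Optional[int]:
    """Return the replica count a reconciler should write.

    Returns None when the reconciler should leave the field untouched.
    All names here are hypothetical and chosen for illustration only.
    """
    if autoscaler_attached:
        # An autoscaler owns the count, so writing it here would
        # revert the autoscaler's decision on every reconcile.
        return None
    if spec_replicas is None:
        # Nothing was requested: keep the live value, or start at 1
        # when the Deployment does not exist yet.
        return live_replicas if live_replicas is not None else 1
    return spec_replicas


# With an autoscaler attached, the reconciler does not touch replicas.
print(desired_replicas(2, 5, autoscaler_attached=True))
# Without one, the requested count is applied.
print(desired_replicas(2, 5, autoscaler_attached=False))
```

The design choice in this sketch is to signal "do not write" with `None` rather than echoing the live value, so the caller can omit the field from its update entirely.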

About LLMKube

Kubernetes operator for llama.cpp-native LLM inference with GPU scheduling, Apple Silicon Metal support, and OpenAI-compatible API.

v0.8.1 foreman: requestTimeoutSeconds now sets a loop-wide budget, and its default changes from 600 to 3600.