LLMKube

v0.7.10 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

Published 2mo Containers & Orchestration

✓ No known CVEs patched

Topics

ai apple-silicon autoscaling edge-computing gguf gpu self-hosted inference kubernetes llama-cpp llm local-llm metal mlx multi-gpu nvidia tgi vllm

ReleasePort's take

Light signal

editorial:auto 2mo

Version v0.7.10 introduces the --llama-server-port flag for a fixed llama-server runtime port, adds several Foreman-related features (capability-aware scheduler, native agent loop), and corrects the macOS-Metal setup documentation.

Why it matters: The new --llama-server-port option lets operators pin llama-server to a static port. The macOS-Metal instructions now derive the curl port from Endpoints and use a host-localhost curl in place of the broken port-forward step. All changes land in v0.7.10, released 2026-05-23.
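Pinning a runtime port only helps if nothing else on the host already holds it. The sketch below is one way to probe a candidate port before passing it to --llama-server-port. The helper name `port_is_free` is hypothetical and not part of LLMKube, and the release notes do not say which command accepts the flag.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to host:port.

    Hypothetical pre-flight helper; it is not part of LLMKube.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds the port.
            return False
    return True
```

A check like this is advisory only: the port can still be taken between the probe and the moment the server starts.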

Summary

AI summary

Release 0.7.10 (2026-05-23) is a mixed release: six features, one bug fix, and four documentation changes.

Changes in this release

Type	Severity	Summary	Source	Confidence	CVE
Feature	Medium	Adds --llama-server-port option for fixed runtime port.	llm_adapter@2026-05-23	high	—
Feature	Medium	Adds make lint-all target for cross-architecture linting.	llm_adapter@2026-05-23	high	—
Feature	Medium	Introduces capability-aware scheduler, AgenticTaskWatcher, and stub executor (Foreman v0.1 M2).	llm_adapter@2026-05-23	high	—
Feature	Medium	Gates Agent role on a verifier node in Foreman (M4).	llm_adapter@2026-05-23	high	—
Feature	Medium	Adds native agent loop, Agent CRD, and coder role on M5 Max (Foreman M3).	llm_adapter@2026-05-23	high	—
Feature	Medium	Scaffolds Foreman as an opt-in add-on (M0 + M1).	llm_adapter@2026-05-23	high	—
Feature	Medium	Adds AGENTS.md documentation file.	llm_adapter@2026-05-23	low	—
Bugfix	Medium	Reports Stopped phase when InferenceService.spec.replicas=0 on Metal path.	llm_adapter@2026-05-23	high	—
Bugfix	Medium	Updates broken bartowski phi-4-mini URL to renamed repository.	llm_adapter@2026-05-23	low	—
Bugfix	Medium	Derives curl port from Endpoints for macOS-Metal (follow-up to #513).	llm_adapter@2026-05-23	low	—
Bugfix	Medium	Replaces broken port-forward step with host-localhost curl for macOS-Metal.	llm_adapter@2026-05-23	low	—

Full changelog

0.7.10 (2026-05-23)

Features

add --llama-server-port for a fixed llama-server runtime port (#499) (cc30b0d)
add make lint-all target for cross-arch linting (#508) (f57dd5b)
capability-aware scheduler + AgenticTaskWatcher + stub executor (Foreman v0.1 M2) (#504) (74b3d6e)
foreman: gate-role Agent on a verifier node (M4) (#518) (40a340e)
foreman: native agent loop + Agent CRD + coder role on M5 Max (M3) (#509) (6661343)
scaffold Foreman as an opt-in add-on (M0 + M1) (#501) (cd40491)

Bug Fixes

report Stopped phase when InferenceService.spec.replicas=0 on Metal path (#498) (7787239)
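The fix above amounts to treating a desired replica count of zero as its own state rather than as a failure or a pending rollout. A minimal sketch of that rule follows. Only "Stopped" comes from the changelog; the function name and the "Ready" and "Pending" labels are placeholders, not LLMKube's actual phase names or controller code.

```python
def derive_phase(spec_replicas: int, ready_replicas: int) -> str:
    """Map desired and ready replica counts to a coarse phase label.

    Assumption: "Ready" and "Pending" are illustrative placeholders;
    only "Stopped" is taken from the release notes.
    """
    if spec_replicas == 0:
        # Scaled to zero on purpose: report Stopped, whatever is still running.
        return "Stopped"
    if ready_replicas >= spec_replicas:
        return "Ready"
    return "Pending"
```

Checking the zero case first matters: without it, a service scaled to zero would fall through to the readiness comparison and be reported as ready.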

Documentation

add AGENTS.md (#496) (89d3766)
bump broken bartowski phi-4-mini URL to renamed repo (#514) (9f15d98)
macos-metal: derive curl port from Endpoints (follow-up to #513) (#515) (83085c2)
macos-metal: replace broken port-forward step with host-localhost curl (#513) (0f7f7a7)
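The two macos-metal entries above describe reading the service port from the Endpoints object and curling the host's localhost instead of using a port-forward. A rough sketch of that derivation follows. The sample JSON, the function names, and the /v1/models path are assumptions for illustration; the field layout (subsets, addresses, ports) follows the Kubernetes core/v1 Endpoints schema.

```python
import json

# Hypothetical sample shaped like `kubectl get endpoints <name> -o json` output.
SAMPLE_ENDPOINTS = """
{
  "kind": "Endpoints",
  "metadata": {"name": "example-model"},
  "subsets": [
    {
      "addresses": [{"ip": "192.168.1.20"}],
      "ports": [{"name": "http", "port": 8081, "protocol": "TCP"}]
    }
  ]
}
"""


def first_endpoint_port(endpoints_json: str) -> int:
    """Return the first port listed in a core/v1 Endpoints object."""
    doc = json.loads(endpoints_json)
    for subset in doc.get("subsets") or []:
        for port in subset.get("ports") or []:
            return int(port["port"])
    raise LookupError("Endpoints object lists no ports")


def local_url(endpoints_json: str, path: str = "/v1/models") -> str:
    """Build a host-localhost URL from the derived port (path is an assumption)."""
    return f"http://localhost:{first_endpoint_port(endpoints_json)}{path}"
```

With the sample above, `local_url(SAMPLE_ENDPOINTS)` yields a URL that could be handed to curl. Reading the port at run time avoids hard-coding a value that changes between deployments.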

About LLMKube

Kubernetes operator for llama.cpp-native LLM inference with GPU scheduling, Apple Silicon Metal support, and OpenAI-compatible API.

Related context

Earlier breaking changes

v0.8.1 foreman: requestTimeoutSeconds now sets a loop-wide budget, and its default changes from 600 to 3600.
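A loop-wide budget differs from a per-request timeout in that one deadline covers every iteration of the loop. The sketch below shows the general pattern, not Foreman's implementation. The names `run_loop`, `BudgetExceeded`, and `step` are hypothetical, and the 3600-second default simply mirrors the value quoted above.

```python
import time
from typing import Callable, Optional


class BudgetExceeded(Exception):
    """Raised when the whole loop runs past its time budget."""


def run_loop(
    step: Callable[[float], Optional[str]],
    budget_seconds: float = 3600.0,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call step(remaining_seconds) until it returns a result or time runs out.

    One deadline is computed up front and shared by all iterations, so a
    slow early step leaves less time for the steps that follow.
    """
    deadline = clock() + budget_seconds
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            raise BudgetExceeded(f"loop exceeded {budget_seconds} seconds")
        result = step(remaining)
        if result is not None:
            return result
```

Passing the remaining time into each step lets the step cap its own request timeout to whatever is left of the budget.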