LLMKube

v0.7.10 Feature

This release adds six features, one bug fix, and four documentation updates for engineering teams evaluating rollout.

✓ No known CVEs patched

Topics

ai apple-silicon autoscaling edge-computing gguf gpu self-hosted inference kubernetes llama-cpp llm local-llm metal mlx multi-gpu nvidia tgi vllm

ReleasePort's take

Light signal

v0.7.10 introduces the --llama-server-port flag, which pins llama-server to a fixed runtime port, and adds several Foreman features (including a capability-aware scheduler and a native agent loop). It also corrects the macOS-Metal setup documentation.

Why it matters: The new --llama-server-port option lets operators pin llama-server to a static port. For macOS-Metal users, the setup guide now derives the curl port from Endpoints and uses a host-localhost curl in place of the broken port-forward step. v0.7.10 was released on 2026-05-23.

Summary

AI summary

Version 0.7.10 (2026-05-23) is a mixed release: six features, one bug fix, and four documentation updates.

Changes in this release

Feature Medium

Adds --llama-server-port option for fixed runtime port.

Source: llm_adapter@2026-05-23

Confidence: high
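
With a fixed port, a readiness probe can target a known address instead of discovering it at runtime. A minimal sketch of such a check follows; the helper name `is_port_open` and the port 8080 in the usage note are illustrative assumptions, not part of LLMKube.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing is accepting connections on that address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_port_open("127.0.0.1", 8080)` would report whether a server is listening locally on 8080, assuming that is the port passed to --llama-server-port.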

Feature Medium

Adds make lint-all target for cross-architecture linting.

Source: llm_adapter@2026-05-23

Confidence: high

Feature Medium

Introduces capability‑aware scheduler, AgenticTaskWatcher, and stub executor (Foreman v0.1 M2).

Source: llm_adapter@2026-05-23

Confidence: high

Feature Medium

Gates Agent role on a verifier node in Foreman (M4).

Source: llm_adapter@2026-05-23

Confidence: high

Feature Medium

Adds native agent loop, Agent CRD, and coder role on M5 Max (Foreman M3).

Source: llm_adapter@2026-05-23

Confidence: high

Feature Medium

Scaffolds Foreman as an opt‑in add‑on (M0 + M1).

Source: llm_adapter@2026-05-23

Confidence: high

Feature Medium

Adds AGENTS.md (listed under Documentation in the full changelog).

Source: llm_adapter@2026-05-23

Confidence: low

Bugfix Medium

Reports Stopped phase when InferenceService.spec.replicas=0 on Metal path.

Source: llm_adapter@2026-05-23

Confidence: high
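
The behavior described above can be sketched as a small mapping from replica counts to a phase. This is an illustration only, not LLMKube's controller code: the function name `derive_phase` and the "Ready" and "Pending" phase names are assumptions, and only "Stopped" comes from the release note.

```python
def derive_phase(desired_replicas: int, ready_replicas: int) -> str:
    """Map replica counts to a coarse phase string.

    "Stopped" is taken from the release note; "Ready" and "Pending" are
    assumed names used only for this illustration.
    """
    if desired_replicas == 0:
        # Scaled to zero on purpose: report Stopped rather than leaving
        # the service looking like it is still waiting to start.
        return "Stopped"
    if ready_replicas >= desired_replicas:
        return "Ready"
    return "Pending"
```

Checking the zero-replica case first keeps an intentional scale-down from being reported as a service that is still starting.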

Bugfix Medium

Updates the broken bartowski phi-4-mini URL to the renamed repository (a documentation fix, per the full changelog).

Source: llm_adapter@2026-05-23

Confidence: low

Bugfix Medium

Documentation fix for macOS-Metal: derives the curl port from Endpoints (follow-up to #513).

Source: llm_adapter@2026-05-23

Confidence: low
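
As a rough sketch of what deriving a port from Endpoints can look like, the snippet below reads the port out of a parsed Kubernetes v1 Endpoints object, such as the output of `kubectl get endpoints <name> -o json`. The helper name `port_from_endpoints` and the example object (name, IP, port) are hypothetical. The steps in the LLMKube guide itself may differ.

```python
from typing import Optional


def port_from_endpoints(endpoints: dict, port_name: Optional[str] = None) -> Optional[int]:
    """Return the first matching port in a Kubernetes Endpoints object.

    If port_name is given, only a port with that name matches.
    Returns None when no subset lists a matching port.
    """
    for subset in endpoints.get("subsets") or []:
        for port in subset.get("ports") or []:
            if port_name is None or port.get("name") == port_name:
                return port["port"]
    return None


# Hypothetical Endpoints object; the name, IP, and port are made up.
example = {
    "kind": "Endpoints",
    "metadata": {"name": "my-model"},
    "subsets": [
        {
            "addresses": [{"ip": "192.168.65.254"}],
            "ports": [{"name": "http", "port": 8080, "protocol": "TCP"}],
        }
    ],
}

port = port_from_endpoints(example)
base_url = f"http://localhost:{port}"
```

Reading the port from the Endpoints object avoids hard-coding a value that can change between deployments.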

Bugfix Medium

Documentation fix for macOS-Metal: replaces the broken port-forward step with a host-localhost curl.

Source: llm_adapter@2026-05-23

Confidence: low

Full changelog

0.7.10 (2026-05-23)

Features

  • add --llama-server-port for a fixed llama-server runtime port (#499) (cc30b0d)
  • add make lint-all target for cross-arch linting (#508) (f57dd5b)
  • capability-aware scheduler + AgenticTaskWatcher + stub executor (Foreman v0.1 M2) (#504) (74b3d6e)
  • foreman: gate-role Agent on a verifier node (M4) (#518) (40a340e)
  • foreman: native agent loop + Agent CRD + coder role on M5 Max (M3) (#509) (6661343)
  • scaffold Foreman as an opt-in add-on (M0 + M1) (#501) (cd40491)

Bug Fixes

  • report Stopped phase when InferenceService.spec.replicas=0 on Metal path (#498) (7787239)

Documentation

  • add AGENTS.md (#496) (89d3766)
  • bump broken bartowski phi-4-mini URL to renamed repo (#514) (9f15d98)
  • macos-metal: derive curl port from Endpoints (follow-up to #513) (#515) (83085c2)
  • macos-metal: replace broken port-forward step with host-localhost curl (#513) (0f7f7a7)

About LLMKube

Kubernetes operator for llama.cpp-native LLM inference with GPU scheduling, Apple Silicon Metal support, and OpenAI-compatible API.

Related context

Earlier breaking changes

  • v0.8.1 foreman: requestTimeoutSeconds now sets loop-wide budget, default changes from 600 to 3600.
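
To illustrate the difference between a per-request timeout and a loop-wide budget, here is a minimal sketch in which every step draws on one shared deadline. The function `run_with_budget` and its parameters are hypothetical. The source does not say how Foreman enforces the budget, and only the 3600-second default comes from the note above.

```python
import time
from typing import Callable, Iterable


def run_with_budget(
    steps: Iterable[Callable[[float], None]],
    budget_seconds: float = 3600.0,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run steps until they finish or the shared budget is spent.

    Each step receives the time remaining for the whole loop, not a fresh
    per-request timeout. Returns the number of steps that were started.
    """
    deadline = clock() + budget_seconds
    started = 0
    for step in steps:
        remaining = deadline - clock()
        if remaining <= 0:
            break  # Budget exhausted: do not start another step.
        step(remaining)
        started += 1
    return started
```

Under this model a slow early step shortens the time left for later ones, so a configuration tuned for a per-request timeout may need revisiting.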
