LLMKube

v0.8.1 Breaking

This release includes one breaking change that platform teams should review before upgrading.

✓ No known CVEs patched

Topics

ai apple-silicon autoscaling edge-computing gguf gpu
self-hosted inference kubernetes llama-cpp llm local-llm metal mlx multi-gpu nvidia tgi vllm

ReleasePort's take

Moderate signal

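In LLMKube v0.8.1, foreman's Agent.spec.requestTimeoutSeconds field changes from a per-request HTTP timeout to a loop-wide wall-clock budget, and its default moves from 600 to 3600 seconds. The former per-request bound moves to the new Agent.spec.requestTurnTimeoutSeconds field (default 120).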

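Why it matters: The field keeps its name but changes meaning for every Agent, so existing values are reinterpreted after the upgrade. The new default is six times larger (3600 vs. 600 seconds) and bounds the whole loop rather than a single request, so operators should review their configurations and re-apply their Agent CRs.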

Summary

AI summary

This release contains one breaking change, one feature, four bug fixes, and documentation and miscellaneous updates.

Changes in this release

Breaking High

foreman: requestTimeoutSeconds now sets loop-wide budget, default changes from 600 to 3600.

Source: llm_adapter@2026-06-01

Confidence: low

Feature Medium

inferenceservice: adds typed spec.ropeScaling for RoPE/YaRN context extension.

Source: llm_adapter@2026-06-01

Confidence: high

Bugfix Medium

foreman: recovers orphaned phase=Running tasks on agent restart.

Source: llm_adapter@2026-06-01

Confidence: high

Bugfix Medium

foreman: splits per‑turn timeout from loop‑wide budget.

Source: llm_adapter@2026-06-01

Confidence: high

Bugfix Medium

foreman: adds warm‑path reviewer scheduling on macOS.

Source: llm_adapter@2026-06-01

Confidence: high

Bugfix Medium

metal-agent: prefers routable interface for host‑IP auto‑detect.

Source: llm_adapter@2026-06-01

Confidence: high

Full changelog

0.8.1 (2026-06-01)

⚠ BREAKING CHANGES

  • foreman: Agent.spec.requestTimeoutSeconds changes meaning from a per-request HTTP timeout to a loop-wide wall-clock budget, and its default moves from 600 to 3600. The former per-request bound is now the new Agent.spec.requestTurnTimeoutSeconds (default 120). Re-apply your Agent CRs after upgrade so existing Agents pick up explicit values.
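To make the two timeouts concrete, here is a minimal Python sketch of how a loop-wide wall-clock budget and a per-turn bound could interact. It is an illustration only, not foreman's implementation: the function `run_agent_loop`, its parameters, the status strings, and the choice to cap each turn's timeout at the remaining budget are all assumptions. Only the two field names and their defaults (3600 and 120) come from the changelog.

```python
import time


def run_agent_loop(turns, request_timeout_seconds=3600,
                   request_turn_timeout_seconds=120, clock=time.monotonic):
    """Run turns in order until they finish or the loop-wide budget is spent.

    Hypothetical model: `turns` is a list of callables that each accept the
    timeout (in seconds) they are allowed to use for one request.
    """
    # The loop-wide budget is a single wall-clock deadline for the whole loop.
    deadline = clock() + request_timeout_seconds
    completed = 0
    for turn in turns:
        remaining = deadline - clock()
        if remaining <= 0:
            return completed, "budget_exhausted"
        # Each turn gets the per-turn bound, capped by what is left of the
        # loop budget (the capping is an assumption of this sketch).
        turn(min(request_turn_timeout_seconds, remaining))
        completed += 1
    return completed, "done"
```

Under this model, a long-running Agent can make many requests of up to 120 seconds each, while the whole loop is still cut off after an hour. Before v0.8.1 the single field only bounded each individual request.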

Features

  • inferenceservice: typed spec.ropeScaling for RoPE/YaRN context extension (#507) (#600) (a554aee)

Bug Fixes

  • foreman: recover orphaned phase=Running tasks on agent restart (#542) (#598) (6dd2c44)
  • foreman: split per-turn timeout from loop-wide budget (#532) (#602) (41e7663)
  • foreman: warm-path reviewer scheduling on macOS (#578, #579) (#597) (a94d1ef)
  • metal-agent: prefer routable interface for host-IP auto-detect (#526) (#599) (c780795)

Documentation

  • foreman: absolute paths in overview README cross-refs (fix llmkube-web prerender) (#596) (b5f6f94)
  • foreman: move docs/foreman to docs/site/foreman + register in site nav (#594) (9fd85bb)

Miscellaneous

  • pin next release to 0.8.1 (Release-As) (#605) (a876cc6)

Breaking Changes

  • Agent.spec.requestTimeoutSeconds now represents a loop-wide wall-clock budget (default 3600) instead of per-request HTTP timeout; the former behavior is moved to Agent.spec.requestTurnTimeoutSeconds (default 120). Re‑apply Agent CRs after upgrade.
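One possible way to prepare Agent CRs for re-applying is to rewrite each manifest so that both fields carry explicit values. The sketch below works on a manifest already loaded into a Python dict. The helper `make_timeouts_explicit` is hypothetical, and its migration policy is an assumption rather than guidance from the LLMKube project: a pre-upgrade `requestTimeoutSeconds` value is carried over to `requestTurnTimeoutSeconds` because it used to mean a per-request bound, and the loop budget is set to the new default.

```python
import copy

# Defaults stated in the 0.8.1 changelog.
NEW_LOOP_BUDGET_DEFAULT = 3600
NEW_TURN_TIMEOUT_DEFAULT = 120


def make_timeouts_explicit(agent, loop_budget=NEW_LOOP_BUDGET_DEFAULT):
    """Return a copy of an Agent manifest with both timeout fields set."""
    updated = copy.deepcopy(agent)
    spec = updated.setdefault("spec", {})

    if "requestTurnTimeoutSeconds" in spec:
        # Assumed to be already written for 0.8.1: only fill a missing budget.
        spec.setdefault("requestTimeoutSeconds", loop_budget)
        return updated

    # Pre-0.8.1 manifest: the old value (if any) was a per-request timeout.
    old_per_request = spec.get("requestTimeoutSeconds")
    turn_timeout = (old_per_request if old_per_request is not None
                    else NEW_TURN_TIMEOUT_DEFAULT)
    spec["requestTurnTimeoutSeconds"] = turn_timeout
    # Keep the loop budget at least as large as a single turn.
    spec["requestTimeoutSeconds"] = max(loop_budget, turn_timeout)
    return updated
```

For example, a manifest that previously set `requestTimeoutSeconds: 300` would come out with `requestTurnTimeoutSeconds: 300` and `requestTimeoutSeconds: 3600`. Review the result before applying it, since the right loop budget depends on how long your Agents are expected to run.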


About LLMKube

Kubernetes operator for llama.cpp-native LLM inference with GPU scheduling, Apple Silicon Metal support, and OpenAI-compatible API.
