LLMKube

v0.8.1 Breaking

This release includes 1 breaking change for platform teams planning a safe upgrade.

Published 1mo Containers & Orchestration

View tool

✓ No known CVEs patched

Read the diff → Tool health → What is this tool? →

✓ No known CVEs patched in this version

Topics

ai apple-silicon autoscaling edge-computing gguf gpu

+12 more

self-hosted inference kubernetes llama-cpp llm local-llm metal mlx multi-gpu nvidia tgi vllm

ReleasePort's take

Moderate signal

editorial:auto 1mo

In foreman v0.8.1 the requestTimeoutSeconds field now governs a loop‑wide budget and its default shifts from 600 to 3600 seconds.

Why it matters: The change alters timeout behavior for all agents; operators must review configurations because the new default extends timeouts by six times.

Summary

AI summary

Updates Bug Fixes, ⚠ BREAKING CHANGES, and Miscellaneous across a mixed release.

Changes in this release

Type	Severity	Summary	CVE
Breaking	High	foreman: requestTimeoutSeconds now sets loop-wide budget, default changes from 600 to 3600. foreman: requestTimeoutSeconds now sets loop-wide budget, default changes from 600 to 3600. Source: llm_adapter@2026-06-01 Confidence: low	—
Feature	Medium	inferenceservice: adds typed spec.ropeScaling for RoPE/YaRN context extension. inferenceservice: adds typed spec.ropeScaling for RoPE/YaRN context extension. Source: llm_adapter@2026-06-01 Confidence: high	—
Bugfix
Bugfix	Medium	foreman: recovers orphaned phase=Running tasks on agent restart. foreman: recovers orphaned phase=Running tasks on agent restart. Source: llm_adapter@2026-06-01 Confidence: high	—
Bugfix	Medium	foreman: splits per‑turn timeout from loop‑wide budget. foreman: splits per‑turn timeout from loop‑wide budget. Source: llm_adapter@2026-06-01 Confidence: high	—
Bugfix	Medium	foreman: adds warm‑path reviewer scheduling on macOS. foreman: adds warm‑path reviewer scheduling on macOS. Source: llm_adapter@2026-06-01 Confidence: high	—
Bugfix	Medium	metal-agent: prefers routable interface for host‑IP auto‑detect. metal-agent: prefers routable interface for host‑IP auto‑detect. Source: llm_adapter@2026-06-01 Confidence: high	—

Full changelog

0.8.1 (2026-06-01)

⚠ BREAKING CHANGES

foreman: Agent.spec.requestTimeoutSeconds changes meaning from a per-request HTTP timeout to a loop-wide wall-clock budget, and its default moves from 600 to 3600. The former per-request bound is now the new Agent.spec.requestTurnTimeoutSeconds (default 120). Re-apply your Agent CRs after upgrade so existing Agents pick up explicit values.

Features

inferenceservice: typed spec.ropeScaling for RoPE/YaRN context extension (#507) (#600) (a554aee)

Bug Fixes

foreman: recover orphaned phase=Running tasks on agent restart (#542) (#598) (6dd2c44)
foreman: split per-turn timeout from loop-wide budget (#532) (#602) (41e7663)
foreman: warm-path reviewer scheduling on macOS (#578, #579) (#597) (a94d1ef)
metal-agent: prefer routable interface for host-IP auto-detect (#526) (#599) (c780795)

Documentation

foreman: absolute paths in overview README cross-refs (fix llmkube-web prerender) (#596) (b5f6f94)
foreman: move docs/foreman to docs/site/foreman + register in site nav (#594) (9fd85bb)

Miscellaneous

pin next release to 0.8.1 (Release-As) (#605) (a876cc6)

Breaking Changes

Agent.spec.requestTimeoutSeconds now represents a loop-wide wall-clock budget (default 3600) instead of per-request HTTP timeout; the former behavior is moved to Agent.spec.requestTurnTimeoutSeconds (default 120). Re‑apply Agent CRs after upgrade.

View diff on GitHub

Weekly OSS security release digest.

The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.

No spam, unsubscribe anytime.

Share this release

Share on X Share on Bluesky

Track LLMKube

Get notified when new releases ship.

About LLMKube

Kubernetes operator for llama.cpp-native LLM inference with GPU scheduling, Apple Silicon Metal support, and OpenAI-compatible API.

All releases →