LLMKube

Containers & Orchestration

A Kubernetes operator that simplifies self‑hosted LLM inference, turning deployment into a two‑line YAML problem.

Language: Go · Latest release: llmkube-0.8.1

Features

  • Runs LLMs as native Kubernetes workloads (llama.cpp, vLLM, etc.) with automatic GPU scheduling
  • Provides an OpenAI‑compatible API for seamless integration with existing SDKs and frameworks
  • Supports cross‑engine routing via ModelRouter for policy‑aware fallback to external providers (Anthropic, OpenAI, LiteLLM)
  • Optional metal‑agent enables Apple Silicon (Metal) GPU support alongside Linux/NVIDIA GPUs
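
As a rough illustration of what "OpenAI‑compatible" means for a client, the sketch below posts a chat request to the standard `/v1/chat/completions` path and reads the first choice back. It is a minimal sketch under assumptions: the base URL, the model name `demo-model`, and the in-process echo server are all hypothetical stand-ins, not LLMKube endpoints or defaults.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Minimal subset of the OpenAI chat-completions wire format.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

type chatChoice struct {
	Message chatMessage `json:"message"`
}

type chatResponse struct {
	Choices []chatChoice `json:"choices"`
}

// chat sends one user prompt to baseURL and returns the first reply.
// baseURL is whatever address the inference service is exposed on (assumed here).
func chat(client *http.Client, baseURL, model, prompt string) (string, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return "", err
	}
	resp, err := client.Post(baseURL+"/v1/chat/completions", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	var out chatResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	if len(out.Choices) == 0 {
		return "", errors.New("response contained no choices")
	}
	return out.Choices[0].Message.Content, nil
}

// newFakeServer is a hypothetical stand-in for a real endpoint: it echoes the prompt.
func newFakeServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/v1/chat/completions" {
			http.NotFound(w, r)
			return
		}
		var req chatRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil || len(req.Messages) == 0 {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		reply := chatMessage{Role: "assistant", Content: "echo: " + req.Messages[0].Content}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(chatResponse{Choices: []chatChoice{{Message: reply}}})
	}))
}

func main() {
	srv := newFakeServer()
	defer srv.Close()

	reply, err := chat(srv.Client(), srv.URL, "demo-model", "ping")
	fmt.Println(reply, err)
}
```

Pointing `baseURL` at a real service instead of the fake server is the only change a client should need, which is the practical benefit of wire-level compatibility.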

Recent releases

foreman-0.8.1 · Feature · No immediate action

Foreman workload scheduler

v0.8.1 Breaking risk
⚠ Upgrade required
  • After upgrading to v0.8.1, re‑apply all Agent CRs so existing Agents pick up explicit values for the new requestTimeoutSeconds and requestTurnTimeoutSeconds fields.
Breaking changes
  • Agent.spec.requestTimeoutSeconds now sets a loop-wide wall-clock budget (default 3600, previously 600) instead of a per-request HTTP timeout; the per-request bound moves to the new Agent.spec.requestTurnTimeoutSeconds (default 120). Re‑apply Agent CRs after upgrade.
Notable features
  • inferenceservice: adds typed spec.ropeScaling for RoPE/YaRN context extension
Full changelog

0.8.1 (2026-06-01)

⚠ BREAKING CHANGES

  • foreman: Agent.spec.requestTimeoutSeconds changes meaning from a per-request HTTP timeout to a loop-wide wall-clock budget, and its default moves from 600 to 3600. The former per-request bound is now the new Agent.spec.requestTurnTimeoutSeconds (default 120). Re-apply your Agent CRs after upgrade so existing Agents pick up explicit values.

Features

  • inferenceservice: typed spec.ropeScaling for RoPE/YaRN context extension (#507) (#600) (a554aee)

Bug Fixes

  • foreman: recover orphaned phase=Running tasks on agent restart (#542) (#598) (6dd2c44)
  • foreman: split per-turn timeout from loop-wide budget (#532) (#602) (41e7663)
  • foreman: warm-path reviewer scheduling on macOS (#578, #579) (#597) (a94d1ef)
  • metal-agent: prefer routable interface for host-IP auto-detect (#526) (#599) (c780795)

Documentation

  • foreman: absolute paths in overview README cross-refs (fix llmkube-web prerender) (#596) (b5f6f94)
  • foreman: move docs/foreman to docs/site/foreman + register in site nav (#594) (9fd85bb)

Miscellaneous

  • pin next release to 0.8.1 (Release-As) (#605) (a876cc6)

foreman-0.8.0 · Feature · No immediate action

Agentic workload scheduler

v0.8.0 · New feature · No immediate action

Taxonomy, masking, Intel GPU

v0.7.12 · Mixed · No immediate action

Workload reconciler + job fixes + doc improvements

About

Stars: 118
Forks: 17
Languages: Go, Shell, HCL

Install & Platforms

Install via: brew, helm
Platforms: linux, macos, arm64
