LLMKube releases - releaseport

No immediate action

llmkube-0.9.12 Maintenance 8h

Routine maintenance and dependency updates.

No immediate action

foreman-0.9.12 Feature 8h

Opt-in Foreman add-on

No immediate action

v0.9.12 Mixed 8h

Bug fixes + new features

No immediate action

llmkube-0.9.11 Maintenance 2d

Routine maintenance and dependency updates.

No immediate action

foreman-0.9.11 Feature 2d

Foreman install sequence

No immediate action

v0.9.11 Mixed 2d

Cache prep + federation + foreman matching

No immediate action

llmkube-0.9.10 Maintenance 4d

Routine maintenance and dependency updates.

No immediate action

foreman-0.9.10 Feature 4d

Agentic workload scheduling

Review required

v0.9.10 Mixed 4d

Dependencies

grafana variable + deps bump + foreman fix

No immediate action

llmkube-0.9.9 Maintenance 4d

Routine maintenance and dependency updates.

No immediate action

foreman-0.9.9 Breaking 4d

Manual Foreman sequencing

Upgrade now

v0.9.9 New feature 4d

Dependencies

gpuSharing tiers + GPUQuota + optional verify

No immediate action

llmkube-0.9.8 Maintenance 7d

Routine maintenance and dependency updates.

No immediate action

foreman-0.9.8 Feature 7d

Agentic workload scheduler

No immediate action

v0.9.8 New feature 7d

hardware-labels + llamacpp-router + vLLM entrypoint

No immediate action

llmkube-0.9.7 Maintenance 10d

Routine maintenance and dependency updates.

No immediate action

foreman-0.9.7 Feature 10d

Foreman scheduler

No immediate action

v0.9.7 Mixed 10d

envtest verification + budget engine + agent loop fix

No immediate action

llmkube-0.9.6 Maintenance 12d

Routine maintenance and dependency updates.

No immediate action

foreman-0.9.6 Maintenance 12d

LLMKube prerequisite

No immediate action

v0.9.6 New feature 12d

GPUQuota CRD + s3 model source + branch preservation

No immediate action

llmkube-0.9.5 Maintenance 14d

Routine maintenance and dependency updates.

No immediate action

foreman-0.9.5 Breaking 14d

Separate Foreman install + RBAC

No immediate action

v0.9.5 New feature 14d

v0.9.5 config validation + planner token

No immediate action

llmkube-0.9.4 Maintenance 15d

Routine maintenance and dependency updates.

No immediate action

foreman-0.9.4 Feature 15d

Opt-in agentic workload scheduler

No immediate action

v0.9.4 Mixed 15d

Multi‑fleet charts + SGLang runtime

No immediate action

llmkube-0.9.3 Maintenance 17d

Routine maintenance and dependency updates.

No immediate action

foreman-0.9.3 Breaking 17d

LLMKube then Foreman install

No immediate action

v0.9.3 Bug fix 17d

Fixes invalid flag

No immediate action

llmkube-0.9.2 Maintenance 17d

Routine maintenance and dependency updates.

No immediate action

foreman-0.9.2 Maintenance 17d

LLMKube prerequisite

No immediate action

v0.9.2 Mixed 17d

Foreman bug fixes & features

No immediate action

llmkube-0.9.1 Maintenance 20d

Routine maintenance and dependency updates.

No immediate action

foreman-0.9.1 Feature 20d

Foreman scheduling add‑on

Config change

v0.9.1 Mixed 20d

Breaking upgrade

CRD toggle + foreman improvements + unified cache-key

No immediate action

llmkube-0.9.0 Maintenance 21d

Routine maintenance and dependency updates.

No immediate action

foreman-0.9.0 Feature 21d

Manual install after LLMKube

Security behavior changed

v0.9.0 Breaking risk 21d

Auth RBAC RCE / SSRF

SSRF gate + hostPath block

No immediate action

llmkube-0.8.28 Maintenance 23d

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.28 Feature 23d

Sibling install requirement

No immediate action

v0.8.28 New feature 23d

Foreman features + inference cache

No immediate action

llmkube-0.8.27 Maintenance 24d

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.27 Feature 24d

Agentic workload scheduler

No immediate action

v0.8.27 Bug fix 24d

foreman bug fixes

No immediate action

llmkube-0.8.26 Maintenance 25d

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.26 Feature 25d

Foreman installation steps

No immediate action

v0.8.26 New feature 25d

two‑tier gate‑check suite

No immediate action

v0.8.25 New feature 25d

Task-specific repos

No immediate action

v0.8.24 Bug fix 25d

SecurityContext preservation fix

No immediate action

llmkube-0.8.23 Maintenance 25d

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.23 Feature 25d

Foreman install requirements

No immediate action

v0.8.23 New feature 25d

Controller deferral + Foreman fixes + Cache improvements

No immediate action

llmkube-0.8.22 Maintenance 27d

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.22 Feature 27d

Opt-in Foreman add-on

No immediate action

v0.8.22 Mixed 27d

Foreman bugfixes + mutation check

No immediate action

llmkube-0.8.21 Maintenance 28d

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.21 Feature 28d

Foreman scheduling

No immediate action

v0.8.21 Bug fix 28d

Root init fix

No immediate action

llmkube-0.8.20 Maintenance 28d

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.20 Feature 28d

Agentic workload scheduling

No immediate action

v0.8.20 Mixed 28d

spec.mode, vulkan routing, memory limit

No immediate action

llmkube-0.8.19 Maintenance 29d

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.19 Feature 29d

Agentic workload scheduler

No immediate action

v0.8.19 Mixed 29d

foreman features + controller fix

No immediate action

llmkube-0.8.18 Maintenance 1mo

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.18 Breaking 1mo

Manual install after LLMKube

No immediate action

v0.8.18 New feature 1mo

Model artifacts + Router metrics + Config propagation

No immediate action

llmkube-0.8.17 Maintenance 1mo

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.17 Feature 1mo

Foreman installation requirements

No immediate action

v0.8.17 New feature 1mo

Model-cache flags + audit log + speculative decoding

No immediate action

llmkube-0.8.16 Maintenance 1mo

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.16 Feature 1mo

Foreman add‑on

No immediate action

v0.8.16 Bug fix 1mo

BTP timeout fix, foreman retries, readme update

No immediate action

llmkube-0.8.15 Maintenance 1mo

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.15 Feature 1mo

Agentic workload scheduler

No immediate action

v0.8.15 Bug fix 1mo

Foreman gate fix + metric removal

No immediate action

llmkube-0.8.14 Maintenance 1mo

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.14 Feature 1mo

Foreman add‑on

No immediate action

v0.8.14 New feature 1mo

ModelRouter strategy + InferenceService constraints + Foreman gate

No immediate action

llmkube-0.8.13 Maintenance 1mo

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.13 Feature 1mo

Foreman scheduling

No immediate action

v0.8.13 Bug fix 1mo

Llamacpp alias registration

No immediate action

llmkube-0.8.12 Maintenance 1mo

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.12 Breaking 1mo

LLMKube installation prerequisite

No immediate action

v0.8.12 Mixed 1mo

Heartbeat download + cache awareness + pod shielding

No immediate action

v0.8.11 Mixed 1mo

OCI charts + foreman features + fixes

No immediate action

llmkube-0.8.10 Maintenance 1mo

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.10 Maintenance 1mo

Foreman install process

No immediate action

v0.8.10 Bug fix 1mo

GPU CRD fix

No immediate action

llmkube-0.8.9 Maintenance 1mo

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.9 Breaking 1mo

Separate Foreman install

No immediate action

v0.8.9 New feature 1mo

Vulkan check, node port, scaling

No immediate action

llmkube-0.8.8 Maintenance 1mo

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.8 Feature 1mo

Foreman scheduling add‑on

Review required

v0.8.8 New feature 1mo

Auth RBAC Breaking upgrade

Runtime selection + Gateway improvements + Cache fixes

No immediate action

llmkube-0.8.7 Maintenance 1mo

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.7 Feature 1mo

Opt‑in agentic workload scheduler

No immediate action

v0.8.7 Breaking risk 1mo

Removed Result.Requeue

No immediate action

v0.8.6 Bug fix 1mo

Symlink handling fix

No immediate action

v0.8.5 New feature 1mo

Foreman agent version report + self‑update + rollout

No immediate action

llmkube-0.8.4 Maintenance 1mo

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.4 Maintenance 1mo

Separate Foreman install

No immediate action

v0.8.4 Mixed 1mo

Heartbeat liveness + bug fixes

No immediate action

llmkube-0.8.3 Maintenance 1mo

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.3 Maintenance 1mo

Sibling chart install order

No immediate action

v0.8.3 Mixed 1mo

Hybrid thinking + demote verdicts + map exhaustion

No immediate action

llmkube-0.8.2 Maintenance 1mo

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.2 Feature 1mo

Opt-in workload scheduler

Upgrade now

v0.8.2 Mixed 1mo

Dependencies

Model policy, foreman jobs, Go CVE fix

No immediate action

llmkube-0.8.1 Maintenance 1mo

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.1 Feature 1mo

Foreman workload scheduler

v0.8.1 Breaking risk 1mo

⚠ Upgrade required

After upgrading to v0.8.1, re‑apply all Agent CRs so existing Agents pick up explicit values for the new requestTimeoutSeconds and requestTurnTimeoutSeconds fields.

Breaking changes

Agent.spec.requestTimeoutSeconds now represents a loop-wide wall-clock budget (default 3600, previously 600) instead of a per-request HTTP timeout; the former per-request behavior moves to the new Agent.spec.requestTurnTimeoutSeconds (default 120). Re‑apply Agent CRs after upgrade.

Notable features

inferenceservice: adds typed spec.ropeScaling for RoPE/YaRN context extension

Full changelog

0.8.1 (2026-06-01)

⚠ BREAKING CHANGES

foreman: Agent.spec.requestTimeoutSeconds changes meaning from a per-request HTTP timeout to a loop-wide wall-clock budget, and its default moves from 600 to 3600. The former per-request bound is now the new Agent.spec.requestTurnTimeoutSeconds (default 120). Re-apply your Agent CRs after upgrade so existing Agents pick up explicit values.
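The split between a loop-wide budget and a per-turn timeout can be illustrated with a minimal Python sketch. It is not Foreman's implementation: the function name, the status strings, and the choice to stop the loop on the first timed-out turn are assumptions made for illustration. Only the two field names and their defaults (3600 and 120) come from the changelog above.

```python
import asyncio
import time


async def run_agent_loop(turns, request_timeout_seconds=3600.0,
                         request_turn_timeout_seconds=120.0):
    """Run turns in order under a loop-wide budget and a per-turn timeout.

    `turns` is a list of zero-argument coroutine functions (hypothetical
    stand-ins for one model request each). Returns (results, status).
    """
    deadline = time.monotonic() + request_timeout_seconds
    results = []
    for turn in turns:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return results, "budget_exhausted"
        # A turn may never outlive the loop budget, so cap it at what is left.
        limit = min(request_turn_timeout_seconds, remaining)
        try:
            results.append(await asyncio.wait_for(turn(), timeout=limit))
        except asyncio.TimeoutError:
            # Report which of the two limits was the binding one.
            if request_turn_timeout_seconds <= remaining:
                return results, "turn_timeout"
            return results, "budget_exhausted"
    return results, "completed"
```

Under the pre-0.8.1 meaning, a single value bounded each request and nothing bounded the loop as a whole; in this sketch that would correspond to dropping the `deadline` check entirely.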

Features

inferenceservice: typed spec.ropeScaling for RoPE/YaRN context extension (#507) (#600) (a554aee)

Bug Fixes

foreman: recover orphaned phase=Running tasks on agent restart (#542) (#598) (6dd2c44)
foreman: split per-turn timeout from loop-wide budget (#532) (#602) (41e7663)
foreman: warm-path reviewer scheduling on macOS (#578, #579) (#597) (a94d1ef)
metal-agent: prefer routable interface for host-IP auto-detect (#526) (#599) (c780795)

Documentation

foreman: absolute paths in overview README cross-refs (fix llmkube-web prerender) (#596) (b5f6f94)
foreman: move docs/foreman to docs/site/foreman + register in site nav (#594) (9fd85bb)

Miscellaneous

pin next release to 0.8.1 (Release-As) (#605) (a876cc6)

No immediate action

llmkube-0.8.0 Maintenance 1mo

Routine maintenance and dependency updates.

No immediate action

foreman-0.8.0 Feature 1mo

Agentic workload scheduler

No immediate action

v0.8.0 New feature 1mo

Taxonomy, masking, Intel GPU

No immediate action

llmkube-0.7.12 Maintenance 2mo

Routine maintenance and dependency updates.

No immediate action

foreman-0.7.12 Maintenance 2mo

Manual Foreman install

No immediate action

v0.7.12 Mixed 2mo

Workload reconciler + job fixes + doc improvements

No immediate action

llmkube-0.7.11 Maintenance 2mo

Routine maintenance and dependency updates.

No immediate action

foreman-0.7.11 Feature 2mo

Foreman opt-in add-on

No immediate action

v0.7.11 Bug fix 2mo

Foreman chart fix

No immediate action

v0.7.10 New feature 2mo

--llama-server-port + lint-all + scheduler

No immediate action

llmkube-0.7.9 Maintenance 2mo

Routine maintenance and dependency updates.

No immediate action

v0.7.9 New feature 2mo

mlx-server runtime + scale subresource

No immediate action

llmkube-0.7.8 Maintenance 2mo

Routine maintenance and dependency updates.

No immediate action

v0.7.8 New feature 2mo

Configurable proxy + ModelRouter

llmkube-0.7.7 Maintenance 2mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.7.7 New feature 2mo

⚠ Upgrade required

OpenShift / OKD / MicroShift installs must use `helm ... -f charts/llmkube/values-openshift.yaml` to allow restricted-v2 SCC to inject fsGroup.
Operators with a custom `--init-container-image` whose user is not curl (uid=101 gid=102) should set `spec.podSecurityContext` on each InferenceService or pass `--default-fsgroup=<gid>` to the controller.

Notable features

OpenShift made a first-class deploy target (ci+chart changes)
VLLMConfig CRD now includes gpuMemoryUtilization and cpuOffloadGB fields
metal-agent emits Kubernetes events for memory-pressure, evictions, skips, and respawn blocks

Full changelog

0.7.7 (2026-05-11)

Features

agent: vllm-swift runtime + TurboQuant passthrough (#391) (#393) (2691e67)
ci+chart: make OpenShift a first-class deploy target (closes #421) (#422) (798a13e)
crd: add gpuMemoryUtilization and cpuOffloadGB to VLLMConfig (#394) (6883f78)
metal-agent: emit Kubernetes events for memory-pressure transitions, evictions, skips, and respawn blocks (closes #390) (#411) (e0d17d1)
observability: runtime label on inference pods + recording rules + starter dashboard (refs #409) (#410) (71743ed)

Bug Fixes

controller: default FSGroup to curl_group + Longhorn-backed e2e job (closes #418, closes #420) (adce90f)
controller: stop hot-spinning on unreachable file:// model sources (closes #405) (#412) (4ac6f57)

Documentation

add NVIDIA Blackwell B200 (sm_100) validation matrix (refs #413) (#414) (bfda149)
operations: seed runbooks index + first 2 entries (file:// hot-spin, metal-agent memory pressure) (#417) (d3bce8d)
port concepts/comparison to markdown (first Phase 1C content port) (#403) (51c396b)
readme: HN-launch readiness fixes (broken link, Apple Silicon CTA, quickstart memory) (#401) (3e44bfb)
refresh quickstart cast for v0.7.6 (HN launch) (#404) (5abaddb)
split docs/ into site/ and contributors/, prep for site rendering (#396) (9299a31)
upgrade: OpenShift / OKD / MicroShift installs must use helm ... -f charts/llmkube/values-openshift.yaml so restricted-v2 SCC can inject fsGroup from the namespace's allocated range (adce90f)
upgrade: operators using a custom --init-container-image whose user is not curl (uid=101 gid=102) should set spec.podSecurityContext on each InferenceService or pass --default-fsgroup=<gid> to the controller (adce90f)
upgrade: v0.7.7 rolls every InferenceService Pod once on first reconcile (Deployment template gains fsGroup=102 and the new inference.llmkube.dev/runtime label) (adce90f)

llmkube-0.7.6 Maintenance 2mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.7.6 Bug fix 2mo

Notable features

agent: added eviction safety floor, evictionProtection opt‑out, and late‑spawn condition fix
agent: introduced memory‑pressure eviction with respawn protection
api: passthrough podAnnotations and podLabels on InferenceServiceSpec

Full changelog

0.7.6 (2026-05-03)

Features

agent: eviction safety floor + evictionProtection opt-out + late-spawn condition fix (#186) (#384) (6544747)
agent: memory-pressure eviction and respawn protection (#186) (#382) (65a78b5)
api: add podAnnotations and podLabels passthrough (closes #326) (#381) (baecd68)
api: expose runtimeClassName on InferenceServiceSpec (closes #375) (#380) (cc44ff5)
crd: add ParallelSlots support for vllm and fix llamacpp (#340) (d81babb)

Bug Fixes

catalog: default phi-4-mini context to 8K (closes #386) (#387) (7bcd685)
controller: drop model label from Deployment selector to make modelRef mutable (closes #301) (#385) (a1de3bf)
derive metal InferenceService phase from Endpoints, not desiredReplicas (closes #374) (#376) (350dafe)

Documentation

fix broken phi-3-mini command and dead benchmark link (#369) (6a1fd58)
HN launch prep README polish (llama.cpp credit, vLLM, KubeAI/llm-d) (#371) (9d27774)

llmkube-0.7.5 Maintenance 2mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.7.5 Bug fix 2mo

Fixed Helm chart by syncing CRDs from kubebuilder source and adding a CI guard.

Full changelog

0.7.5 (2026-04-30)

Bug Fixes

chart: sync Helm CRDs from kubebuilder source and add CI guard (#367) (73bd2b4)

llmkube-0.7.4 Maintenance 2mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.7.4 Bug fix 2mo

Notable features

Controller pins vLLM default image to version 0.20.0

Full changelog

0.7.4 (2026-04-29)

Features

controller: pin vLLM default image to v0.20.0 (#362) (d2ae561)

Bug Fixes

controller: defer HTTP(S) Model downloads to the workload init container (#364) (469f542)

llmkube-0.7.3 Maintenance 2mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.7.3 Breaking risk 2mo

Notable features

Added cacheTypeCustomK/V for non-enum llama.cpp KV cache types
Added kvCacheCustomDtype for non-enum vLLM KV cache types

Full changelog

0.7.3 (2026-04-29)

Features

agent: cache-type-aware memory estimator + TurboQuant docs (#355) (0697afd)
api: add cacheTypeCustomK/V for non-enum llama.cpp KV cache types (#351) (71bd762)
api: add kvCacheCustomDtype for non-enum vLLM KV cache types (#359) (5e796d0)

Bug Fixes

agent: respawn on InferenceService spec drift, honor replicas=0, and plumb full spec to llama-server flags (#353) (ff54cad)
controller: use GGUF metadata name for downloaded model file basename (#347) (e932c7a)
vllm: set enableServiceLinks=false on vLLM Pod spec (#361) (01eb5c5)
vllm: use positional model argument instead of deprecated --model (#360) (a17566c)

llmkube-0.7.2 Maintenance 3mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.7.2 Bug fix 3mo

Notable features

Apple Silicon power gauges exposed via powermetrics
One-command make targets for installing and uninstalling powermetrics-sudo

Full changelog

0.7.2 (2026-04-27)

Features

agent: expose Apple Silicon power gauges via powermetrics (#334) (58a94a7)
make: one-command install-powermetrics-sudo + uninstall targets (#336) (af48077)

Bug Fixes

agent: make executor startup timeouts configurable; raise defaults to 120s (#330) (5aa5fa2)
agent: reconcile orphaned Service+Endpoints on agent startup (#332) (d88c541)

llmkube-0.7.1 Maintenance 3mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.7.1 Breaking risk 3mo

Notable features

Apple Silicon-optimized flags for llama-server in agent
values.schema.json added to Helm chart for value validation
agentic-coding flags extended in InferenceService vLLM config

Full changelog

0.7.1 (2026-04-25)

Features

agent: pass Apple Silicon-optimized flags to llama-server (#327) (a69ab6a)
chart: add values.schema.json for Helm value validation (#322) (1f8a34d)
crd: extend InferenceService vLLM config for agentic-coding flags (#306) (cb2aa6a)
security: supply-chain MVP — checksum install, govulncheck, gosec, codecov (#310) (f17f59d)

Bug Fixes

agent: detect stalled K8s polling and exit for supervisor restart (#328) (c0636cc)
agent: let the kernel pick free ports for llama-server (#321) (8111395)
bump InferenceService spec.contextSize cap from 131072 to 2097152 (#300) (a46a1bf)

Documentation

add ADOPTERS.md inviting public user listings (#324) (871a0cb)
backfill ⚠ BREAKING CHANGES section into 0.7.0 changelog (#296) (2ad4640)

llmkube-0.7.0 Maintenance 3mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.7.0 Breaking risk 3mo

Breaking changes

sharding.strategy: tensor now maps to llama.cpp --split-mode row instead of layer; set strategy: layer to retain previous behavior
InferenceService spec.extraArgs is now forwarded to the vLLM runtime (previously it was silently ignored); vLLM configs with llama.cpp-only flags in extraArgs will fail at pod startup

Notable features

Hybrid GPU/CPU offloading support for MoE models
Tensor overrides and batch size controls for hybrid offloading
Additional runtime controls for llama.cpp and vllm

Full changelog

0.7.0 (2026-04-18)

⚠ BREAKING CHANGES

sharding: sharding.strategy: tensor on a Model now correctly maps to llama.cpp's --split-mode row instead of silently falling back to --split-mode layer. Configs that set strategy: tensor expecting layer behavior may see performance regressions or new failure modes under concurrent load (particularly on consumer PCIe multi-GPU setups with quantized models). Explicitly set strategy: layer to retain the previous behavior. (#291)
vllm: InferenceService spec.extraArgs is now forwarded to the vLLM runtime. Previously extraArgs was silently ignored when runtime: vllm. Configs that placed llama.cpp-only flags in extraArgs on a vLLM InferenceService will start failing at pod startup. Audit any vLLM InferenceService that sets extraArgs before upgrading. (#291)
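The two breaking changes above lend themselves to a small pre-upgrade check. The following Python sketch is illustrative only: it assumes manifests have already been parsed into dictionaries, and that the runtime is declared at `spec.runtime` (the changelog says `runtime: vllm` without giving the full path). The helper names are hypothetical and not part of LLMKube.

```python
# Post-0.7.0 mapping as described in the changelog:
# sharding.strategy value -> llama.cpp --split-mode value.
SPLIT_MODE_BY_STRATEGY = {"layer": "layer", "tensor": "row"}


def split_mode_args(strategy):
    """Return the llama.cpp flag pair implied by a sharding strategy."""
    return ["--split-mode", SPLIT_MODE_BY_STRATEGY[strategy]]


def vllm_services_to_audit(manifests):
    """Return names of vLLM InferenceServices that set non-empty extraArgs.

    Assumes each manifest is a dict with `kind`, `metadata.name`, and a
    `spec` whose `runtime` and `extraArgs` keys sit at the top level.
    """
    flagged = []
    for manifest in manifests:
        spec = manifest.get("spec", {})
        if (manifest.get("kind") == "InferenceService"
                and spec.get("runtime") == "vllm"
                and spec.get("extraArgs")):
            flagged.append(manifest.get("metadata", {}).get("name", "<unnamed>"))
    return flagged
```

Any service the second helper returns is one whose extraArgs were ignored before 0.7.0 and will reach vLLM afterwards, so its flags are worth reviewing by hand before upgrading.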

Features

add hybrid GPU/CPU offloading support for MoE models (#281) (2287f66)
add tensor overrides and batch size controls for hybrid offloading (#283) (8be4adc)
expose additional runtime controls for llama.cpp and vllm (#291) (2245718)
recognize runtime-resolved sources (HF repo IDs) in Model controller (#293) (953e8a7)

Bug Fixes

inherit runAsUser/runAsGroup from podSecurityContext (#274) (72b9b5c)

Documentation

surface breaking behavior changes for 0.7.0 (#294) (e234a40)

llmkube-0.6.0 Maintenance 3mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.6.0 Breaking risk 3mo

Breaking changes

Default CUDA image changed to server-cuda13 for Qwen3.5 and Blackwell support.

Notable features

First-class PersonaPlex (Moshi) runtime backend added
Grafana inference metrics dashboard added
HPA autoscaling for InferenceService added

Full changelog

0.6.0 (2026-04-08)

⚠ BREAKING CHANGES

update default CUDA image to server-cuda13 for Qwen3.5 and Blackwell support (#262)

Features

add first-class PersonaPlex (Moshi) runtime backend (#272) (2b1c948)
add Grafana inference metrics dashboard (#269) (be376c6)
add HPA autoscaling for InferenceService (#260) (2d16502)
add pluggable runtime backends for non-llama.cpp inference engines (#271) (bb1576c)
add vLLM and TGI runtime backends with per-runtime HPA metrics (#273) (441c7c7)
separate image registry from repository in Helm chart (#268) (5c059a4)
support custom layer splits from GPUShardingSpec (#267) (a37701c)
update default CUDA image to server-cuda13 for Qwen3.5 and Blackwell support (#262) (cc9a95e)

llmkube-0.5.3 Maintenance 3mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.5.3 New feature 3mo

Notable features

KV cache type configuration and extraArgs escape hatch
Ollama runtime backend for Metal agent
oMLX alternative runtime backend for Metal agent

Full changelog

0.5.3 (2026-04-01)

Features

add KV cache type configuration and extraArgs escape hatch (#256) (7a4b855)
add Ollama as runtime backend for Metal agent (#258) (6148b89)
add oMLX as alternative runtime backend for Metal agent (#257) (eaf9045)

Bug Fixes

improve Metal agent usability (#254) (149c582)

llmkube-0.5.2 Maintenance 4mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.5.2 New feature 4mo

Notable features

Add pod security context defaults and CRD overrides

Full changelog

0.5.2 (2026-03-27)

Features

add pod security context defaults and CRD overrides (#239) (904432b)

Documentation

add CNCF/Kubernetes trademark disclaimer (#246) (27a49eb)
add Discord community link (#236) (c0d499d)
add OpenShift troubleshooting to README (#241) (47fd1b0)

llmkube-0.5.1 Maintenance 4mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.5.1 New feature 4mo

Notable features

Memory pressure watchdog with runtime monitoring
pvc:// model source and SHA256 integrity verification
Auto-detect llama-server from Homebrew paths on macOS

Full changelog

0.5.1 (2026-03-16)

Features

add memory pressure watchdog with runtime monitoring (#216) (5fa6d54)
add pvc:// model source and SHA256 integrity verification (#229) (1b94f5d)
auto-detect llama-server from Homebrew paths on macOS (#215) (a1e4302)

Bug Fixes

controller metrics port declarations and ServiceMonitor consistency (#214) (296ec99)
correct CHANGELOG entry from 0.4.21 to 0.5.0 (#212) (f7f703a)
quote job-level if expression to fix YAML parsing in helm-chart workflow (8714b9f)

llmkube-0.5.0 Bug fix 4mo

Fixed Helm chart appVersion mismatch with the published controller image.

Changelog

Helm chart for LLMKube v0.5.0 — fixes appVersion to match published controller image

v0.5.0 New feature 4mo

Notable features

Added per-model `memoryBudget` and `memoryFraction` CRD fields.
Added pre‑flight memory validation for Metal agent.
Added health checks, metrics, and continuous monitoring to Metal agent.

Full changelog

0.5.0 (2026-03-04)

Features

add pre-flight memory validation for Metal agent (#204) (ba252ef)
add health checks, metrics, and continuous monitoring to Metal agent (#205) (a113fd1)
add per-model memoryBudget and memoryFraction CRD fields (#206) (e632369)

Bug Fixes

agent: unregister service endpoints on metal process delete (#168) (147b9bc)
enable controller metrics endpoint in Helm chart (#195) (70940af)
prevent model re-download of cached models after helm upgrade (#203) (a8f9a88)
use Recreate strategy for GPU workloads to prevent rolling update deadlock (#196) (2e45181)

Documentation

rewrite README for clarity, positioning, and growth (#190) (a7fc152)

llmkube-0.4.20 Maintenance 4mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.20 Security relevant 4mo

Security fixes

Prevent command injection in init container shell commands — mitigates remote code execution vulnerability

Notable features

License compliance scanning for GGUF models
Prometheus metrics, OpenTelemetry tracing, and inference observability added
PVC inspection in cache list to detect orphaned entries

Full changelog

0.4.20 (2026-02-28)

Features

add license compliance scanning for GGUF models (#188) (c26400a)
add Prometheus metrics, OpenTelemetry tracing, and inference observability (#189) (c653ff1)
add PVC inspection to cache list for orphaned entry detection (#183) (2723d92)
agent: add structured zap logging to metal agent (#164) (e9d143c)
deps: upgrade to Kubernetes 1.35 and controller-runtime v0.23.1 (#175) (3c323f4)

Bug Fixes

correct Metal quickstart docs for selectorless services (#173) (89471ec)
prevent command injection in init container shell commands (#172) (3aa9cc3)
remove mutable latest tags and pin container images (#174) (3c4569a)

Documentation

add Apple Silicon Metal option to bug report template (#169) (e7689d8)

llmkube-0.4.19 Maintenance 5mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.19 New feature 5mo

Notable features

Added --jinja flag to enable Jinja templating for tool and function calls

Full changelog

0.4.19 (2026-02-21)

Features

add --jinja flag for tool/function calling support (#162) (47624ca)

llmkube-0.4.18 Maintenance 5mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.18 Bug fix 5mo

Fixed reading contextSize from the InferenceService CRD in the agent.

Full changelog

0.4.18 (2026-02-20)

Bug Fixes

agent: read contextSize from InferenceService CRD (#160) (17f58d4)

Documentation

update README and Metal Agent guide for remote K8s architecture (#156) (79145b2)

llmkube-0.4.17 Maintenance 5mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.17 Bug fix 5mo

Fixed agent filtering of InferenceServices to match the correct Metal accelerator type.

Full changelog

0.4.17 (2026-02-20)

Bug Fixes

agent: filter InferenceServices by Metal accelerator type (#157) (5737bb7)

llmkube-0.4.16 Maintenance 5mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.16 New feature 5mo

Notable features

Added --host-ip flag to agent for remote Kubernetes cluster support

Full changelog

0.4.16 (2026-02-20)

Features

agent: add --host-ip flag for remote K8s cluster support (#155) (b425569)

Documentation

Add Metal Agent (Apple Silicon) support to README (#151) (3579426)

llmkube-0.4.15 Maintenance 5mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.15 Bug fix 5mo

Fixed the --flash-attn flag to pass a value, as needed by newer llama.cpp versions.

Full changelog

0.4.15 (2026-02-15)

Bug Fixes

inference: pass value to --flash-attn for newer llama.cpp versions (#148) (25e08d0)

llmkube-0.4.14 Maintenance 5mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.14 New feature 5mo

Notable features

Native Go GGUF parser with CRD integration and CLI inspect command
flashAttention and contextSize added to the sample inference manifest

Full changelog

0.4.14 (2026-02-15)

Features

gguf: add native Go GGUF parser with CRD integration and CLI inspect (#140) (9d96ed4)
inference: add flashAttention and contextSize to sample manifest (914c929), closes #145
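For readers unfamiliar with GGUF, the sketch below reads the fixed-size header that begins a GGUF file. It is written in Python for illustration and is not the native Go parser from #140. It assumes GGUF version 2 or later, stored little-endian, where the tensor count and metadata key/value count are 64-bit integers. The function name and the synthetic header values are hypothetical.

```python
import struct

GGUF_MAGIC = b"GGUF"
# magic (4 bytes), version (uint32), tensor count (uint64), metadata KV count (uint64)
HEADER = struct.Struct("<4sIQQ")


def read_gguf_header(data: bytes) -> dict:
    """Parse the leading 24 bytes of a GGUF file (v2+ layout assumed)."""
    if len(data) < HEADER.size:
        raise ValueError("data is shorter than a GGUF header")
    magic, version, tensor_count, kv_count = HEADER.unpack_from(data)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic={magic!r}")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


# Synthetic header for demonstration; the counts are made up.
sample = HEADER.pack(GGUF_MAGIC, 3, 291, 24)
print(read_gguf_header(sample))
```

A full parser would go on to read the metadata key/value pairs that follow the header, which is where model properties are stored.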

View release on GitHub

llmkube-0.4.13 Maintenance 5mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

View release on GitHub

v0.4.13 New feature 5mo

Notable features

Controller init container image is now configurable.
InferenceService CRD exposes llama.cpp parallel slots setting.
Helm chart adds optional NetworkPolicy for controller manager.

Full changelog

0.4.13 (2026-02-07)

Features

controller: make init container image configurable (#128) (38ccdf0)
expose llama.cpp parallel slots in InferenceService CRD (#133) (cae7b52)
helm: add optional NetworkPolicy for controller manager (#135) (8d61ce3)
update model catalog with DeepSeek R1 and refresh stale entries (#131) (89eb5a6)

View release on GitHub

llmkube-0.4.12 Maintenance 6mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

View release on GitHub

v0.4.12 Breaking risk 6mo

Notable features

Support for custom Certificate Authorities (CA)
Fixed deprecated image tags

Full changelog

0.4.12 (2026-01-22)

Features

add custom CA support and fix deprecated image tags (#124) (5ec912e)

View release on GitHub

llmkube-0.4.11 Maintenance 6mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

View release on GitHub

v0.4.11 Bug fix 6mo

Fixed CLI version checking to use numeric comparison, and switched the controller to fully qualified image names for curl.

Full changelog

0.4.11 (2026-01-22)

Bug Fixes

cli: use numeric comparison for version checking (#109) (05e0025)
controller: use fully qualified image names for curl (#121) (213660b)
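The version-check fix (#109) is listed without detail. As an illustration of why numeric comparison matters for tags like 0.4.9 and 0.4.10, here is a minimal Python sketch. It is not the CLI's Go code, and the helper names are hypothetical.

```python
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, ...]:
    """Turn 'v0.4.10' or '0.4.10' into (0, 4, 10)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def is_newer(candidate: str, current: str) -> bool:
    """True when candidate is a later release than current."""
    return parse_version(candidate) > parse_version(current)


# Plain string comparison orders "0.4.10" before "0.4.9".
print("0.4.10" > "0.4.9")           # False
# Comparing the parsed integers gives the intended order.
print(is_newer("0.4.10", "0.4.9"))  # True
```

This sketch handles only plain dotted numeric versions; pre-release suffixes such as `-rc1` would need extra parsing.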

View release on GitHub

v0.4.10 New feature 7mo

Notable features

Air‑gapped deployment support for environments without internet access
32B models added to the catalog, with --context flag support
GPU observability configuration and Grafana dashboard

Full changelog

What's New in v0.4.10

Features

Air-gapped deployment support - Deploy models from local file paths for environments without internet access (#85)
32B models in catalog - Added larger models with --context flag support (#88)
GPU observability - New configuration and Grafana dashboard for GPU metrics (#105)
Benchmark test suites - Comprehensive benchmark sweeps for performance testing (#107)
Stress testing mode - New stress testing capabilities in the benchmark command (#104)

Documentation

Added community standards and security policy (#92)
Updated documentation for v0.4.9 GPU scheduling features (#83)

Installation

Homebrew (Recommended for macOS)

brew install defilantech/tap/llmkube

Install Script (Linux/macOS)

curl -sSL https://raw.githubusercontent.com/defilantech/LLMKube/main/install.sh | bash

Manual Download

macOS

# ARM64 (Apple Silicon)
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.10/LLMKube_0.4.10_darwin_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.10/LLMKube_0.4.10_darwin_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Linux

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.10/LLMKube_0.4.10_linux_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.10/LLMKube_0.4.10_linux_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Windows: Download the .zip file and add llmkube.exe to your PATH.

Verify Installation

llmkube version

Full Changelog: https://github.com/defilantech/LLMKube/compare/v0.4.9...v0.4.10

View release on GitHub

llmkube-0.4.10 Maintenance 7mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

View release on GitHub

llmkube-0.4.9 Maintenance 7mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

View release on GitHub

v0.4.9 New feature 7mo

Notable features

GPU contention visibility with queue position and priority classes

Full changelog

0.4.9 (2025-12-01)

Features

add GPU contention visibility, queue position, and priority classes (#81) (c0220e5)

Documentation

add getting started video to README (#76) (ceb83d7)

View release on GitHub

llmkube-0.4.8 Maintenance 8mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

View release on GitHub

v0.4.8 New feature 8mo

Notable features

Support configurable context size for llama.cpp server

Full changelog

0.4.8 (2025-11-27)

Features

Support configurable context size for llama.cpp server (#73) (6f8e04b)

View release on GitHub

llmkube-0.4.7 Maintenance 8mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

View release on GitHub

v0.4.7 Bug fix 8mo

Fixed bug where Helm chart releases were incorrectly marked as the latest.

Full changelog

0.4.7 (2025-11-26)

Bug Fixes

Don't mark Helm chart release as latest (#70) (761b154)

View release on GitHub

llmkube-0.4.6 Maintenance 8mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

View release on GitHub

v0.4.6 Bug fix 8mo

Fixed empty component causing "llmkube-" prefix in release identifiers.

Full changelog

0.4.6 (2025-11-26)

Bug Fixes

Set empty component to prevent llmkube- prefix in releases (#68) (45b61c6)

View release on GitHub

llmkube-0.4.5 Maintenance 8mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

View release on GitHub

v0.4.5 Maintenance 8mo

Cleaned up the release process so that each version produces a single release with proper notes.

Full changelog

0.4.5 (2025-11-26)

Bug Fixes

Clean up release process - single release with proper notes (#66) (4deae85)

View release on GitHub

llmkube-0.4.4 Maintenance 8mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

View release on GitHub

v0.4.4 Bug fix 8mo

Fixed CI workflow to trigger GoReleaser and Helm release.

Full changelog

LLMKube v0.4.4

Release Date: 2025-11-26T19:10:25Z

See RELEASE_NOTES_v0.4.4.md for complete details.

Changelog

Bug Fixes

9a37a77e556d6f811cb6a090125a4a73e2e9c346: fix: Trigger GoReleaser and Helm release from Release Please workflow (#64) (@Defilan)

Installation

macOS

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.4/llmkube_0.4.4_darwin_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64 (Apple Silicon)
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.4/llmkube_0.4.4_darwin_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Linux

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.4/llmkube_0.4.4_linux_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.4/llmkube_0.4.4_linux_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Windows

Download the .zip file for your architecture and add llmkube.exe to your PATH.

Verify Installation

llmkube version

Next Steps

Full Release Notes: RELEASE_NOTES_v0.4.4.md

View release on GitHub

v0.4.3 New feature 8mo

Notable features

Metal GPU support for macOS (Apple Silicon)
Model catalog with 10 pre‑configured models
Add benchmark command and reorganize documentation

Full changelog

0.4.3 (2025-11-26)

Features

Add benchmark command and reorganize documentation (58307be)
Add benchmark command and reorganize documentation (ac8888e), closes #6
Add Helm chart for easy installation (5718804)
Add Helm chart for easy installation with comprehensive CI testing (3ea3bfd), closes #9
Add Metal GPU support for macOS (Apple Silicon) (f673c26), closes #33
Add model catalog with 10 pre-configured models (404d722)
Add model catalog with 10 pre-configured models (Phase 1) (0fd969a)
Add persistent model cache to avoid re-downloading (83f844f), closes #52
Add Release Please automation and version-agnostic docs (dc2d54e)
helm: Add image digest support for production deployments (a38801d)
Implement automatic port forwarding for benchmark command (472b3ae)
Multi-GPU support with layer-based sharding (#47) (4797609)
Persistent model cache with per-namespace PVC support (ab04261)
Set up Helm repository on GitHub Pages (8d62737)
Support per-namespace model cache PVCs (c3cb891)

Bug Fixes

Add cacheKey to CRD and restrict cache to llmkube-system namespace (464c23d)
Add CRD keep policy and improve security test reliability (ff32296)
Add Helm chart publishing to release workflow (8baf9c4)
Add Helm chart publishing to release workflow (03bab72)
Add Homebrew archive IDs and v0.3.0 release notes (cea933b)
Address lint issues in benchmark command (bf80610)
Address linter errors in catalog implementation (8932e4f)
Address linter issues in Metal agent code (3f1f678)
controller: Add Model watch to InferenceService controller (cb4e201)
Correct CLI binary path in E2E tests (41af555)
Fix GoReleaser Homebrew tap configuration for v0.3.0 (4e95c04)
Further increase Helm CI timeout and readiness probe delay (5453d66)
Further increase Helm CI timeout and readiness probe delay (fd577d3)
Handle resp.Body.Close error in version check (linter) (fb3adf5)
Increase Helm chart CI timeout from 2m to 5m (7a08b45)
Increase Helm chart CI timeout from 2m to 5m (ced2210)
InferenceService stuck in Pending when Model becomes Ready (4d20aec)
Metal agent production fixes and testing improvements (8744c7b)
Resolve Helm chart CI test failures (9919696)
Resolve staticcheck SA5011 lint errors and update CONTRIBUTING.md (#60) (c0b5824)
Sanitize Service names for DNS-1035 compliance (v0.3.3) (db81990)
Sanitize Service names to comply with DNS-1035 requirements (b431986)
Skip containerized Deployment for Metal accelerator and add version check (d300e64)
Skip containerized Deployment for Metal accelerator and add version check (8dab955)
Suppress Endpoints API deprecation warnings (e70a4b3)
Update operator deployment to use correct container image (00fee75)
Update operator deployment to use correct container image (4c67a78)
Update version.go to 0.2.1 and add automation for future releases (8dd613d)
Update version.go to 0.2.1 and add automation for future releases (2ff68bd)
Use simple v* tag format for releases (#62) (bda9f19)
Use workspace path for kubeconform validation (fc066d8)

Documentation

Add CLI option to quick start, keep kubectl as fallback (f6829ee)
Add release notes for v0.3.2 (177abf8)
Add release notes for v0.3.2 (ca1bb12)
Add release notes for v0.4.0 (144b960)
Add release notes for v0.4.0 (a61321f)
Overhaul README and roadmap for public launch (b42c17e)
Update binary download links to version 0.2.1 (fad530a)
Update binary download links to version 0.2.1 (63bb0fa)
Update Helm installation to use GitHub Pages repository (477e037)
Update MODEL-CACHE.md for per-namespace PVC pattern (0be3f46)

View release on GitHub

llmkubev0.4.2 Maintenance 8mo

Resolved staticcheck SA5011 lint errors and updated CONTRIBUTING.md.

Full changelog

0.4.2 (2025-11-26)

Bug Fixes

Resolve staticcheck SA5011 lint errors and update CONTRIBUTING.md (#60) (c0b5824)

View release on GitHub

llmkubev0.4.1 New feature 8mo

Notable features

Add benchmark command and reorganize documentation
Add persistent model cache to avoid re‑downloading with per‑namespace PVC support
helm: Add image digest support for production deployments

Full changelog

0.4.1 (2025-11-26)

Features

Add benchmark command and reorganize documentation (58307be)
Add benchmark command and reorganize documentation (ac8888e), closes #6
Add persistent model cache to avoid re-downloading (83f844f), closes #52
Add Release Please automation and version-agnostic docs (dc2d54e)
helm: Add image digest support for production deployments (a38801d)
Implement automatic port forwarding for benchmark command (472b3ae)
Persistent model cache with per-namespace PVC support (ab04261)
Support per-namespace model cache PVCs (c3cb891)

Bug Fixes

Add cacheKey to CRD and restrict cache to llmkube-system namespace (464c23d)
Address lint issues in benchmark command (bf80610)

Documentation

Update MODEL-CACHE.md for per-namespace PVC pattern (0be3f46)

View release on GitHub

v0.4.0 New feature 8mo

Notable features

Multi‑GPU support with layer‑based sharding

Full changelog

LLMKube v0.4.0

Release Date: 2025-11-26T00:23:11Z

See RELEASE_NOTES_v0.4.0.md for complete details.

Changelog

New Features

479760973eb811a0b7a71c711f52ca3d8695b761: feat: Multi-GPU support with layer-based sharding (#47) (@Defilan)
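The release notes do not describe how layers are distributed across GPUs. Purely as an illustration of what layer-based sharding means, the Python sketch below assigns contiguous blocks of layers to each GPU as evenly as possible. The even split and the function name are assumptions, not LLMKube's documented behavior.

```python
from typing import List


def shard_layers(num_layers: int, num_gpus: int) -> List[range]:
    """Assign contiguous layer ranges to GPUs, as evenly as possible."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    base, extra = divmod(num_layers, num_gpus)
    shards = []
    start = 0
    for gpu in range(num_gpus):
        # The first `extra` GPUs take one additional layer each.
        count = base + (1 if gpu < extra else 0)
        shards.append(range(start, start + count))
        start += count
    return shards


for gpu, layers in enumerate(shard_layers(32, 3)):
    print(f"GPU {gpu}: layers {layers.start}..{layers.stop - 1}")
```

In practice a split may be weighted by each GPU's available memory rather than divided evenly.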

Bug Fixes

03bab72a74496085b79e3c51838f9853ed674062: fix: Add Helm chart publishing to release workflow (@Defilan)

Installation

macOS

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.0/llmkube_0.4.0_darwin_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64 (Apple Silicon)
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.0/llmkube_0.4.0_darwin_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Linux

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.0/llmkube_0.4.0_linux_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.0/llmkube_0.4.0_linux_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Windows

Download the .zip file for your architecture and add llmkube.exe to your PATH.

Verify Installation

llmkube version

Next Steps

Full Release Notes: RELEASE_NOTES_v0.4.0.md

View release on GitHub

llmkube-0.4.0 Maintenance 8mo

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

View release on GitHub

v0.3.3 Bug fix 8mo

Fixed Service names to comply with DNS-1035 requirements.

Full changelog

LLMKube v0.3.3

Release Date: 2025-11-24T17:07:23Z

See RELEASE_NOTES_v0.3.3.md for complete details.

Changelog

Bug Fixes

b431986ceae6b383ee064bec595c922a42394a8e: fix: Sanitize Service names to comply with DNS-1035 requirements (@Defilan)
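A DNS-1035 label, which Kubernetes requires for Service names, must be at most 63 characters, contain only lowercase letters, digits, and hyphens, start with a letter, and end with a letter or digit. The Python sketch below shows one way such a sanitizer could work. It is not the controller's Go implementation, and the function name and the `svc` fallback prefix are hypothetical.

```python
import re

MAX_LABEL_LEN = 63  # DNS-1035 labels are at most 63 characters


def sanitize_dns1035(name: str, fallback_prefix: str = "svc") -> str:
    """Best-effort conversion of an arbitrary name into a DNS-1035 label."""
    label = name.lower()
    # Replace every character outside [a-z0-9-] with a hyphen.
    label = re.sub(r"[^a-z0-9-]", "-", label)
    # Labels must start with a letter; prepend the (hypothetical) prefix if not.
    if not label or not label[0].isalpha():
        label = f"{fallback_prefix}-{label}"
    # Truncate, then drop trailing hyphens so the label ends alphanumeric.
    return label[:MAX_LABEL_LEN].rstrip("-")


print(sanitize_dns1035("Llama_3.1-8B"))  # llama-3-1-8b
print(sanitize_dns1035("3b-model"))      # svc-3b-model
```

Truncation can make two long names collide, so a production sanitizer might append a short hash of the original name.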

Installation

macOS

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.3/llmkube_0.3.3_darwin_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64 (Apple Silicon)
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.3/llmkube_0.3.3_darwin_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Linux

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.3/llmkube_0.3.3_linux_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.3/llmkube_0.3.3_linux_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Windows

Download the .zip file for your architecture and add llmkube.exe to your PATH.

Verify Installation

llmkube version

Next Steps

Full Release Notes: RELEASE_NOTES_v0.3.3.md

View release on GitHub

v0.3.2 Bug fix 8mo

Handled the resp.Body.Close error in the version check; the controller now skips the containerized Deployment for the Metal accelerator and adds a version check.

Full changelog

LLMKube v0.3.2

Release Date: 2025-11-24T16:28:19Z

See RELEASE_NOTES_v0.3.2.md for complete details.

Changelog

Bug Fixes

fb3adf57913744e08ebffb58af6877bd15fbeb93: fix: Handle resp.Body.Close error in version check (linter) (@Defilan)
8dab955a2d1e728fe8a9b1b2971a4906454d71c3: fix: Skip containerized Deployment for Metal accelerator and add version check (@Defilan)

Installation

macOS

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.2/llmkube_0.3.2_darwin_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64 (Apple Silicon)
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.2/llmkube_0.3.2_darwin_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Linux

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.2/llmkube_0.3.2_linux_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.2/llmkube_0.3.2_linux_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Windows

Download the .zip file for your architecture and add llmkube.exe to your PATH.

Verify Installation

llmkube version

Next Steps

Full Release Notes: RELEASE_NOTES_v0.3.2.md

View release on GitHub

v0.3.1 Bug fix 8mo

Fixed controller OOM by increasing memory limits.

Full changelog

LLMKube v0.3.1

Release Date: 2025-11-24T09:17:13Z

See RELEASE_NOTES_v0.3.1.md for complete details.

Changelog

Bug Fixes

fd577d3137da086346524f1802e47219feefa1fa: fix: Further increase Helm CI timeout and readiness probe delay (@Defilan)
ced2210ea28d453fdac4c7346bc98f66684893b1: fix: Increase Helm chart CI timeout from 2m to 5m (@Defilan)
4c67a7806232c687b7b2450660735d9265d507b8: fix: Update operator deployment to use correct container image (@Defilan)

Other Changes

3e60a3031ef0f443209c0088e84f1a01dd1f6c1a: Release v0.3.1: Fix controller OOM with increased memory limits (@Defilan)

Installation

macOS

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.1/llmkube_0.3.1_darwin_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64 (Apple Silicon)
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.1/llmkube_0.3.1_darwin_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Linux

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.1/llmkube_0.3.1_linux_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.1/llmkube_0.3.1_linux_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Windows

Download the .zip file for your architecture and add llmkube.exe to your PATH.

Verify Installation

llmkube version

Next Steps

Full Release Notes: RELEASE_NOTES_v0.3.1.md

View release on GitHub

v0.3.0 New feature 8mo

Notable features

Add Metal GPU support for macOS (Apple Silicon)

Full changelog

LLMKube v0.3.0

Release Date: 2025-11-24T06:15:31Z

See RELEASE_NOTES_v0.3.0.md for complete details.

Changelog

New Features

f673c26bd4ac1a285dc7e72ffe6a930bc586b855: feat: Add Metal GPU support for macOS (Apple Silicon) (@Defilan)

Bug Fixes

cea933beac2607122772d14184b35da04738b7f9: fix: Add Homebrew archive IDs and v0.3.0 release notes (@Defilan)
3f1f678502c985b04d48a1c8c8bc44ea68d8a542: fix: Address linter issues in Metal agent code (@Defilan)
8744c7b54e23cbb77609a97340d9be9dd5da931c: fix: Metal agent production fixes and testing improvements (@Defilan)
e70a4b391725a70a82d78d47a7d4f6d2b898dcc8: fix: Suppress Endpoints API deprecation warnings (@Defilan)

Installation

macOS

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.0/llmkube_0.3.0_darwin_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64 (Apple Silicon)
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.0/llmkube_0.3.0_darwin_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Linux

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.0/llmkube_0.3.0_linux_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.0/llmkube_0.3.0_linux_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Windows

Download the .zip file for your architecture and add llmkube.exe to your PATH.

Verify Installation

llmkube version

Next Steps

Full Release Notes: RELEASE_NOTES_v0.3.0.md

View release on GitHub

v0.2.2 New feature 8mo

Notable features

Add Helm chart for easy installation with comprehensive CI testing
Add model catalog featuring ten pre‑configured models

Full changelog

LLMKube v0.2.2

Release Date: 2025-11-24T02:00:38Z

See RELEASE_NOTES_v0.2.2.md for complete details.

Changelog

New Features

3ea3bfd27ce864f7884f25ae9db65ed52eb68e01: feat: Add Helm chart for easy installation with comprehensive CI testing (@Defilan)
404d722e70d3e885f1e437ebdadf38fe43c7689a: feat: Add model catalog with 10 pre-configured models (@Defilan)

Bug Fixes

ff32296a45174bdce6070844a68007e2c45cf3fe: fix: Add CRD keep policy and improve security test reliability (@Defilan)
8932e4fbb3fe8d1fea1fedba5bb18f3cd02808c8: fix: Address linter errors in catalog implementation (@Defilan)
41af55589ba6b17f07119b50d96db9c39eac6ea3: fix: Correct CLI binary path in E2E tests (@Defilan)
99196961bf91e4c285182211a7a6fdec574ae7e7: fix: Resolve Helm chart CI test failures (@Defilan)
2ff68bdc0e40ab9ee8337403af649fda7354ad7c: fix: Update version.go to 0.2.1 and add automation for future releases (@Defilan)
fc066d8d0f9175382fa7cfab5f40c755739e175f: fix: Use workspace path for kubeconform validation (@Defilan)

Other Changes

aa84b601d75753c585cacace76311fbbac598080: Add Minikube quickstart guide and improve CLI-first documentation (@Defilan)
5f08b27232102d17a0e2ae59f74176ed25a9689b: Update docs to recommend local controller for Minikube/local development (@Defilan)

Installation

macOS

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.2/llmkube_0.2.2_darwin_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64 (Apple Silicon)
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.2/llmkube_0.2.2_darwin_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Linux

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.2/llmkube_0.2.2_linux_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.2/llmkube_0.2.2_linux_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Windows

Download the .zip file for your architecture and add llmkube.exe to your PATH.

Verify Installation

llmkube version

Next Steps

Full Release Notes: RELEASE_NOTES_v0.2.2.md

View release on GitHub

v0.2.1 Bug fix 8mo

Added the missing Model watch to the InferenceService controller.

Full changelog

LLMKube v0.2.1

Release Date: 2025-11-18T16:21:32Z

See RELEASE_NOTES_v0.2.1.md for complete details.

Changelog

Other Changes

cb4e2019583a811fa98af1a446bd0df6b6c3cba2: fix(controller): Add Model watch to InferenceService controller (@Defilan)

Installation

macOS

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.1/llmkube_0.2.1_darwin_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64 (Apple Silicon)
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.1/llmkube_0.2.1_darwin_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Linux

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.1/llmkube_0.2.1_linux_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.1/llmkube_0.2.1_linux_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Windows

Download the .zip file for your architecture and add llmkube.exe to your PATH.

Verify Installation

llmkube version

Next Steps

Full Release Notes: RELEASE_NOTES_v0.2.1.md

View release on GitHub

v0.2.0 Maintenance 8mo

Initial public release of LLMKube.

Full changelog

LLMKube v0.2.0

Release Date: 2025-11-18T06:34:01Z

See RELEASE_NOTES_v0.2.0.md for complete details.

Changelog

Other Changes

f821f0f073040d82613e8ed809ab2d402f1fb2a7: Initial public release: LLMKube v0.2.0 (Christopher Maher)

Installation

macOS

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.0/llmkube_0.2.0_darwin_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64 (Apple Silicon)
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.0/llmkube_0.2.0_darwin_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Linux

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.0/llmkube_0.2.0_linux_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.0/llmkube_0.2.0_linux_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Windows

Download the .zip file for your architecture and add llmkube.exe to your PATH.

Verify Installation

llmkube version

Next Steps

Full Release Notes: RELEASE_NOTES_v0.2.0.md

View release on GitHub
