Release history

LLMKube releases

Kubernetes operator for llama.cpp-native LLM inference with GPU scheduling, Apple Silicon Metal support, and OpenAI-compatible API.

All releases

No immediate action
llmkube-0.8.1 Maintenance

Routine maintenance and dependency updates.

No immediate action
foreman-0.8.1 Feature

Foreman workload scheduler

v0.8.1 Breaking risk
⚠ Upgrade required
  • After upgrading to v0.8.1, re‑apply all Agent CRs so existing Agents pick up explicit values for the new requestTimeoutSeconds and requestTurnTimeoutSeconds fields.
Breaking changes
  • Agent.spec.requestTimeoutSeconds now represents a loop-wide wall-clock budget (default 3600, previously 600) instead of a per-request HTTP timeout; the former per-request bound moves to the new Agent.spec.requestTurnTimeoutSeconds (default 120). Re-apply Agent CRs after upgrade.
Notable features
  • inferenceservice: adds typed spec.ropeScaling for RoPE/YaRN context extension
Full changelog

0.8.1 (2026-06-01)

⚠ BREAKING CHANGES

  • foreman: Agent.spec.requestTimeoutSeconds changes meaning from a per-request HTTP timeout to a loop-wide wall-clock budget, and its default moves from 600 to 3600. The former per-request bound is now the new Agent.spec.requestTurnTimeoutSeconds (default 120). Re-apply your Agent CRs after upgrade so existing Agents pick up explicit values.
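
One possible way to give an existing Agent explicit values is a JSON merge patch. The sketch below only builds the patch body; the two field names and their defaults come from the note above, while the helper name agent_timeout_patch is hypothetical.

```shell
# Build a JSON merge patch that sets both timeout fields explicitly.
# $1 = loop-wide wall-clock budget in seconds (requestTimeoutSeconds)
# $2 = per-turn HTTP timeout in seconds (requestTurnTimeoutSeconds)
agent_timeout_patch() {
  printf '{"spec":{"requestTimeoutSeconds":%d,"requestTurnTimeoutSeconds":%d}}' "$1" "$2"
}

# Print the patch for the new defaults (3600 s budget, 120 s per turn).
agent_timeout_patch 3600 120
echo
```

Assuming the Agent CRD is reachable under the resource name agent and the object is called my-agent (both are assumptions; check with kubectl api-resources), the patch could be applied with something like: kubectl patch agent my-agent --type merge -p "$(agent_timeout_patch 3600 120)".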

Features

  • inferenceservice: typed spec.ropeScaling for RoPE/YaRN context extension (#507) (#600) (a554aee)

Bug Fixes

  • foreman: recover orphaned phase=Running tasks on agent restart (#542) (#598) (6dd2c44)
  • foreman: split per-turn timeout from loop-wide budget (#532) (#602) (41e7663)
  • foreman: warm-path reviewer scheduling on macOS (#578, #579) (#597) (a94d1ef)
  • metal-agent: prefer routable interface for host-IP auto-detect (#526) (#599) (c780795)

Documentation

  • foreman: absolute paths in overview README cross-refs (fix llmkube-web prerender) (#596) (b5f6f94)
  • foreman: move docs/foreman to docs/site/foreman + register in site nav (#594) (9fd85bb)

Miscellaneous

  • pin next release to 0.8.1 (Release-As) (#605) (a876cc6)
No immediate action
llmkube-0.8.0 Maintenance

Routine maintenance and dependency updates.

No immediate action
foreman-0.8.0 Feature

Agentic workload scheduler

No immediate action
v0.8.0 New feature

Taxonomy, masking, Intel GPU

No immediate action
llmkube-0.7.12 Maintenance

Routine maintenance and dependency updates.

No immediate action
foreman-0.7.12 Maintenance

Manual Foreman install

No immediate action
v0.7.12 Mixed

Workload reconciler + job fixes + doc improvements

No immediate action
llmkube-0.7.11 Maintenance

Routine maintenance and dependency updates.

No immediate action
foreman-0.7.11 Feature

Foreman opt-in add-on

No immediate action
v0.7.11 Bug fix

Foreman chart fix

No immediate action
v0.7.10 New feature

--llama-server-port + lint-all + scheduler

No immediate action
llmkube-0.7.9 Maintenance

Routine maintenance and dependency updates.

No immediate action
v0.7.9 New feature

mlx-server runtime + scale subresource

No immediate action
llmkube-0.7.8 Maintenance

Routine maintenance and dependency updates.

No immediate action
v0.7.8 New feature

Configurable proxy + ModelRouter

llmkube-0.7.7 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.7.7 New feature
⚠ Upgrade required
  • OpenShift / OKD / MicroShift installs must use `helm ... -f charts/llmkube/values-openshift.yaml` to allow restricted-v2 SCC to inject fsGroup.
  • Operators with a custom `--init-container-image` whose user is not curl (uid=101 gid=102) should set `spec.podSecurityContext` on each InferenceService or pass `--default-fsgroup=` to the controller.
Notable features
  • OpenShift made a first-class deploy target (ci+chart changes)
  • VLLMConfig CRD now includes gpuMemoryUtilization and cpuOffloadGB fields
  • metal-agent emits Kubernetes events for memory-pressure, evictions, skips, and respawn blocks
Full changelog

0.7.7 (2026-05-11)

Features

  • agent: vllm-swift runtime + TurboQuant passthrough (#391) (#393) (2691e67)
  • ci+chart: make OpenShift a first-class deploy target (closes #421) (#422) (798a13e)
  • crd: add gpuMemoryUtilization and cpuOffloadGB to VLLMConfig (#394) (6883f78)
  • metal-agent: emit Kubernetes events for memory-pressure transitions, evictions, skips, and respawn blocks (closes #390) (#411) (e0d17d1)
  • observability: runtime label on inference pods + recording rules + starter dashboard (refs #409) (#410) (71743ed)

Bug Fixes

  • controller: default FSGroup to curl_group + Longhorn-backed e2e job (closes #418, closes #420) (adce90f)
  • controller: stop hot-spinning on unreachable file:// model sources (closes #405) (#412) (4ac6f57)

Documentation

  • add NVIDIA Blackwell B200 (sm_100) validation matrix (refs #413) (#414) (bfda149)
  • operations: seed runbooks index + first 2 entries (file:// hot-spin, metal-agent memory pressure) (#417) (d3bce8d)
  • port concepts/comparison to markdown (first Phase 1C content port) (#403) (51c396b)
  • readme: HN-launch readiness fixes (broken link, Apple Silicon CTA, quickstart memory) (#401) (3e44bfb)
  • refresh quickstart cast for v0.7.6 (HN launch) (#404) (5abaddb)
  • split docs/ into site/ and contributors/, prep for site rendering (#396) (9299a31)
  • upgrade: OpenShift / OKD / MicroShift installs must use helm ... -f charts/llmkube/values-openshift.yaml so restricted-v2 SCC can inject fsGroup from the namespace's allocated range (adce90f)
  • upgrade: operators using a custom --init-container-image whose user is not curl (uid=101 gid=102) should set spec.podSecurityContext on each InferenceService or pass --default-fsgroup=<gid> to the controller (adce90f)
  • upgrade: v0.7.7 rolls every InferenceService Pod once on first reconcile (Deployment template gains fsGroup=102 and the new inference.llmkube.dev/runtime label) (adce90f)
llmkube-0.7.6 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.7.6 Bug fix
Notable features
  • agent: added eviction safety floor, evictionProtection opt‑out, and late‑spawn condition fix
  • agent: introduced memory‑pressure eviction with respawn protection
  • api: passthrough podAnnotations and podLabels on InferenceServiceSpec
Full changelog

0.7.6 (2026-05-03)

Features

  • agent: eviction safety floor + evictionProtection opt-out + late-spawn condition fix (#186) (#384) (6544747)
  • agent: memory-pressure eviction and respawn protection (#186) (#382) (65a78b5)
  • api: add podAnnotations and podLabels passthrough (closes #326) (#381) (baecd68)
  • api: expose runtimeClassName on InferenceServiceSpec (closes #375) (#380) (cc44ff5)
  • crd: add ParallelSlots support for vllm and fix llamacpp (#340) (d81babb)

Bug Fixes

  • catalog: default phi-4-mini context to 8K (closes #386) (#387) (7bcd685)
  • controller: drop model label from Deployment selector to make modelRef mutable (closes #301) (#385) (a1de3bf)
  • derive metal InferenceService phase from Endpoints, not desiredReplicas (closes #374) (#376) (350dafe)

Documentation

  • fix broken phi-3-mini command and dead benchmark link (#369) (6a1fd58)
  • HN launch prep README polish (llama.cpp credit, vLLM, KubeAI/llm-d) (#371) (9d27774)
llmkube-0.7.5 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.7.5 Bug fix

Fixed Helm chart by syncing CRDs from kubebuilder source and adding a CI guard.

Full changelog

0.7.5 (2026-04-30)

Bug Fixes

  • chart: sync Helm CRDs from kubebuilder source and add CI guard (#367) (73bd2b4)
llmkube-0.7.4 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.7.4 Bug fix
Notable features
  • Controller pins vLLM default image to version 0.20.0
Full changelog

0.7.4 (2026-04-29)

Features

  • controller: pin vLLM default image to v0.20.0 (#362) (d2ae561)

Bug Fixes

  • controller: defer HTTP(S) Model downloads to the workload init container (#364) (469f542)
llmkube-0.7.3 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.7.3 Breaking risk
Notable features
  • Added cacheTypeCustomK/V for non-enum llama.cpp KV cache types
  • Added kvCacheCustomDtype for non-enum vLLM KV cache types
Full changelog

0.7.3 (2026-04-29)

Features

  • agent: cache-type-aware memory estimator + TurboQuant docs (#355) (0697afd)
  • api: add cacheTypeCustomK/V for non-enum llama.cpp KV cache types (#351) (71bd762)
  • api: add kvCacheCustomDtype for non-enum vLLM KV cache types (#359) (5e796d0)

Bug Fixes

  • agent: respawn on InferenceService spec drift, honor replicas=0, and plumb full spec to llama-server flags (#353) (ff54cad)
  • controller: use GGUF metadata name for downloaded model file basename (#347) (e932c7a)
  • vllm: set enableServiceLinks=false on vLLM Pod spec (#361) (01eb5c5)
  • vllm: use positional model argument instead of deprecated --model (#360) (a17566c)
llmkube-0.7.2 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.7.2 Bug fix
Notable features
  • Apple Silicon power gauges exposed via powermetrics
  • One-command make targets for installing and uninstalling powermetrics-sudo
Full changelog

0.7.2 (2026-04-27)

Features

  • agent: expose Apple Silicon power gauges via powermetrics (#334) (58a94a7)
  • make: one-command install-powermetrics-sudo + uninstall targets (#336) (af48077)

Bug Fixes

  • agent: make executor startup timeouts configurable; raise defaults to 120s (#330) (5aa5fa2)
  • agent: reconcile orphaned Service+Endpoints on agent startup (#332) (d88c541)
llmkube-0.7.1 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.7.1 Breaking risk
Notable features
  • Apple Silicon-optimized flags for llama-server in agent
  • values.schema.json added to Helm chart for value validation
  • agentic-coding flags extended in InferenceService vLLM config
Full changelog

0.7.1 (2026-04-25)

Features

  • agent: pass Apple Silicon-optimized flags to llama-server (#327) (a69ab6a)
  • chart: add values.schema.json for Helm value validation (#322) (1f8a34d)
  • crd: extend InferenceService vLLM config for agentic-coding flags (#306) (cb2aa6a)
  • security: supply-chain MVP — checksum install, govulncheck, gosec, codecov (#310) (f17f59d)

Bug Fixes

  • agent: detect stalled K8s polling and exit for supervisor restart (#328) (c0636cc)
  • agent: let the kernel pick free ports for llama-server (#321) (8111395)
  • bump InferenceService spec.contextSize cap from 131072 to 2097152 (#300) (a46a1bf)

Documentation

  • add ADOPTERS.md inviting public user listings (#324) (871a0cb)
  • backfill ⚠ BREAKING CHANGES section into 0.7.0 changelog (#296) (2ad4640)
llmkube-0.7.0 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.7.0 Breaking risk
Breaking changes
  • sharding.strategy: tensor now maps to llama.cpp --split-mode row instead of layer; set strategy: layer to retain previous behavior
  • InferenceService spec.extraArgs is forwarded to vLLM runtime, previously ignored; configs with llama.cpp-only flags will fail
Notable features
  • Hybrid GPU/CPU offloading support for MoE models
  • Tensor overrides and batch size controls for hybrid offloading
  • Additional runtime controls for llama.cpp and vllm
Full changelog

0.7.0 (2026-04-18)

⚠ BREAKING CHANGES

  • sharding: sharding.strategy: tensor on a Model now correctly maps to llama.cpp's --split-mode row instead of silently falling back to --split-mode layer. Configs that set strategy: tensor expecting layer behavior may see performance regressions or new failure modes under concurrent load (particularly on consumer PCIe multi-GPU setups with quantized models). Explicitly set strategy: layer to retain the previous behavior. (#291)
  • vllm: InferenceService spec.extraArgs is now forwarded to the vLLM runtime. Previously extraArgs was silently ignored when runtime: vllm. Configs that placed llama.cpp-only flags in extraArgs on a vLLM InferenceService will start failing at pod startup. Audit any vLLM InferenceService that sets extraArgs before upgrading. (#291)
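
The sharding change above amounts to a different strategy-to-flag mapping. The following is an illustrative sketch of that mapping as the notes describe it, not the controller's code; the helper name split_mode_for is hypothetical.

```shell
# Map a Model's sharding.strategy to the llama.cpp --split-mode value
# described in the 0.7.0 notes. Unknown strategies return non-zero.
split_mode_for() {
  case "$1" in
    tensor) echo "row" ;;    # 0.7.0 and later; earlier releases fell back to layer
    layer)  echo "layer" ;;
    *)      return 1 ;;
  esac
}

echo "tensor -> --split-mode $(split_mode_for tensor)"
echo "layer  -> --split-mode $(split_mode_for layer)"
```

A config that relied on the old fallback keeps its behavior only if it is changed to strategy: layer, which is the second branch here.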

Features

  • add hybrid GPU/CPU offloading support for MoE models (#281) (2287f66)
  • add tensor overrides and batch size controls for hybrid offloading (#283) (8be4adc)
  • expose additional runtime controls for llama.cpp and vllm (#291) (2245718)
  • recognize runtime-resolved sources (HF repo IDs) in Model controller (#293) (953e8a7)

Bug Fixes

  • inherit runAsUser/runAsGroup from podSecurityContext (#274) (72b9b5c)

Documentation

  • surface breaking behavior changes for 0.7.0 (#294) (e234a40)
llmkube-0.6.0 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.6.0 Breaking risk
Breaking changes
  • Default CUDA image updated to server-cuda13 for Qwen3.5 and Blackwell support.
Notable features
  • First-class PersonaPlex (Moshi) runtime backend added
  • Grafana inference metrics dashboard added
  • HPA autoscaling for InferenceService added
Full changelog

0.6.0 (2026-04-08)

⚠ BREAKING CHANGES

  • update default CUDA image to server-cuda13 for Qwen3.5 and Blackwell support (#262)

Features

  • add first-class PersonaPlex (Moshi) runtime backend (#272) (2b1c948)
  • add Grafana inference metrics dashboard (#269) (be376c6)
  • add HPA autoscaling for InferenceService (#260) (2d16502)
  • add pluggable runtime backends for non-llama.cpp inference engines (#271) (bb1576c)
  • add vLLM and TGI runtime backends with per-runtime HPA metrics (#273) (441c7c7)
  • separate image registry from repository in Helm chart (#268) (5c059a4)
  • support custom layer splits from GPUShardingSpec (#267) (a37701c)
  • update default CUDA image to server-cuda13 for Qwen3.5 and Blackwell support (#262) (cc9a95e)
llmkube-0.5.3 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.5.3 New feature
Notable features
  • KV cache type configuration and extraArgs escape hatch
  • Ollama runtime backend for Metal agent
  • oMLX alternative runtime backend for Metal agent
Full changelog

0.5.3 (2026-04-01)

Features

  • add KV cache type configuration and extraArgs escape hatch (#256) (7a4b855)
  • add Ollama as runtime backend for Metal agent (#258) (6148b89)
  • add oMLX as alternative runtime backend for Metal agent (#257) (eaf9045)

Bug Fixes

llmkube-0.5.2 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.5.2 New feature
Notable features
  • Add pod security context defaults and CRD overrides
Full changelog

0.5.2 (2026-03-27)

Features

  • add pod security context defaults and CRD overrides (#239) (904432b)

Documentation

llmkube-0.5.1 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.5.1 New feature
Notable features
  • Memory pressure watchdog with runtime monitoring
  • pvc:// model source and SHA256 integrity verification
  • Auto-detect llama-server from Homebrew paths on macOS
Full changelog

0.5.1 (2026-03-16)

Features

  • add memory pressure watchdog with runtime monitoring (#216) (5fa6d54)
  • add pvc:// model source and SHA256 integrity verification (#229) (1b94f5d)
  • auto-detect llama-server from Homebrew paths on macOS (#215) (a1e4302)
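
SHA256 integrity verification means comparing a file's digest against a known value before trusting it. The sketch below shows the general technique on a throwaway file; it is not the operator's implementation, and the helper names sha256_of and verify_sha256 are hypothetical.

```shell
# Compute the SHA256 of a file with whichever tool is available
# (sha256sum on most Linux systems, shasum on macOS).
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# Succeed only if the file's digest matches the expected hex string.
verify_sha256() {
  [ "$(sha256_of "$1")" = "$2" ]
}

# Demo on a temporary file containing the five bytes "hello".
tmp=$(mktemp)
printf 'hello' > "$tmp"
if verify_sha256 "$tmp" 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824; then
  echo "checksum ok"
else
  echo "checksum mismatch"
fi
rm -f "$tmp"
```

The same check can be run by hand against a downloaded model file to confirm it matches the digest published by its source.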

Bug Fixes

  • controller metrics port declarations and ServiceMonitor consistency (#214) (296ec99)
  • correct CHANGELOG entry from 0.4.21 to 0.5.0 (#212) (f7f703a)
  • quote job-level if expression to fix YAML parsing in helm-chart workflow (8714b9f)
llmkube-0.5.0 Bug fix

Fixed Helm chart appVersion mismatch with the published controller image.

Changelog

Helm chart for LLMKube v0.5.0 — fixes appVersion to match published controller image

v0.5.0 New feature
Notable features
  • Added per-model `memoryBudget` and `memoryFraction` CRD fields.
  • Added pre‑flight memory validation for Metal agent.
  • Added health checks, metrics, and continuous monitoring to Metal agent.
Full changelog

0.5.0 (2026-03-04)

Features

  • add pre-flight memory validation for Metal agent (#204) (ba252ef)
  • add health checks, metrics, and continuous monitoring to Metal agent (#205) (a113fd1)
  • add per-model memoryBudget and memoryFraction CRD fields (#206) (e632369)

Bug Fixes

  • agent: unregister service endpoints on metal process delete (#168) (147b9bc)
  • enable controller metrics endpoint in Helm chart (#195) (70940af)
  • prevent model re-download of cached models after helm upgrade (#203) (a8f9a88)
  • use Recreate strategy for GPU workloads to prevent rolling update deadlock (#196) (2e45181)

Documentation

  • rewrite README for clarity, positioning, and growth (#190) (a7fc152)
llmkube-0.4.20 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.20 Security relevant
Security fixes
  • Prevent command injection in init container shell commands — mitigates remote code execution vulnerability
Notable features
  • License compliance scanning for GGUF models
  • Prometheus metrics, OpenTelemetry tracing, and inference observability added
  • PVC inspection in cache list to detect orphaned entries
Full changelog

0.4.20 (2026-02-28)

Features

  • add license compliance scanning for GGUF models (#188) (c26400a)
  • add Prometheus metrics, OpenTelemetry tracing, and inference observability (#189) (c653ff1)
  • add PVC inspection to cache list for orphaned entry detection (#183) (2723d92)
  • agent: add structured zap logging to metal agent (#164) (e9d143c)
  • deps: upgrade to Kubernetes 1.35 and controller-runtime v0.23.1 (#175) (3c323f4)

Bug Fixes

  • correct Metal quickstart docs for selectorless services (#173) (89471ec)
  • prevent command injection in init container shell commands (#172) (3aa9cc3)
  • remove mutable latest tags and pin container images (#174) (3c4569a)
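
The notes do not describe how #172 fixed the command injection. As a general illustration of this class of bug, the sketch below contrasts splicing an untrusted value into a script string with passing it as a positional argument; the helper name safe_echo and the sample value are hypothetical.

```shell
# A value that would be dangerous if spliced into a command string.
src='model.gguf; echo INJECTED'

# Unsafe pattern (left commented out): the value becomes part of the script,
# so the text after ';' would run as a second command.
#   sh -c "echo downloading $src"

# Safer pattern: the script text is fixed and the value arrives as "$1",
# so the shell treats it as data and never parses it as code.
safe_echo() {
  sh -c 'printf "downloading %s\n" "$1"' _ "$1"
}

safe_echo "$src"
```

With the safer pattern the semicolon and everything after it are printed as part of the file name instead of being executed.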

Documentation

  • add Apple Silicon Metal option to bug report template (#169) (e7689d8)
llmkube-0.4.19 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.19 New feature
Notable features
  • Added --jinja flag to enable Jinja templating for tool and function calls
Full changelog

0.4.19 (2026-02-21)

Features

  • add --jinja flag for tool/function calling support (#162) (47624ca)
llmkube-0.4.18 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.18 Bug fix

Fixed reading contextSize from the InferenceService CRD in the agent.

Full changelog

0.4.18 (2026-02-20)

Bug Fixes

  • agent: read contextSize from InferenceService CRD (#160) (17f58d4)

Documentation

  • update README and Metal Agent guide for remote K8s architecture (#156) (79145b2)
llmkube-0.4.17 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.17 Bug fix

Fixed agent filtering of InferenceServices to match the correct Metal accelerator type.

Full changelog

0.4.17 (2026-02-20)

Bug Fixes

  • agent: filter InferenceServices by Metal accelerator type (#157) (5737bb7)
llmkube-0.4.16 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.16 New feature
Notable features
  • Added --host-ip flag to agent for remote Kubernetes cluster support
Full changelog

0.4.16 (2026-02-20)

Features

  • agent: add --host-ip flag for remote K8s cluster support (#155) (b425569)

Documentation

  • Add Metal Agent (Apple Silicon) support to README (#151) (3579426)
llmkube-0.4.15 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.15 Bug fix

Fixed the inference server invocation to pass a value to --flash-attn, as newer llama.cpp versions expect.

Full changelog

0.4.15 (2026-02-15)

Bug Fixes

  • inference: pass value to --flash-attn for newer llama.cpp versions (#148) (25e08d0)
llmkube-0.4.14 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.14 New feature
Notable features
  • Native Go GGUF parser with CRD integration and CLI inspect command
  • flashAttention added to the sample manifest
  • contextSize added to the sample manifest
Full changelog

0.4.14 (2026-02-15)

Features

  • gguf: add native Go GGUF parser with CRD integration and CLI inspect (#140) (9d96ed4)
  • inference: add flashAttention and contextSize to sample manifest (914c929), closes #145
llmkube-0.4.13 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.13 New feature
Notable features
  • Controller init container image is now configurable.
  • InferenceService CRD exposes llama.cpp parallel slots setting.
  • Helm chart adds optional NetworkPolicy for controller manager.
Full changelog

0.4.13 (2026-02-07)

Features

  • controller: make init container image configurable (#128) (38ccdf0)
  • expose llama.cpp parallel slots in InferenceService CRD (#133) (cae7b52)
  • helm: add optional NetworkPolicy for controller manager (#135) (8d61ce3)
  • update model catalog with DeepSeek R1 and refresh stale entries (#131) (89eb5a6)
llmkube-0.4.12 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.12 Breaking risk
Notable features
  • Support for custom Certificate Authorities (CA)
  • Fixed deprecated image tags
Full changelog

0.4.12 (2026-01-22)

Features

  • add custom CA support and fix deprecated image tags (#124) (5ec912e)
llmkube-0.4.11 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.11 Bug fix

Fixed CLI version checking to use numeric comparison, and switched the controller to fully qualified image names for curl.

Full changelog

0.4.11 (2026-01-22)

Bug Fixes

  • cli: use numeric comparison for version checking (#109) (05e0025)
  • controller: use fully qualified image names for curl (#121) (213660b)
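
Numeric comparison matters because version strings do not order correctly as text: a lexical sort places 0.4.10 before 0.4.9. The sketch below shows one way to compare field by field; it is not the CLI's actual code, the helper name version_gt is hypothetical, and it assumes plain MAJOR.MINOR.PATCH versions with an optional leading v.

```shell
# Return 0 if version $1 is strictly newer than $2, comparing each
# dot-separated field as an integer. Missing fields count as 0.
version_gt() {
  a=${1#v}; b=${2#v}
  for i in 1 2 3; do
    x=$(echo "$a" | cut -d. -f"$i"); y=$(echo "$b" | cut -d. -f"$i")
    x=${x:-0}; y=${y:-0}
    if [ "$x" -gt "$y" ]; then return 0; fi
    if [ "$x" -lt "$y" ]; then return 1; fi
  done
  return 1  # equal versions are not "greater"
}

# Lexical ordering treats 0.4.9 as the larger string.
echo "lexical max: $(printf '0.4.9\n0.4.10\n' | LC_ALL=C sort | tail -n 1)"

# Numeric comparison recognizes 0.4.10 as the newer release.
if version_gt 0.4.10 0.4.9; then echo "numeric: 0.4.10 is newer"; fi
```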
v0.4.10 New feature
Notable features
  • Air‑gapped deployment support for environments without internet access
  • 32B model catalog with --context flag support
  • GPU observability configuration and Grafana dashboard
Full changelog

What's New in v0.4.10

Features

  • Air-gapped deployment support - Deploy models from local file paths for environments without internet access (#85)
  • 32B models in catalog - Added larger models with --context flag support (#88)
  • GPU observability - New configuration and Grafana dashboard for GPU metrics (#105)
  • Benchmark test suites - Comprehensive benchmark sweeps for performance testing (#107)
  • Stress testing mode - New stress testing capabilities in the benchmark command (#104)

Documentation

  • Added community standards and security policy (#92)
  • Updated documentation for v0.4.9 GPU scheduling features (#83)

Installation

Homebrew (Recommended for macOS)

brew install defilantech/tap/llmkube

Install Script (Linux/macOS)

curl -sSL https://raw.githubusercontent.com/defilantech/LLMKube/main/install.sh | bash

Manual Download

macOS

# ARM64 (Apple Silicon)
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.10/LLMKube_0.4.10_darwin_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.10/LLMKube_0.4.10_darwin_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Linux

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.10/LLMKube_0.4.10_linux_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.10/LLMKube_0.4.10_linux_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Windows: Download the .zip file and add llmkube.exe to your PATH.
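
The four manual download commands above differ only in the OS and architecture parts of the asset name. As a convenience sketch, the snippet below derives the matching URL from uname output without downloading anything; the helper names asset_arch and asset_url are hypothetical, and the URL pattern is copied from the commands above.

```shell
# Normalize `uname -m` output to the arch suffix used in the release assets.
asset_arch() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    arm64|aarch64) echo arm64 ;;
    *)             return 1 ;;
  esac
}

# Build the v0.4.10 tarball URL for an OS (darwin or linux) and machine type.
asset_url() {
  version=0.4.10
  case "$1" in
    darwin|linux) ;;
    *) return 1 ;;
  esac
  arch=$(asset_arch "$2") || return 1
  echo "https://github.com/defilantech/LLMKube/releases/download/v${version}/LLMKube_${version}_$1_${arch}.tar.gz"
}

# Show the URL for the current machine.
os=$(uname -s | tr '[:upper:]' '[:lower:]')
asset_url "$os" "$(uname -m)" || echo "unsupported platform: $os $(uname -m)"
```

The printed URL can then be passed to the same curl -L ... | tar xz pipeline shown above.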

Verify Installation

llmkube version

Full Changelog: https://github.com/defilantech/LLMKube/compare/v0.4.9...v0.4.10

llmkube-0.4.10 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

llmkube-0.4.9 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.9 New feature
Notable features
  • GPU contention visibility with queue position and priority classes
Full changelog

0.4.9 (2025-12-01)

Features

  • add GPU contention visibility, queue position, and priority classes (#81) (c0220e5)

Documentation

  • add getting started video to README (#76) (ceb83d7)
llmkube-0.4.8 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.8 New feature
Notable features
  • Support configurable context size for llama.cpp server
Full changelog

0.4.8 (2025-11-27)

Features

  • Support configurable context size for llama.cpp server (#73) (6f8e04b)
llmkube-0.4.7 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.7 Bug fix

Fixed bug where Helm chart releases were incorrectly marked as the latest.

Full changelog

0.4.7 (2025-11-26)

Bug Fixes

  • Don't mark Helm chart release as latest (#70) (761b154)
llmkube-0.4.6 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.6 Bug fix

Fixed empty component causing "llmkube-" prefix in release identifiers.

Full changelog

0.4.6 (2025-11-26)

Bug Fixes

  • Set empty component to prevent llmkube- prefix in releases (#68) (45b61c6)
llmkube-0.4.5 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.5 Maintenance

Minor fixes and improvements.

Full changelog

0.4.5 (2025-11-26)

Bug Fixes

  • Clean up release process - single release with proper notes (#66) (4deae85)
llmkube-0.4.4 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.4.4 Bug fix

Fixed CI workflow to trigger GoReleaser and Helm release.

Full changelog

LLMKube v0.4.4

Release Date: 2025-11-26T19:10:25Z

See RELEASE_NOTES_v0.4.4.md for complete details.

Changelog

Bug Fixes

  • 9a37a77e556d6f811cb6a090125a4a73e2e9c346: fix: Trigger GoReleaser and Helm release from Release Please workflow (#64) (@Defilan)

Installation

macOS

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.4/llmkube_0.4.4_darwin_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64 (Apple Silicon)
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.4/llmkube_0.4.4_darwin_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Linux

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.4/llmkube_0.4.4_linux_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.4/llmkube_0.4.4_linux_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Windows

Download the .zip file for your architecture and add llmkube.exe to your PATH.

Verify Installation

llmkube version

Next Steps

  1. Install the operator
  2. Try the GPU quickstart
  3. Read the documentation

Full Release Notes: RELEASE_NOTES_v0.4.4.md

v0.4.3 New feature
Notable features
  • Metal GPU support for macOS (Apple Silicon)
  • Model catalog with 10 pre‑configured models
  • Add benchmark command and reorganize documentation
Full changelog

0.4.3 (2025-11-26)

Features

  • Add benchmark command and reorganize documentation (58307be)
  • Add benchmark command and reorganize documentation (ac8888e), closes #6
  • Add Helm chart for easy installation (5718804)
  • Add Helm chart for easy installation with comprehensive CI testing (3ea3bfd), closes #9
  • Add Metal GPU support for macOS (Apple Silicon) (f673c26), closes #33
  • Add model catalog with 10 pre-configured models (404d722)
  • Add model catalog with 10 pre-configured models (Phase 1) (0fd969a)
  • Add persistent model cache to avoid re-downloading (83f844f), closes #52
  • Add Release Please automation and version-agnostic docs (dc2d54e)
  • helm: Add image digest support for production deployments (a38801d)
  • Implement automatic port forwarding for benchmark command (472b3ae)
  • Multi-GPU support with layer-based sharding (#47) (4797609)
  • Persistent model cache with per-namespace PVC support (ab04261)
  • Set up Helm repository on GitHub Pages (8d62737)
  • Support per-namespace model cache PVCs (c3cb891)

Bug Fixes

  • Add cacheKey to CRD and restrict cache to llmkube-system namespace (464c23d)
  • Add CRD keep policy and improve security test reliability (ff32296)
  • Add Helm chart publishing to release workflow (8baf9c4)
  • Add Helm chart publishing to release workflow (03bab72)
  • Add Homebrew archive IDs and v0.3.0 release notes (cea933b)
  • Address lint issues in benchmark command (bf80610)
  • Address linter errors in catalog implementation (8932e4f)
  • Address linter issues in Metal agent code (3f1f678)
  • controller: Add Model watch to InferenceService controller (cb4e201)
  • Correct CLI binary path in E2E tests (41af555)
  • Fix GoReleaser Homebrew tap configuration for v0.3.0 (4e95c04)
  • Further increase Helm CI timeout and readiness probe delay (5453d66)
  • Further increase Helm CI timeout and readiness probe delay (fd577d3)
  • Handle resp.Body.Close error in version check (linter) (fb3adf5)
  • Increase Helm chart CI timeout from 2m to 5m (7a08b45)
  • Increase Helm chart CI timeout from 2m to 5m (ced2210)
  • InferenceService stuck in Pending when Model becomes Ready (4d20aec)
  • Metal agent production fixes and testing improvements (8744c7b)
  • Resolve Helm chart CI test failures (9919696)
  • Resolve staticcheck SA5011 lint errors and update CONTRIBUTING.md (#60) (c0b5824)
  • Sanitize Service names for DNS-1035 compliance (v0.3.3) (db81990)
  • Sanitize Service names to comply with DNS-1035 requirements (b431986)
  • Skip containerized Deployment for Metal accelerator and add version check (d300e64)
  • Skip containerized Deployment for Metal accelerator and add version check (8dab955)
  • Suppress Endpoints API deprecation warnings (e70a4b3)
  • Update operator deployment to use correct container image (00fee75)
  • Update operator deployment to use correct container image (4c67a78)
  • Update version.go to 0.2.1 and add automation for future releases (8dd613d)
  • Update version.go to 0.2.1 and add automation for future releases (2ff68bd)
  • Use simple v* tag format for releases (#62) (bda9f19)
  • Use workspace path for kubeconform validation (fc066d8)
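
The Service name fixes in the list above concern DNS-1035 labels, which Kubernetes requires for Service names: lowercase alphanumerics or hyphens, starting with a letter, ending with an alphanumeric, and at most 63 characters. The sketch below shows one way to coerce an arbitrary name into that shape; it is not the project's implementation, the helper name sanitize_dns1035 is hypothetical, and dropping leading non-letters is just one possible policy.

```shell
# Coerce a name into a DNS-1035 label:
#   1. lowercase it
#   2. replace anything outside [a-z0-9-] with '-'
#   3. drop leading characters until the first letter
#   4. truncate to 63 characters
#   5. drop trailing hyphens
# Returns non-zero if nothing usable remains.
sanitize_dns1035() {
  name=$(printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9-]/-/g' -e 's/^[^a-z]*//' \
    | cut -c1-63 \
    | sed -e 's/-*$//')
  [ -n "$name" ] && printf '%s\n' "$name"
}

sanitize_dns1035 "Llama_3.1-8B"
sanitize_dns1035 "3b-model."
```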

Documentation

  • Add CLI option to quick start, keep kubectl as fallback (f6829ee)
  • Add release notes for v0.3.2 (177abf8)
  • Add release notes for v0.3.2 (ca1bb12)
  • Add release notes for v0.4.0 (144b960)
  • Add release notes for v0.4.0 (a61321f)
  • Overhaul README and roadmap for public launch (b42c17e)
  • Update binary download links to version 0.2.1 (fad530a)
  • Update binary download links to version 0.2.1 (63bb0fa)
  • Update Helm installation to use GitHub Pages repository (477e037)
  • Update MODEL-CACHE.md for per-namespace PVC pattern (0be3f46)
llmkube v0.4.2 Maintenance

Resolved staticcheck SA5011 lint errors and updated CONTRIBUTING.md.

Full changelog

0.4.2 (2025-11-26)

Bug Fixes

  • Resolve staticcheck SA5011 lint errors and update CONTRIBUTING.md (#60) (c0b5824)
llmkube v0.4.1 New feature
Notable features
  • Add benchmark command and reorganize documentation
  • Add a persistent model cache, with per‑namespace PVC support, to avoid re‑downloading models
  • helm: Add image digest support for production deployments
Full changelog

0.4.1 (2025-11-26)

Features

  • Add benchmark command and reorganize documentation (58307be)
  • Add benchmark command and reorganize documentation (ac8888e), closes #6
  • Add persistent model cache to avoid re-downloading (83f844f), closes #52
  • Add Release Please automation and version-agnostic docs (dc2d54e)
  • helm: Add image digest support for production deployments (a38801d)
  • Implement automatic port forwarding for benchmark command (472b3ae)
  • Persistent model cache with per-namespace PVC support (ab04261)
  • Support per-namespace model cache PVCs (c3cb891)

Bug Fixes

  • Add cacheKey to CRD and restrict cache to llmkube-system namespace (464c23d)
  • Address lint issues in benchmark command (bf80610)

Documentation

  • Update MODEL-CACHE.md for per-namespace PVC pattern (0be3f46)
v0.4.0 New feature
Notable features
  • Multi‑GPU support with layer‑based sharding
Full changelog

LLMKube v0.4.0

Release Date: 2025-11-26T00:23:11Z

See RELEASE_NOTES_v0.4.0.md for complete details.

Changelog

New Features

  • 479760973eb811a0b7a71c711f52ca3d8695b761: feat: Multi-GPU support with layer-based sharding (#47) (@Defilan)

Bug Fixes

  • 03bab72a74496085b79e3c51838f9853ed674062: fix: Add Helm chart publishing to release workflow (@Defilan)

Installation

macOS

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.0/llmkube_0.4.0_darwin_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64 (Apple Silicon)
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.0/llmkube_0.4.0_darwin_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Linux

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.0/llmkube_0.4.0_linux_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.4.0/llmkube_0.4.0_linux_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Windows

Download the .zip file for your architecture and add llmkube.exe to your PATH.

Verify Installation

llmkube version
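
The per-platform commands above all follow one URL pattern, so the choice of archive can be scripted. The following is a minimal sketch, not part of the official instructions: the function name `llmkube_archive_url` is hypothetical, and the mapping from `uname` output to the `darwin`/`linux` and `amd64`/`arm64` suffixes is an assumption based on the archive names listed here.

```shell
# Hypothetical helper: build the release archive URL for one platform.
# Arguments: version (e.g. 0.4.0), kernel name (uname -s), machine (uname -m).
llmkube_archive_url() {
  version=$1
  # Map the kernel name to the OS suffix used in the archive names.
  case "$2" in
    Darwin) os=darwin ;;
    Linux)  os=linux ;;
    *) echo "unsupported OS: $2" >&2; return 1 ;;
  esac
  # Map the machine name to the architecture suffix (assumed aliases).
  case "$3" in
    x86_64|amd64)  arch=amd64 ;;
    arm64|aarch64) arch=arm64 ;;
    *) echo "unsupported architecture: $3" >&2; return 1 ;;
  esac
  printf 'https://github.com/defilantech/LLMKube/releases/download/v%s/llmkube_%s_%s_%s.tar.gz\n' \
    "$version" "$version" "$os" "$arch"
}

# Example (commented out so that sourcing this file has no side effects):
# curl -L "$(llmkube_archive_url 0.4.0 "$(uname -s)" "$(uname -m)")" | tar xz
```

Windows is left out on purpose, since that release ships a .zip that is installed by hand.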

Next Steps

  1. Install the operator
  2. Try the GPU quickstart
  3. Read the documentation

Full Release Notes: RELEASE_NOTES_v0.4.0.md

llmkube-0.4.0 Maintenance

Minor fixes and improvements.

Changelog

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.3.3 Bug fix

Fixed Service names to comply with DNS-1035 requirements.
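
For context, a DNS-1035 label may contain only lowercase alphanumerics and '-', must start with a letter, must end with an alphanumeric, and may be at most 63 characters long. The sketch below shows one way a name could be coerced into that shape. It is an illustration only, not the operator's actual logic: the function name `sanitize_dns1035` and the `svc-` prefix are assumptions.

```shell
# Hypothetical helper: coerce an arbitrary name into a DNS-1035 label.
sanitize_dns1035() {
  # Lowercase the input.
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  # Replace every run of disallowed characters with a single '-'.
  name=$(printf '%s' "$name" | sed -e 's/[^a-z0-9-]\{1,\}/-/g')
  # A label must start with a letter; the "svc-" prefix is an assumption.
  case "$name" in
    [a-z]*) ;;
    *) name="svc-$name" ;;
  esac
  # Truncate to 63 characters, then trim any trailing '-'.
  name=$(printf '%s' "$name" | cut -c1-63 | sed -e 's/-*$//')
  printf '%s\n' "$name"
}

sanitize_dns1035 'Llama-3.1_8B'   # prints llama-3-1-8b
sanitize_dns1035 '3b-model'       # prints svc-3b-model
```

Truncation runs before the trailing '-' trim so that a cut landing on a hyphen still yields a valid label.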

Full changelog

LLMKube v0.3.3

Release Date: 2025-11-24T17:07:23Z

See RELEASE_NOTES_v0.3.3.md for complete details.

Changelog

Bug Fixes

  • b431986ceae6b383ee064bec595c922a42394a8e: fix: Sanitize Service names to comply with DNS-1035 requirements (@Defilan)

Installation

macOS

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.3/llmkube_0.3.3_darwin_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64 (Apple Silicon)
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.3/llmkube_0.3.3_darwin_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Linux

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.3/llmkube_0.3.3_linux_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.3/llmkube_0.3.3_linux_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Windows

Download the .zip file for your architecture and add llmkube.exe to your PATH.

Verify Installation

llmkube version

Next Steps

  1. Install the operator
  2. Try the GPU quickstart
  3. Read the documentation

Full Release Notes: RELEASE_NOTES_v0.3.3.md

v0.3.2 Bug fix

Fixed the unhandled resp.Body.Close error in the version check and skipped the containerized Deployment when the accelerator is Metal.

Full changelog

LLMKube v0.3.2

Release Date: 2025-11-24T16:28:19Z

See RELEASE_NOTES_v0.3.2.md for complete details.

Changelog

Bug Fixes

  • fb3adf57913744e08ebffb58af6877bd15fbeb93: fix: Handle resp.Body.Close error in version check (linter) (@Defilan)
  • 8dab955a2d1e728fe8a9b1b2971a4906454d71c3: fix: Skip containerized Deployment for Metal accelerator and add version check (@Defilan)

Installation

macOS

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.2/llmkube_0.3.2_darwin_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64 (Apple Silicon)
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.2/llmkube_0.3.2_darwin_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Linux

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.2/llmkube_0.3.2_linux_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.2/llmkube_0.3.2_linux_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Windows

Download the .zip file for your architecture and add llmkube.exe to your PATH.

Verify Installation

llmkube version

Next Steps

  1. Install the operator
  2. Try the GPU quickstart
  3. Read the documentation

Full Release Notes: RELEASE_NOTES_v0.3.2.md

v0.3.1 Bug fix

Fixed controller OOM by increasing memory limits.

Full changelog

LLMKube v0.3.1

Release Date: 2025-11-24T09:17:13Z

See RELEASE_NOTES_v0.3.1.md for complete details.

Changelog

Bug Fixes

  • fd577d3137da086346524f1802e47219feefa1fa: fix: Further increase Helm CI timeout and readiness probe delay (@Defilan)
  • ced2210ea28d453fdac4c7346bc98f66684893b1: fix: Increase Helm chart CI timeout from 2m to 5m (@Defilan)
  • 4c67a7806232c687b7b2450660735d9265d507b8: fix: Update operator deployment to use correct container image (@Defilan)

Other Changes

  • 3e60a3031ef0f443209c0088e84f1a01dd1f6c1a: Release v0.3.1: Fix controller OOM with increased memory limits (@Defilan)

Installation

macOS

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.1/llmkube_0.3.1_darwin_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64 (Apple Silicon)
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.1/llmkube_0.3.1_darwin_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Linux

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.1/llmkube_0.3.1_linux_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.1/llmkube_0.3.1_linux_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Windows

Download the .zip file for your architecture and add llmkube.exe to your PATH.

Verify Installation

llmkube version

Next Steps

  1. Install the operator
  2. Try the GPU quickstart
  3. Read the documentation

Full Release Notes: RELEASE_NOTES_v0.3.1.md

v0.3.0 New feature
Notable features
  • Add Metal GPU support for macOS (Apple Silicon)
Full changelog

LLMKube v0.3.0

Release Date: 2025-11-24T06:15:31Z

See RELEASE_NOTES_v0.3.0.md for complete details.

Changelog

New Features

  • f673c26bd4ac1a285dc7e72ffe6a930bc586b855: feat: Add Metal GPU support for macOS (Apple Silicon) (@Defilan)

Bug Fixes

  • cea933beac2607122772d14184b35da04738b7f9: fix: Add Homebrew archive IDs and v0.3.0 release notes (@Defilan)
  • 3f1f678502c985b04d48a1c8c8bc44ea68d8a542: fix: Address linter issues in Metal agent code (@Defilan)
  • 8744c7b54e23cbb77609a97340d9be9dd5da931c: fix: Metal agent production fixes and testing improvements (@Defilan)
  • e70a4b391725a70a82d78d47a7d4f6d2b898dcc8: fix: Suppress Endpoints API deprecation warnings (@Defilan)

Installation

macOS

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.0/llmkube_0.3.0_darwin_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64 (Apple Silicon)
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.0/llmkube_0.3.0_darwin_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Linux

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.0/llmkube_0.3.0_linux_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.3.0/llmkube_0.3.0_linux_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Windows

Download the .zip file for your architecture and add llmkube.exe to your PATH.

Verify Installation

llmkube version

Next Steps

  1. Install the operator
  2. Try the GPU quickstart
  3. Read the documentation

Full Release Notes: RELEASE_NOTES_v0.3.0.md

v0.2.2 New feature
Notable features
  • Add Helm chart for easy installation with comprehensive CI testing
  • Add model catalog featuring ten pre‑configured models
Full changelog

LLMKube v0.2.2

Release Date: 2025-11-24T02:00:38Z

See RELEASE_NOTES_v0.2.2.md for complete details.

Changelog

New Features

  • 3ea3bfd27ce864f7884f25ae9db65ed52eb68e01: feat: Add Helm chart for easy installation with comprehensive CI testing (@Defilan)
  • 404d722e70d3e885f1e437ebdadf38fe43c7689a: feat: Add model catalog with 10 pre-configured models (@Defilan)

Bug Fixes

  • ff32296a45174bdce6070844a68007e2c45cf3fe: fix: Add CRD keep policy and improve security test reliability (@Defilan)
  • 8932e4fbb3fe8d1fea1fedba5bb18f3cd02808c8: fix: Address linter errors in catalog implementation (@Defilan)
  • 41af55589ba6b17f07119b50d96db9c39eac6ea3: fix: Correct CLI binary path in E2E tests (@Defilan)
  • 99196961bf91e4c285182211a7a6fdec574ae7e7: fix: Resolve Helm chart CI test failures (@Defilan)
  • 2ff68bdc0e40ab9ee8337403af649fda7354ad7c: fix: Update version.go to 0.2.1 and add automation for future releases (@Defilan)
  • fc066d8d0f9175382fa7cfab5f40c755739e175f: fix: Use workspace path for kubeconform validation (@Defilan)

Other Changes

  • aa84b601d75753c585cacace76311fbbac598080: Add Minikube quickstart guide and improve CLI-first documentation (@Defilan)
  • 5f08b27232102d17a0e2ae59f74176ed25a9689b: Update docs to recommend local controller for Minikube/local development (@Defilan)

Installation

macOS

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.2/llmkube_0.2.2_darwin_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64 (Apple Silicon)
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.2/llmkube_0.2.2_darwin_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Linux

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.2/llmkube_0.2.2_linux_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.2/llmkube_0.2.2_linux_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Windows

Download the .zip file for your architecture and add llmkube.exe to your PATH.

Verify Installation

llmkube version

Next Steps

  1. Install the operator
  2. Try the GPU quickstart
  3. Read the documentation

Full Release Notes: RELEASE_NOTES_v0.2.2.md

v0.2.1 Bug fix

Added the missing Model watch to the InferenceService controller.

Full changelog

LLMKube v0.2.1

Release Date: 2025-11-18T16:21:32Z

See RELEASE_NOTES_v0.2.1.md for complete details.

Changelog

Other Changes

  • cb4e2019583a811fa98af1a446bd0df6b6c3cba2: fix(controller): Add Model watch to InferenceService controller (@Defilan)

Installation

macOS

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.1/llmkube_0.2.1_darwin_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64 (Apple Silicon)
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.1/llmkube_0.2.1_darwin_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Linux

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.1/llmkube_0.2.1_linux_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.1/llmkube_0.2.1_linux_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Windows

Download the .zip file for your architecture and add llmkube.exe to your PATH.

Verify Installation

llmkube version

Next Steps

  1. Install the operator
  2. Try the GPU quickstart
  3. Read the documentation

Full Release Notes: RELEASE_NOTES_v0.2.1.md

v0.2.0 Maintenance

Initial public release of LLMKube.

Full changelog

LLMKube v0.2.0

Release Date: 2025-11-18T06:34:01Z

See RELEASE_NOTES_v0.2.0.md for complete details.

Changelog

Other Changes

  • f821f0f073040d82613e8ed809ab2d402f1fb2a7: Initial public release: LLMKube v0.2.0 (Christopher Maher)

Installation

macOS

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.0/llmkube_0.2.0_darwin_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64 (Apple Silicon)
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.0/llmkube_0.2.0_darwin_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Linux

# AMD64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.0/llmkube_0.2.0_linux_amd64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

# ARM64
curl -L https://github.com/defilantech/LLMKube/releases/download/v0.2.0/llmkube_0.2.0_linux_arm64.tar.gz | tar xz
sudo mv llmkube /usr/local/bin/

Windows

Download the .zip file for your architecture and add llmkube.exe to your PATH.

Verify Installation

llmkube version

Next Steps

  1. Install the operator
  2. Try the GPU quickstart
  3. Read the documentation

Full Release Notes: RELEASE_NOTES_v0.2.0.md
