LLMKube

v0.7.8 Feature

This release adds 8 features and 5 bug fixes, centered on the new ModelRouter and its router-proxy, for engineering teams evaluating rollout.

✓ No known CVEs patched

Topics

ai apple-silicon autoscaling edge-computing gguf gpu self-hosted inference kubernetes llama-cpp llm local-llm metal mlx multi-gpu nvidia tgi vllm

ReleasePort's take

Light signal
editorial:auto 13d

Release v0.7.8 introduces a configurable proxy with per-route timeouts and a ModelRouter skeleton.

Why it matters: Test the new configurable proxy and per‑route timeout settings in development before deploying to production.

Summary

AI summary

Adds a configurable proxy with per-route and per-backend timeouts, plus a ModelRouter skeleton.

Changes in this release

Feature Medium

configurable proxy + per-route/backend timeouts

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

external provider URL defaults + cluster-wide LiteLLM URL

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

Helm packaging, sample manifest, and concept doc for ModelRouter

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

ModelRouterReconciler skeleton with spec validation

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

reconcile router-proxy Deployment, Service, and ConfigMap

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

router-proxy binary with OpenAI streaming passthrough

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

router-proxy cluster e2e + runtime fail-closed 503

Source: llm_adapter@2026-05-21

Confidence: high
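
One common reading of "fail-closed 503" is that the proxy rejects a request outright when it has no usable backend, rather than forwarding it somewhere unintended. The Go sketch below illustrates that pattern under this assumption. The `pickBackend` selector, the `failClosed` wrapper, and the `Retry-After` value are hypothetical and not taken from the LLMKube source.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// pickBackend is a hypothetical selector; it returns "" when no backend is usable.
type pickBackend func() string

// failClosed rejects the request with 503 when no backend is available,
// and otherwise hands it to next.
func failClosed(pick pickBackend, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if pick() == "" {
			w.Header().Set("Retry-After", "5") // illustrative value
			http.Error(w, "no healthy backend", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor runs one in-memory request through the wrapper and returns the status code.
func statusFor(pick pickBackend) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/v1/chat/completions", nil)
	failClosed(pick, ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(func() string { return "" }))                    // 503
	fmt.Println(statusFor(func() string { return "http://backend:8080" })) // 200
}
```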

Feature Medium

scaffold ModelRouter CRD types and deepcopy

Source: llm_adapter@2026-05-21

Confidence: high

Bugfix Medium

close cloud-tier conns + drop local idle timeout

Source: llm_adapter@2026-05-21

Confidence: high

Bugfix Medium

don't quarantine backends on per-attempt context deadline

Source: llm_adapter@2026-05-21

Confidence: high

Bugfix Medium

unblock MicroShift SCC diagnostics + bump bootstrap timeout

Source: llm_adapter@2026-05-21

Confidence: high

Bugfix Medium

half-open circuit breaker on proxy + scale-to-zero status

Source: llm_adapter@2026-05-21

Confidence: high

Bugfix Medium

preserve external annotations on reconciler Deployment updates

Source: llm_adapter@2026-05-21

Confidence: high
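
A typical way to preserve annotations set by other tools is to merge the reconciler's desired annotations over the existing ones instead of replacing the whole map. The Go sketch below shows that merge on plain maps. `mergeAnnotations` and the `example.com/...` keys are hypothetical, and the actual fix in LLMKube may be structured differently.

```go
package main

import "fmt"

// mergeAnnotations returns the existing annotations overlaid with the
// desired set: keys owned by the reconciler are updated, and keys it does
// not know about are left untouched.
func mergeAnnotations(existing, desired map[string]string) map[string]string {
	out := make(map[string]string, len(existing)+len(desired))
	for k, v := range existing {
		out[k] = v
	}
	for k, v := range desired {
		out[k] = v
	}
	return out
}

func main() {
	existing := map[string]string{
		"example.com/owner":       "platform-team", // set by another tool
		"example.com/config-hash": "old",           // set by the reconciler
	}
	desired := map[string]string{"example.com/config-hash": "new"}

	merged := mergeAnnotations(existing, desired)
	fmt.Println(merged["example.com/owner"])       // platform-team
	fmt.Println(merged["example.com/config-hash"]) // new
}
```

Note that a pure overlay never deletes keys, so a reconciler that needs to retire one of its own annotations would have to track and remove it explicitly.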

Other Medium

add consumer-hardware model matrix guide

Source: llm_adapter@2026-05-21

Confidence: low

Other Medium

land ModelRouter prominently in README for the 0.7.8 release

Source: llm_adapter@2026-05-21

Confidence: low

Other Medium

air-gapped, OpenShift, macOS Metal guides + architecture refresh (Tier 1)

Source: llm_adapter@2026-05-21

Confidence: low

Other Medium

drop stale "fifteen lines" claim in openshift-install Reference

Source: llm_adapter@2026-05-21

Confidence: low

Full changelog

0.7.8 (2026-05-14)

Features

  • configurable proxy + per-route/backend timeouts (closes #457, #458) (#461) (03d222a)
  • external provider URL defaults + cluster-wide LiteLLM URL (closes #438) (#451) (26cd5ae)
  • Helm packaging, sample manifest, and concept doc for ModelRouter (#448) (a513fdc)
  • ModelRouterReconciler skeleton with spec validation (#445) (9b1a259)
  • reconcile router-proxy Deployment, Service, and ConfigMap (#447) (856ecc3)
  • router-proxy binary with OpenAI streaming passthrough (#446) (942d09a)
  • router-proxy cluster e2e + runtime fail-closed 503 (closes #430) (#450) (75151fa)
  • scaffold ModelRouter CRD types and deepcopy (#442) (e6c60b3)

Bug Fixes

  • close cloud-tier conns + drop local idle timeout (closes #459) (#460) (173c26a)
  • don't quarantine backends on per-attempt context deadline (closes #462) (#463) (80ef9c8)
  • e2e: unblock MicroShift SCC diagnostics + bump bootstrap timeout (#466) (0c793b7)
  • half-open circuit breaker on proxy + scale-to-zero status (closes #452, #453) (#454) (ac9302c)
  • preserve external annotations on reconciler Deployment updates (#468) (de580c1)

Documentation

  • add consumer-hardware model matrix guide (#444) (dd07397)
  • readme: land ModelRouter prominently for the 0.7.8 release (#464) (deb24bb)
  • site: air-gapped, OpenShift, macOS Metal guides + architecture refresh (Tier 1) (#465) (5996a1e)
  • site: drop stale "fifteen lines" claim in openshift-install Reference (#467) (ec52ca8)

About LLMKube

Kubernetes operator for llama.cpp-native LLM inference with GPU scheduling, Apple Silicon Metal support, and OpenAI-compatible API.

Related context

Earlier breaking changes

  • v0.8.1 foreman: requestTimeoutSeconds now sets loop-wide budget, default changes from 600 to 3600.
