LLMKube

v0.7.8 Feature

This release adds 8 features, 5 bug fixes, and 4 documentation updates, centered on the new ModelRouter and its router-proxy.

Published 2mo Containers & Orchestration

✓ No known CVEs patched

Topics

ai apple-silicon autoscaling edge-computing gguf gpu self-hosted inference kubernetes llama-cpp llm local-llm metal mlx multi-gpu nvidia tgi vllm

ReleasePort's take

Light signal

editorial:auto 2mo

Release v0.7.8 introduces a configurable proxy with per-route timeouts and a ModelRouter skeleton.

Why it matters: The proxy and its per-route timeouts are newly configurable, so test those settings in development before deploying to production.

Summary

AI summary

Added configurable proxy with per-route timeouts and ModelRouter skeleton.

Changes in this release

Type	Severity	Summary	Source	Confidence	CVE
Feature	Medium	configurable proxy + per-route/backend timeouts	llm_adapter@2026-05-21	high	none
Feature	Medium	external provider URL defaults + cluster-wide LiteLLM URL	llm_adapter@2026-05-21	high	none
Feature	Medium	Helm packaging, sample manifest, and concept doc for ModelRouter	llm_adapter@2026-05-21	high	none
Feature	Medium	ModelRouterReconciler skeleton with spec validation	llm_adapter@2026-05-21	high	none
Feature	Medium	reconcile router-proxy Deployment, Service, and ConfigMap	llm_adapter@2026-05-21	high	none
Feature	Medium	router-proxy binary with OpenAI streaming passthrough	llm_adapter@2026-05-21	high	none
Feature	Medium	router-proxy cluster e2e + runtime fail-closed 503	llm_adapter@2026-05-21	high	none
Feature	Medium	scaffold ModelRouter CRD types and deepcopy	llm_adapter@2026-05-21	high	none
Bugfix	Medium	close cloud-tier conns + drop local idle timeout	llm_adapter@2026-05-21	high	none
Bugfix	Medium	don't quarantine backends on per-attempt context deadline	llm_adapter@2026-05-21	high	none
Bugfix	Medium	unblock MicroShift SCC diagnostics + bump bootstrap timeout	llm_adapter@2026-05-21	high	none
Bugfix	Medium	half-open circuit breaker on proxy + scale-to-zero status	llm_adapter@2026-05-21	high	none
Bugfix	Medium	preserve external annotations on reconciler Deployment updates	llm_adapter@2026-05-21	high	none
Other	Medium	add consumer-hardware model matrix guide	llm_adapter@2026-05-21	low	none
Other	Medium	land ModelRouter prominently in README for the 0.7.8 release	llm_adapter@2026-05-21	low	none
Other	Medium	air-gapped, OpenShift, macOS Metal guides + architecture refresh (Tier 1)	llm_adapter@2026-05-21	low	none
Other	Medium	drop stale "fifteen lines" claim in openshift-install Reference	llm_adapter@2026-05-21	low	none

Full changelog

0.7.8 (2026-05-14)

Features

configurable proxy + per-route/backend timeouts (closes #457, #458) (#461) (03d222a)
external provider URL defaults + cluster-wide LiteLLM URL (closes #438) (#451) (26cd5ae)
Helm packaging, sample manifest, and concept doc for ModelRouter (#448) (a513fdc)
ModelRouterReconciler skeleton with spec validation (#445) (9b1a259)
reconcile router-proxy Deployment, Service, and ConfigMap (#447) (856ecc3)
router-proxy binary with OpenAI streaming passthrough (#446) (942d09a)
router-proxy cluster e2e + runtime fail-closed 503 (closes #430) (#450) (75151fa)
scaffold ModelRouter CRD types and deepcopy (#442) (e6c60b3)

Bug Fixes

close cloud-tier conns + drop local idle timeout (closes #459) (#460) (173c26a)
don't quarantine backends on per-attempt context deadline (closes #462) (#463) (80ef9c8)
e2e: unblock MicroShift SCC diagnostics + bump bootstrap timeout (#466) (0c793b7)
half-open circuit breaker on proxy + scale-to-zero status (closes #452, #453) (#454) (ac9302c)
preserve external annotations on reconciler Deployment updates (#468) (de580c1)

Documentation

add consumer-hardware model matrix guide (#444) (dd07397)
readme: land ModelRouter prominently for the 0.7.8 release (#464) (deb24bb)
site: air-gapped, OpenShift, macOS Metal guides + architecture refresh (Tier 1) (#465) (5996a1e)
site: drop stale "fifteen lines" claim in openshift-install Reference (#467) (ec52ca8)

About LLMKube

Kubernetes operator for llama.cpp-native LLM inference with GPU scheduling, Apple Silicon Metal support, and OpenAI-compatible API.

Related context

Earlier breaking changes

v0.8.1 foreman: requestTimeoutSeconds now sets loop-wide budget, default changes from 600 to 3600.