LocalAI

v4.2.0 Security

This release includes 3 security fixes for security teams reviewing exposed deployments.

Published 23d Model Serving & MLOps

View tool

✓ No known CVEs patched

Read the diff → Tool health → What is this tool? →

This release patches 3 known CVEs

Topics

agents ai api audio-generation decentralized distributed

+12 more

image-generation libp2p llama llm mamba mcp musicgen object-detection rerank stable-diffusion text-generation tts

ReleasePort's take

Moderate signal

editorial:auto 13d

LocalAI v4.2.0 introduces voice & face biometrics, audio diarization, Ollama‑compatible API, video generation from stable-diffusion.ggml, and a redesigned multilingual UI with brandable settings.

Why it matters: Patch immediately to remove the unsafe sprintf() in grpc-server.cpp that could cause buffer overflow; this security fix is critical for all deployments using the gRPC server surface.

Summary

AI summary

LocalAI adds voice & face biometrics, audio diarization, Ollama drop‑in API, video generation and a redesigned multilingual UI with brandable settings.

Changes in this release

Type	Severity	Summary	CVE
Security	Medium	Removed unsafe sprintf() in grpc-server.cpp preventing buffer overflow. Removed unsafe sprintf() in grpc-server.cpp preventing buffer overflow. Source: llm_adapter@2026-05-21 Confidence: high	—
Security	Medium	Settings API strips env-supplied ApiKeys before persisting to prevent leaks. Settings API strips env-supplied ApiKeys before persisting to prevent leaks. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature
Feature	Medium	Voice recognition pipeline with /v1/voice/* endpoints for speaker verification and identification. Voice recognition pipeline with /v1/voice/* endpoints for speaker verification and identification. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	Face recognition supports 1:1 verify, 1:N identify, detection, analysis, and antispoofing. Face recognition supports 1:1 verify, 1:N identify, detection, analysis, and antispoofing. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	/v1/audio/diarization endpoint segments speech by speaker turn. /v1/audio/diarization endpoint segments speech by speaker turn. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	faster-whisper generates word-level timestamps in transcriptions. faster-whisper generates word-level timestamps in transcriptions. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	Whisper transcription supports client cancellation via GGML abort callback. Whisper transcription supports client cancellation via GGML abort callback. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	Ollama API drop-in compatibility allows existing clients to connect to LocalAI. Ollama API drop-in compatibility allows existing clients to connect to LocalAI. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	stable-diffusion.ggml backend generates video with image-to-video and first-last-frame modes. stable-diffusion.ggml backend generates video with image-to-video and first-last-frame modes. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	React chat UI redesigned with Nord palette, cleaner layout, better message density. React chat UI redesigned with Nord palette, cleaner layout, better message density. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	React UI supports multilingual interface in 5 languages via i18n. React UI supports multilingual interface in 5 languages via i18n. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	Admin panel allows customizable instance name, tagline, logo, and favicon. Admin panel allows customizable instance name, tagline, logo, and favicon. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	Interactive model config editor with autocomplete and live validation in UI. Interactive model config editor with autocomplete and live validation in UI. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	Backend versioning with automatic upgrade detection and auto-upgrade mechanism. Backend versioning with automatic upgrade detection and auto-upgrade mechanism. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	Models can be pinned to survive garbage collection reaper. Models can be pinned to survive garbage collection reaper. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	Per-model exclusive concurrency groups prevent heavy backends from resource contention. Per-model exclusive concurrency groups prevent heavy backends from resource contention. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	Universal importer expands across most backends with multi-shard GGUF support. Universal importer expands across most backends with multi-shard GGUF support. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	11 new backends: sglang, ik-llama.cpp, TurboQuant, sam.cpp, Kokoros, qwen3tts.cpp, and others. 11 new backends: sglang, ik-llama.cpp, TurboQuant, sam.cpp, Kokoros, qwen3tts.cpp, and others. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	vLLM achieves feature parity with llama.cpp backend. vLLM achieves feature parity with llama.cpp backend. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	vLLM exposes full AsyncEngineArgs via generic YAML engine_args map. vLLM exposes full AsyncEngineArgs via generic YAML engine_args map. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	Tensor-parallel distributed workers enable single model across multiple nodes. Tensor-parallel distributed workers enable single model across multiple nodes. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	CUDA 13 builds available for vLLM, vLLM-omni, and sglang. CUDA 13 builds available for vLLM, vLLM-omni, and sglang. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	Distributed mode v2 adds orchestrator resilience and round-robin replica balancing. Distributed mode v2 adds orchestrator resilience and round-robin replica balancing. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	NATS backend upgrade split from install for cleaner distributed management. NATS backend upgrade split from install for cleaner distributed management. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	Transcription stream-done event includes segments, duration, and detected language. Transcription stream-done event includes segments, duration, and detected language. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	LocalVQE backend enables audio effects exploration in React UI. LocalVQE backend enables audio effects exploration in React UI. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	GPU support for AMD Strix Halo / Ryzen AI MAX (gfx1151). GPU support for AMD Strix Halo / Ryzen AI MAX (gfx1151). Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	NVIDIA L4T arm64 CUDA 13 support for Jetson-class boards. NVIDIA L4T arm64 CUDA 13 support for Jetson-class boards. Source: llm_adapter@2026-05-21 Confidence: high	—
Feature	Medium	ROCm bumped to 7.x with latest driver support. ROCm bumped to 7.x with latest driver support. Source: llm_adapter@2026-05-21 Confidence: low	—
Bugfix	Medium	PostgreSQL cascades user deletion across all owned data. PostgreSQL cascades user deletion across all owned data. Source: llm_adapter@2026-05-21 Confidence: high	—
Other	Medium	Version 4.2.0 released with numerous features, bug fixes, security improvements, and new contributors. Version 4.2.0 released with numerous features, bug fixes, security improvements, and new contributors. Source: granite4.1:30b@2026-05-24-audit Confidence: low	—

Full changelog

🎉 LocalAI 4.2.0 Release! 🚀

LocalAI 4.2.0 is out!

This release teaches LocalAI to see and hear. New /v1/voice/* and /v1/audio/diarization endpoints, a full face-recognition pipeline with antispoofing, word-level timestamps for faster-whisper, and a client-cancellable Whisper. There is also a drop-in Ollama API, video generation in stable-diffusion.ggml, a redesigned chat with i18n and admin-configurable branding, eleven new backends, an interactive model config editor with autocomplete, and a hardened distributed mode v2. vLLM finally hits feature parity with llama.cpp and gets tensor-parallel distributed workers.

📌 TL;DR

| Feature | Summary |
|---------|---------|
| 🎙️ Voice Recognition | New /v1/voice/*. Verify, identify, embed and analyze speakers. |
| 👤 Face Recognition + Liveness | 1:1 verify, 1:N identify, detect, analyze, embed, and reject spoofed photos. |
| 🎬 Diarization | New /v1/audio/diarization endpoint, "who spoke when?" via sherpa-onnx + vibevoice.cpp. |
| 🗣️ Better Transcriptions | Word-level timestamps, client-cancellable Whisper, segments + duration + language on the stream-done event. |
| 🦙 Ollama API | Drop-in compatibility. Point your ollama client straight at LocalAI. |
| 🎬 Video Generation | stable-diffusion.ggml now generates video (i2v, first-last-frame). |
| 💬 Redesigned UI | Chat redesign, Nord palette, i18n (5 languages), admin-configurable branding. |
| ✏️ Interactive Model Editor | Autocomplete-driven config editor in the UI. |
| 📦 Universal Importer | Imports across most backends, not just llama.cpp. |
| 🚦 Concurrency Groups | Per-model exclusive groups for safe backend loading. |
| 🧪 11 New Backends | sglang, ik-llama-cpp, TurboQuant, sam.cpp, Kokoros, qwen3tts.cpp, tinygrad-multimodal, LocalVQE, vibevoice-cpp, insightface (liveness), voice-rec. |
| ⚡ vLLM @ parity | Feature parity with llama.cpp + tensor-parallel distributed workers + full engine_args. |
| 🛰️ Distributed v2 | Hardened orchestrator, round-robin replicas, scoped Upgrade All, NATS install/upgrade split. |

🚀 New Features & Major Enhancements

🎙️ Voice Recognition

LocalAI is now ears-on. New /v1/voice/* endpoints let you verify, identify, analyze and embed speakers, powered by a SpeechBrain + ONNX Python backend.

1:1 Verify, "is this the same speaker?"
1:N Identify, "who is talking, out of my enrolled users?"
Embeddings, voice fingerprints for your own pipelines
Analyze, age, gender, emotion attributes per segment

🔥 Pairs naturally with the new diarization endpoint for full speaker pipelines.

https://github.com/user-attachments/assets/3777decd-d82b-42f5-a4e1-43f2da44e6c8

👤 Face Recognition & Antispoofing

A complete face-biometrics pipeline, built on InsightFace + ONNX.

1:1 Verify, match two faces
1:N Identify, resolve a face against an enrolled set
Detection & Analysis, find faces, extract attributes (age, gender, emotion, race)
Embeddings, facial fingerprints for your own stack
🆕 Antispoofing (liveness), reject spoofed photos and videos

✅ Samples never leave your machine. They go only to the running backend.

https://github.com/user-attachments/assets/37c1271e-b1e3-4b5d-a1b4-f8d870051da3

🎬 Diarization & a smarter audio pipeline

Audio is a first-class citizen now.

/v1/audio/diarization, segments speech by speaker turn (sherpa-onnx + vibevoice.cpp)
Word-level timestamps for faster-whisper
Client cancellation for Whisper via the ggml abort_callback. Stop a transcription mid-flight and free the GPU.
Stream-done metadata on /v1/audio/transcriptions. segments, duration and language on the final event.
Audio transformations UI (LocalVQE), explore audio FX directly from the React UI
Transcription error visibility, handler errors land in the access log and on the client

🦙 Ollama drop-in API

Point your existing Ollama client at LocalAI. Everything keeps working. Another front door, same engine.

OLLAMA_HOST=http://localhost:8080 ollama run qwen3

🎬 Video Generation

The stable-diffusion.ggml backend now generates video, with curated gallery entries for Wan 2.1 FLF2V 14B 720P and Wan i2v 720p, plus a new stablediffusion-ggml-development meta backend to track the cutting edge.

🎨 React UI: total refresh

A massive UI cycle landed in 4.2:

💬 Chat redesign, cleaner layout, faster perceived latency, better message density
🎨 Editorial refresh with the Nord palette, calmer, more focused, dark-mode-first
🌍 Multilingual / i18n, English, Italiano, Español, Deutsch, 简体中文
🪪 Brandable instance, admin-configurable name, tagline, and assets (logo, favicon)
✏️ Interactive model config editor, autocomplete over known fields, live validation, automatic file-renaming on save
🧰 Backend management UX, revamped backend list with concrete versions
🛟 Better error UX, distributed backend management errors surface cleanly

💡 Self-host with your branding. The login page, sidebar, footer, and browser tab all pick up the instance name and logo.

https://github.com/user-attachments/assets/91a7a8c8-15e8-4bd7-b97b-64fe0466bbd7

https://github.com/user-attachments/assets/369e0dc7-87ba-4303-8193-24eda03fdb1f

🔄 Backend & model lifecycle

Backend versioning with automatic upgrade detection
Pin models so they survive the reaper
On-demand toggle per model to control auto-load
Concurrency groups, per-model exclusive groups so heavy backends won't trample each other
Universal importer, single flow that imports across most backends, with clean multi-shard GGUF handling and dedicated importers for vibevoice-cpp and whisper.cpp HF repos

https://github.com/user-attachments/assets/3d3be7ea-2601-4284-9a89-358ae99a926e

https://github.com/user-attachments/assets/f13c5ca9-f174-48c0-9aee-e3406d50e607

🧪 New Backends!

| Backend | What it brings |
|---|---|
| sglang | High-throughput LLM serving + speculative decoding (EAGLE/EAGLE3/DFLASH/MTP) |
| ik-llama.cpp | ikawrakow's llama.cpp fork |
| TurboQuant | Quant-focused llama.cpp fork |
| sam.cpp | Segment Anything detection |
| Kokoros | Rust-native Kokoro TTS |
| qwen3tts.cpp | Qwen3 TTS |
| tinygrad-multimodal (experimental) | tinygrad-powered multimodal |
| vibevoice.cpp | Diarization-grade speech |
| LocalVQE | Audio transformations / FX |
| insightface | Face antispoofing |
| voice-rec | Speaker recognition / embeddings |

⚡ vLLM at parity (and beyond)

vLLM parity with llama.cpp, same feature surface, same ergonomics
vLLM engine_args, the full AsyncEngineArgs exposed via a generic YAML map
Tensor-parallel distributed workers, fan a single model across nodes
CUDA 13 builds for vLLM, vLLM-omni and sglang
L4T arm64 (CUDA 13), vLLM/vLLM-omni/sglang variants for Jetson-class arm64
MLX backend refactored, shared helpers and enhanced functionality
llama.cpp split_mode for explicit multi-GPU placement
Speculative decoding wired through for llama.cpp, Gemma 4 thinking support added
Vision / mtmd marker propagated from the backend via ModelMetadata

🛰️ Distributed Mode v2

Distributed mode keeps maturing. This release was a hardening pass across the orchestration loop:

Orchestrator resilience, auto-upgrade routing, worker bind-wait, RAG-init crash, log-spam fixes
Round-robin across replicas of the same model
Upgrade All scoped to nodes that actually have the backend installed
NATS install / upgrade split, backend.upgrade no longer piggybacks on install
Cached-replica lookup honors NodeSelector, the reconciler no longer scales up empty backends
VRAM/RAM reporting correct on NVIDIA unified-memory hosts
Agent nodes, queue loops stop on teardown, dead-letter cap added
Autoscaling, load-model extracted from Route() and applied during autoscale

🔐 Auth & Security

Settings API, env-supplied ApiKeys are stripped before persisting (no accidental leaks)
grpc-server hardening, removed unsafe sprintf() in the C++ grpc server
OIDC, bumped go-oidc/v3 to 3.18.0
Security hardening pass across the codebase
AI coding assistants policy, LocalAI now follows the Linux kernel's DCO/attribution guidelines (Assisted-by: trailer, no AI co-authors)

🖥️ Hardware & deployment

CUDA 13 for vLLM, vLLM-omni, and sglang
NVIDIA L4T arm64 (CUDA 13) for Jetson-class boards
ROCm 7.x bumped to latest
gfx1151 (Strix Halo / Ryzen AI MAX) support, AMDGPU_TARGETS exposed as a build-arg
Intel GPU, latest oneapi-basekit (b70 support) across Intel images
arm64 CI, cpu-whisperx and cpu-faster-whisper now ship arm64 images
whisperx, ROCm/HIPBLAS target dropped (pinned to rocm6.4 wheels)

🛠️ Under the Hood

Better CLI errors with actionable guidance
golangci-lint baseline (new-from-merge-base) keeps drift in check
Coding-agent discoverability, new APIs let coding agents introspect and configure LocalAI
Autoparser, prefers backend-emitted chat deltas, correct logprob passthrough, strips partial reasoning tags during warm-up
Reasoning + tools, no more empty content from thinking models in retry loops
Streaming hygiene, deduped content, deduped tool calls, recovered reasoning, unique tool_call IDs in deferred flushes
HTTP, handler-error status now visible in the access log + transcription error surface
Backend monitor accepts model as a query parameter
Config loader, YAML backup files are ignored
GGUF thinking probe respects explicit reasoning config
Inference defaults refreshed from Unsloth
Embeddings on collection upload, dim changes handled gracefully
Python backends, JIT subprocesses use tempfile.gettempdir() instead of hardcoded /tmp
Draft model paths, relative paths now resolve against the models dir
whisper-cpp: implement streaming transcription and context cancellation

🐞 Notable fixes

Cascading user deletion on PostgreSQL, deleting a user removes all owned data
Importer emits all shards for multi-part GGUF models
Open Responses parses OpenAI-spec nested tool_choice and uses the correct setter
llama-cpp: server-chat.cpp included in grpc-server TU, common -> llama-common rename, turboquant common.h detection
ik-llama-cpp: adapted to common_grammar in sampling.h, patched clip.cpp for the new ggml_quantize_chunk signature
Kokoros: trait stubs (face_verify, face_analyze, audio_transcription_stream), CI publish
stable-diffusion.ggml: MP4 container forced in ffmpeg mux, new i2v options
Gallery: orphaned meta-backend uninstall, gemma-4 URIs, flux-kontext param overrides, Wan dedup, z-image-turbo load, Qwen3.5 typo override, tag-casing normalization
Streaming: content + tool-call dedup, reasoning recovery, unique tool-call IDs in deferred flush
Realtime: consume ChatDeltas when the C++ autoparser clears Response
Tool-calls: use SetFunctionCallNameString when forcing a specific tool
Faster-whisper: cast segment timestamps to int after multiplication
mlx-vlm: pinned to v0.4.4 to unblock CUDA builds
vLLM: dropped flash-attn wheel to avoid torch 2.10 ABI mismatch
Downloader: list supported URL schemes in DownloadFile errors
Backend: resolve relative draft_model paths against the models dir
CI: wire AMDGPU_TARGETS through the backend workflow, switch gallery-agent to sigs.k8s.io/yaml, recover rerankers + vllm-omni on aarch64, unbreak master CI for docs/kokoros/vibevoice-cpp ABI

🆕 Gallery additions

Wan 2.1 FLF2V 14B 720P (video)
Wan i2v 720p (image-to-video)
stablediffusion-ggml-development meta backend
chroma1-hd (diffusers)
Gemma 4 (+ mmproj)
EmbeddingGemma
Qwen 3.5, Qwen-ASR, OCR entries for llama.cpp
Qwen3-VL Reranker, Qwen3-VL Embedding (tagged)
A steady stream of automated gallery-agent model additions throughout the cycle 🤖

🚀 The Complete Local Stack for Privacy-First AI

LocalAI

The free, Open Source OpenAI alternative. Drop-in REST API compatible with OpenAI specs for local AI inferencing. No GPU required.

Link: https://github.com/mudler/LocalAI

LocalAGI

Local AI agent management platform. Drop-in for OpenAI's Responses API, with advanced agentic capabilities and a no-code UI.

Link: https://github.com/mudler/LocalAGI

LocalRecall

RESTful API and knowledge-base management providing persistent memory and storage for AI agents. Pairs with LocalAI and LocalAGI.

Link: https://github.com/mudler/LocalRecall

❤️ Thank You

LocalAI is a true FOSS movement, built by contributors, powered by community.

If you believe in privacy-first, self-hosted AI:

⭐ Star the repo
💬 Contribute code, docs, translations or feedback
📣 Share with others

Your support keeps this stack alive.

✅ Full Changelog

📋 Click to expand full changelog

What's Changed

Bug fixes :bug:

fix(autoscaling): extract load model from Route() and use as well when doing autoscale by @mudler in https://github.com/mudler/LocalAI/pull/9270
fix(nodes): better detection if nodes goes down or model is not available by @mudler in https://github.com/mudler/LocalAI/pull/9274
fix: try to add whisperx and faster-whisper for more variants by @mudler in https://github.com/mudler/LocalAI/pull/9278
fix: thinking models with tools returning empty content (reasoning-only retry loop) by @mudler in https://github.com/mudler/LocalAI/pull/9290
fix(streaming): deduplicate tool call emissions during streaming by @mudler in https://github.com/mudler/LocalAI/pull/9292
fix(streaming): skip chat deltas for role-init elements to prevent first token duplication by @mudler in https://github.com/mudler/LocalAI/pull/9299
Fix load of z-image-turbo by @thelittlefireman in https://github.com/mudler/LocalAI/pull/9264
fix(agents): handle embedding model dim changes on collection upload by @mudler in https://github.com/mudler/LocalAI/pull/9365
fix(gallery): correct gemma-4 model URIs returning 404 by @mvanhorn in https://github.com/mudler/LocalAI/pull/9379
fix(ui): rename model config files on save to prevent duplicates by @mudler in https://github.com/mudler/LocalAI/pull/9388
fix(ci): switch gallery-agent to sigs.k8s.io/yaml by @mudler in https://github.com/mudler/LocalAI/pull/9397
fix(llama-cpp): rename linked target common -> llama-common by @mudler in https://github.com/mudler/LocalAI/pull/9408
fix(vision): propagate mtmd media marker from backend via ModelMetadata by @mudler in https://github.com/mudler/LocalAI/pull/9412
fix(turboquant): resolve common.h by detecting llama-common vs common target by @mudler in https://github.com/mudler/LocalAI/pull/9413
fix(rocm): add gfx1151 support and expose AMDGPU_TARGETS build-arg by @keithmattix in https://github.com/mudler/LocalAI/pull/9410
fix(kokoros): implement audio_transcription_stream trait stub by @mudler in https://github.com/mudler/LocalAI/pull/9422
fix(turboquant): drop ignore-eos patch, bump fork to b8967-627ebbc by @mudler in https://github.com/mudler/LocalAI/pull/9423
fix(distributed): stop queue loops on agent nodes + dead-letter cap by @mudler in https://github.com/mudler/LocalAI/pull/9433
fix(gallery): allow uninstalling orphaned meta backends + force reinstall by @mudler in https://github.com/mudler/LocalAI/pull/9434
fix(stable-diffusion.ggml): force mp4 container in ffmpeg mux by @mudler in https://github.com/mudler/LocalAI/pull/9435
fix(settings): strip env-supplied ApiKeys from the request before persisting by @SAY-5 in https://github.com/mudler/LocalAI/pull/9438
fix(api): remove duplicate /api/traces endpoint that broke React UI by @pjbrzozowski in https://github.com/mudler/LocalAI/pull/9427
fix(distributed): pass ExternalURI through NATS backend install by @russell in https://github.com/mudler/LocalAI/pull/9446
fix(ci): wire AMDGPU_TARGETS through backend build workflow by @russell in https://github.com/mudler/LocalAI/pull/9445
fix(config): ignore yaml backup files in model loader by @leinasi2014 in https://github.com/mudler/LocalAI/pull/9443
[gallery] Fix duplicate sha256 keys in Wan models by @sec171 in https://github.com/mudler/LocalAI/pull/9461
fix(tests): update InstallBackend call sites for new URI/Name/Alias params by @mudler in https://github.com/mudler/LocalAI/pull/9467
Fix: Add model parameter to neutts-air gallery definition by @localai-bot in https://github.com/mudler/LocalAI/pull/8793
fix(gallery-agent): process blacklist command on recently-closed PRs by @mudler in https://github.com/mudler/LocalAI/pull/9473
Respect explicit reasoning config during GGUF thinking probe by @leinasi2014 in https://github.com/mudler/LocalAI/pull/9463
fix(streaming): dedupe content, recover reasoning, unique tool_call IDs in deferred flush by @mudler in https://github.com/mudler/LocalAI/pull/9470
fix(backend-monitor): accept model as a query parameter by @Dennisadira in https://github.com/mudler/LocalAI/pull/9411
fix(kokoros): Build and publish the backend images from CI/CD by @richiejp in https://github.com/mudler/LocalAI/pull/9487
fix: remove unsafe sprintf() in grpc-server.cpp by @orbisai0security in https://github.com/mudler/LocalAI/pull/9486
fix(kokoros): implement face_verify and face_analyze trait stubs by @mudler in https://github.com/mudler/LocalAI/pull/9499
fix(ik-llama-cpp): adapt to common_grammar struct in sampling.h by @mudler in https://github.com/mudler/LocalAI/pull/9512
fix(llama-cpp): include server-chat.cpp in grpc-server translation unit by @mudler in https://github.com/mudler/LocalAI/pull/9511
fix(importer): emit all shards for multi-part GGUF models by @mudler in https://github.com/mudler/LocalAI/pull/9513
fix(openresponses): parse OpenAI-spec nested tool_choice + use correct setter by @walcz-de in https://github.com/mudler/LocalAI/pull/9509
fix: use SetFunctionCallNameString when forcing a specific tool (3 sites) by @Anai-Guo in https://github.com/mudler/LocalAI/pull/9526
fix(ik-llama-cpp): patch clip.cpp for new ggml_quantize_chunk signature by @mudler in https://github.com/mudler/LocalAI/pull/9531
fix(realtime): consume ChatDeltas when C++ autoparser clears Response by @richiejp in https://github.com/mudler/LocalAI/pull/9538
fix: add hipblaslt library by @eglia in https://github.com/mudler/LocalAI/pull/9541
fix(distributed): correct VRAM/RAM reporting on NVIDIA unified-memory hosts by @mudler in https://github.com/mudler/LocalAI/pull/9545
fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI mismatch by @richiejp in https://github.com/mudler/LocalAI/pull/9557
fix(mlx-vlm): pin upstream to v0.4.4 to unblock CUDA builds by @mudler in https://github.com/mudler/LocalAI/pull/9568
fix(gallery): normalize inconsistent tag casing/plurals across gallery models by @Anai-Guo in https://github.com/mudler/LocalAI/pull/9574
fix(gallery): correct Qwen3.5 typo in qwen3.5-27b-claude-4.6 model override (closes #9362) by @Anai-Guo in https://github.com/mudler/LocalAI/pull/9580
fix(diffusers): drop compel from requirements to unblock pip resolver by @mudler in https://github.com/mudler/LocalAI/pull/9632
fix(ci): fix AMDGPU_TARGETS empty-string bypass in hipblas builds by @russell in https://github.com/mudler/LocalAI/pull/9626
fix(distributed): honor NodeSelector in cached-replica lookup, stop empty-backend reconciler scaleups by @localai-bot in https://github.com/mudler/LocalAI/pull/9652
fix(distributed): orchestrator resilience — auto-upgrade routing, worker bind-wait, RAG-init crash, log spam by @localai-bot in https://github.com/mudler/LocalAI/pull/9657
fix(faster-whisper): cast segment timestamps to int after multiplication by @arteven in https://github.com/mudler/LocalAI/pull/9674
fix(python-backend): make JIT subprocesses work on hosts of any size by @richiejp in https://github.com/mudler/LocalAI/pull/9679
fix(distributed): scope Upgrade All to nodes that have the backend installed by @mudler in https://github.com/mudler/LocalAI/pull/9678
fix(backend): resolve relative draft_model paths against the models dir by @localai-bot in https://github.com/mudler/LocalAI/pull/9680
fix: unbreak master CI (docs, kokoros, vibevoice-cpp ABI) by @localai-bot in https://github.com/mudler/LocalAI/pull/9682
fix(ci): unbreak rerankers (torch bump) and vllm-omni on aarch64 by @localai-bot in https://github.com/mudler/LocalAI/pull/9688
fix(distributed): round-robin replicas of the same model by @localai-bot in https://github.com/mudler/LocalAI/pull/9695
fix(downloader): list supported URL schemes in DownloadFile error by @Anai-Guo in https://github.com/mudler/LocalAI/pull/9689
fix(auth): cascade user deletion across all owned data on PostgreSQL by @localai-bot in https://github.com/mudler/LocalAI/pull/9702
fix(http): make handler-error status visible in access log + transcription errors by @localai-bot in https://github.com/mudler/LocalAI/pull/9707
fix(distributed): make backend upgrade actually re-install on workers by @localai-bot in https://github.com/mudler/LocalAI/pull/9708
fix(distributed): split NATS backend.upgrade off install + dedup loads by @localai-bot in https://github.com/mudler/LocalAI/pull/9717
fix(gallery): keep auto-upgrade off non-dev backends when -development is installed by @mudler in https://github.com/mudler/LocalAI/pull/9736

Exciting New Features 🎉

feat(ui): Interactive model config editor with autocomplete by @richiejp in https://github.com/mudler/LocalAI/pull/9149
feat: track files being staged by @mudler in https://github.com/mudler/LocalAI/pull/9275
feat: Add Kokoros backend by @richiejp in https://github.com/mudler/LocalAI/pull/9212
feat(api): add ollama compatibility by @mudler in https://github.com/mudler/LocalAI/pull/9284
feat(sam.cpp): add sam.cpp detection backend by @mudler in https://github.com/mudler/LocalAI/pull/9288
feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9300
chore(qwen3-asr): pass prompt as context to transcribe by @mudler in https://github.com/mudler/LocalAI/pull/9301
feat: Add toggle mechanism to enable/disable models from loading on demand by @neurocis in https://github.com/mudler/LocalAI/pull/9304
feat: allow to pin models and skip from reaping by @mudler in https://github.com/mudler/LocalAI/pull/9309
feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9310
feat: backend versioning, upgrade detection and auto-upgrade by @mudler in https://github.com/mudler/LocalAI/pull/9315
feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9318
feat(qwen3tts.cpp): add new backend by @mudler in https://github.com/mudler/LocalAI/pull/9316
feat(ux): backend management enhancement by @mudler in https://github.com/mudler/LocalAI/pull/9325
feat(rocm): bump to 7.x by @mudler in https://github.com/mudler/LocalAI/pull/9323
feat(backends): add ik-llama-cpp by @mudler in https://github.com/mudler/LocalAI/pull/9326
feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9329
feat(vllm): parity with llama.cpp backend by @mudler in https://github.com/mudler/LocalAI/pull/9328
feat: refactor shared helpers and enhance MLX backend functionality by @mudler in https://github.com/mudler/LocalAI/pull/9335
feat: wire transcription for llama.cpp, add streaming support by @mudler in https://github.com/mudler/LocalAI/pull/9353
feat(backend): add turboquant llama.cpp-fork backend by @mudler in https://github.com/mudler/LocalAI/pull/9355
feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9356
feat(backend): add tinygrad multimodal backend (experimental) by @mudler in https://github.com/mudler/LocalAI/pull/9364
feat(backends): add sglang by @mudler in https://github.com/mudler/LocalAI/pull/9359
refactor(tinygrad): reuse tinygrad.apps.llm instead of vendored Transformer by @mudler in https://github.com/mudler/LocalAI/pull/9380
feat(stable-diffusion.ggml): add support for video generation by @mudler in https://github.com/mudler/LocalAI/pull/9420
feat(distributed): sync state with frontends, better backend management reporting by @mudler in https://github.com/mudler/LocalAI/pull/9426
feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9431
feat(gallery): add Wan 2.1 FLF2V 14B 720P by @mudler in https://github.com/mudler/LocalAI/pull/9440
feat(gallery): add wan i2v 720p by @mudler in https://github.com/mudler/LocalAI/pull/9457
feat: improve CLI error messages with actionable guidance by @localai-bot in https://github.com/mudler/LocalAI/pull/8880
chore(whisperx): drop ROCm/hipblas build target by @mudler in https://github.com/mudler/LocalAI/pull/9474
feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis by @mudler in https://github.com/mudler/LocalAI/pull/9480
feat(importer): expand importer flow to almost all backends by @mudler in https://github.com/mudler/LocalAI/pull/9466
feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9498
feat: voice recognition by @mudler in https://github.com/mudler/LocalAI/pull/9500
feat(insightface): add antispoofing (liveness) detection by @mudler in https://github.com/mudler/LocalAI/pull/9515
feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9518
feat: add biometrics UI by @mudler in https://github.com/mudler/LocalAI/pull/9524
feat: Add Sherpa ONNX backend for ASR and TTS by @richiejp in https://github.com/mudler/LocalAI/pull/8523
[intel GPU support] Use latest oneapi-basekit image for Intel images to support b70 by @arbrick in https://github.com/mudler/LocalAI/pull/9543
feat(react-ui): editorial refresh with Nord palette and polished primitives by @mudler in https://github.com/mudler/LocalAI/pull/9550
feat: surface distributed backend management errors by @mudler in https://github.com/mudler/LocalAI/pull/9552
feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang by @mudler in https://github.com/mudler/LocalAI/pull/9553
feat(llama-cpp): expose split_mode option for multi-GPU placement by @mudler in https://github.com/mudler/LocalAI/pull/9560
ci(backends): build cpu-whisperx and cpu-faster-whisper for linux/arm64 by @mudler in https://github.com/mudler/LocalAI/pull/9573
[intel GPU support] Use latest oneapi-basekit image for Intel images to support b70 (in more places this time) by @arbrick in https://github.com/mudler/LocalAI/pull/9578
feat: Log backend exit code by @richiejp in https://github.com/mudler/LocalAI/pull/9581
feat(distributed): support multiple replicas of one model on the same node by @mudler in https://github.com/mudler/LocalAI/pull/9583
feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9587
feat: localai assistant chat modality by @mudler in https://github.com/mudler/LocalAI/pull/9602
chore: add golangci-lint with new-from-merge-base baseline by @richiejp in https://github.com/mudler/LocalAI/pull/9603
feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9607
feat(vllm): expose AsyncEngineArgs via generic engine_args YAML map by @richiejp in https://github.com/mudler/LocalAI/pull/9563
feat(vibevoice-cpp): add purego TTS+ASR backend by @mudler in https://github.com/mudler/LocalAI/pull/9610
feat: react chat redesign by @mudler in https://github.com/mudler/LocalAI/pull/9616
feat(llama-cpp): bump to d775992 and adapt to spec params refactor by @mudler in https://github.com/mudler/LocalAI/pull/9618
feat(backends/python): use tempfile.gettempdir() instead of hardcoded /tmp by @Anai-Guo in https://github.com/mudler/LocalAI/pull/9629
feat(importers): whisper.cpp HF repos pick a quant + nest under whisper/models by @mudler in https://github.com/mudler/LocalAI/pull/9630
feat(branding): admin-configurable instance name, tagline, and assets by @mudler in https://github.com/mudler/LocalAI/pull/9635
feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9643
feat(react-ui): add multilingual (i18n) support by @mudler in https://github.com/mudler/LocalAI/pull/9642
feat(ci): allow routing apt traffic through an alternate Ubuntu mirror by @mudler in https://github.com/mudler/LocalAI/pull/9650
feat: add LocalVQE backend and audio transformations UI by @richiejp in https://github.com/mudler/LocalAI/pull/9640
feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9660
feat(concurrency-groups): per-model exclusive groups for backend loading by @mudler in https://github.com/mudler/LocalAI/pull/9662
feat(api): add /v1/audio/diarization endpoint with sherpa-onnx + vibevoice.cpp by @mudler in https://github.com/mudler/LocalAI/pull/9654
feat(vllm, distributed): tensor parallel distributed workers by @richiejp in https://github.com/mudler/LocalAI/pull/9612
feat: support word-level timestamps for faster-whisper by @eglia in https://github.com/mudler/LocalAI/pull/9621
feat(importers): add vibevoice-cpp importer for GGUF bundles by @localai-bot in https://github.com/mudler/LocalAI/pull/9685
feat(gallery): Speed up load times and clean gallery entries by @richiejp in https://github.com/mudler/LocalAI/pull/9211
feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9699
feat(sglang): wire engine_args, add cuda13 build, ship MTP gallery demos by @richiejp in https://github.com/mudler/LocalAI/pull/9686
feat(api/transcription): include segments + duration + language on stream done event by @localai-bot in https://github.com/mudler/LocalAI/pull/9709
feat(whisper): honor client cancellation via ggml abort_callback by @localai-bot in https://github.com/mudler/LocalAI/pull/9710
chore: Security hardening by @richiejp in https://github.com/mudler/LocalAI/pull/9719
ci: phase 1-3 of GHA free tier migration (path filter, multi-arch split prep, /mnt disk relief) by @localai-bot in https://github.com/mudler/LocalAI/pull/9726
feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9723
ci: pilot per-arch split + manifest merge for faster-whisper and llama-cpp-quantization by @localai-bot in https://github.com/mudler/LocalAI/pull/9727
ci: finish GHA free-tier migration (per-arch fan-out, image splits, retire self-hosted, fix provenance) by @localai-bot in https://github.com/mudler/LocalAI/pull/9730
ci: consolidate llama-cpp-darwin into the matrix-driven Darwin flow by @mudler in https://github.com/mudler/LocalAI/pull/9731
feat(whisper-cpp): implement streaming transcription by @localai-bot in https://github.com/mudler/LocalAI/pull/9751

🧠 Models

chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9399
chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9400
chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9425
chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9436
chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9464
chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9481
chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9491
chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9505
chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9555
chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9558
chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9611
chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9615
Add tags to qwen3-vl-reranker and Qwen3-VL-Embedding to the gallery by @ER-EPR in https://github.com/mudler/LocalAI/pull/9628
chore(model gallery): add chroma1-hd diffusers model by @Anai-Guo in https://github.com/mudler/LocalAI/pull/9646
chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9653
chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9681
chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9703
chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9720

📖 Documentation and examples

docs: :arrow_up: update docs version mudler/LocalAI by @localai-bot in https://github.com/mudler/LocalAI/pull/9268
docs(agents): capture vllm backend lessons + runtime lib packaging by @mudler in https://github.com/mudler/LocalAI/pull/9333
chore(agents): Update the backend creation instructions to include Rust and extra tests by @richiejp in https://github.com/mudler/LocalAI/pull/9490

👒 Dependencies

chore: :arrow_up: Update ggml-org/llama.cpp to 66c4f9ded01b29d9120255be1ed8d5835bcbb51d by @localai-bot in https://github.com/mudler/LocalAI/pull/9269
chore(llama.cpp): bump to 'd12cc3d1ca6bba741cd77887ac9c9ee18c8415c7' by @mudler in https://github.com/mudler/LocalAI/pull/9282
chore: :arrow_up: Update leejet/stable-diffusion.cpp to e8323cabb0e4511ba18a50b1cb34cf1f87fc71ef by @localai-bot in https://github.com/mudler/LocalAI/pull/9281
chore: :arrow_up: Update ggml-org/llama.cpp to d132f22fc92f36848f7ccf2fc9987cd0b0120825 by @localai-bot in https://github.com/mudler/LocalAI/pull/9302
chore: :arrow_up: Update PABannier/sam3.cpp to 01832ef85fcc8eb6488f1d01cd247f07e96ff5a9 by @localai-bot in https://github.com/mudler/LocalAI/pull/9311
chore: :arrow_up: Update ggml-org/llama.cpp to e62fa13c2497b2cd1958cb496e9489e86bbd5182 by @localai-bot in https://github.com/mudler/LocalAI/pull/9312
chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/9321
chore: :arrow_up: Update leejet/stable-diffusion.cpp to 6b675a5ede9b0edf0a0f44191e8b79d7ef27615a by @localai-bot in https://github.com/mudler/LocalAI/pull/9320
chore: :arrow_up: Update ggml-org/llama.cpp to ff5ef8278615a2462b79b50abdf3cc95cfb31c6f by @localai-bot in https://github.com/mudler/LocalAI/pull/9319
chore: :arrow_up: Update ggml-org/llama.cpp to 1e9d771e2c2f1113a5ebdd0dc15bafe57dce64be by @localai-bot in https://github.com/mudler/LocalAI/pull/9330
chore(deps): bump softprops/action-gh-release from 2 to 3 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9336
chore(deps): bump actions/upload-pages-artifact from 4 to 5 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9337
chore(deps): bump github.com/testcontainers/testcontainers-go from 0.41.0 to 0.42.0 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9338
chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/9346
chore(deps): bump sentence-transformers from 5.2.3 to 5.4.0 in /backend/python/transformers by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9342
chore: :arrow_up: Update ggml-org/llama.cpp to e97492369888f5311e4d1f3beb325a36bbed70e9 by @localai-bot in https://github.com/mudler/LocalAI/pull/9347
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 55d3c05bf7b377deaa5dc84d255d9740a345a206 by @localai-bot in https://github.com/mudler/LocalAI/pull/9348
chore(deps): bump github.com/google/go-containerregistry from 0.21.3 to 0.21.5 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9343
chore(deps): bump github.com/testcontainers/testcontainers-go/modules/nats from 0.41.0 to 0.42.0 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9341
chore(deps): bump github.com/swaggo/echo-swagger from 1.4.1 to 1.5.2 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9344
chore(deps): bump github.com/charmbracelet/glamour from 0.10.0 to 1.0.0 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9340
chore: :arrow_up: Update ggml-org/llama.cpp to fae3a28070fe4026f87bd6a544aba1b2d1896566 by @localai-bot in https://github.com/mudler/LocalAI/pull/9357
chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/9358
chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/9369
chore: :arrow_up: Update ggml-org/llama.cpp to b3d758750a268bf93f084ccfa3060fb9a203192a by @localai-bot in https://github.com/mudler/LocalAI/pull/9370
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 1163af96cf6bb4a4b819f998f84c153a49768b99 by @localai-bot in https://github.com/mudler/LocalAI/pull/9368
chore(deps): bump the npm_and_yarn group across 1 directory with 6 updates by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9373
chore: :arrow_up: Update leejet/stable-diffusion.cpp to c41c5ded7af85e01b7fe442ff7950c720706d53a by @localai-bot in https://github.com/mudler/LocalAI/pull/9366
chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/9384
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to eaf83865a132f66e8f49efe0e78491625942f068 by @localai-bot in https://github.com/mudler/LocalAI/pull/9382
chore: :arrow_up: Update leejet/stable-diffusion.cpp to a564fdf642780d1df123f1c413b19961375b8346 by @localai-bot in https://github.com/mudler/LocalAI/pull/9383
chore: :arrow_up: Update TheTom/llama-cpp-turboquant to `45f8a066ed5f5bb38c695cec532f6cef9f4efa9d' by @mudler in https://github.com/mudler/LocalAI/pull/9385
chore: :arrow_up: Update ggml-org/llama.cpp to 4fbdabdc61c04d1262b581e1b8c0c3b119f688ff by @localai-bot in https://github.com/mudler/LocalAI/pull/9381
chore: bump inference defaults from unsloth by @github-actions[bot] in https://github.com/mudler/LocalAI/pull/9396
chore(deps): bump dompurify from 3.3.2 to 3.4.0 in /core/http/react-ui in the npm_and_yarn group across 1 directory by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9376
chore: :arrow_up: Update ggml-org/whisper.cpp to 166c20b473d5f4d04052e699f992f625ea2a2fdd by @localai-bot in https://github.com/mudler/LocalAI/pull/9403
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 52efa12fdae390d1dca6ecd7ca00010fe51f651e by @localai-bot in https://github.com/mudler/LocalAI/pull/9404
chore: :arrow_up: Update ggml-org/llama.cpp to 4f02d4733934179386cbc15b3454be26237940bb by @localai-bot in https://github.com/mudler/LocalAI/pull/9415
chore: :arrow_up: Update leejet/stable-diffusion.cpp to 7d33d4b2ddeafa672761a5880ec33bdff452504d by @localai-bot in https://github.com/mudler/LocalAI/pull/9417
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 8befd92ea5f702494ea9813fe42a52fb015db5fe by @localai-bot in https://github.com/mudler/LocalAI/pull/9418
chore: :arrow_up: Update leejet/stable-diffusion.cpp to 44cca3d626d301e2215d5e243277e8f0e65bfa78 by @localai-bot in https://github.com/mudler/LocalAI/pull/9428
chore: :arrow_up: Update ggml-org/llama.cpp to 4eac5b45095a4e8a1ff1cce4f6d030e0872fb4ad by @localai-bot in https://github.com/mudler/LocalAI/pull/9429
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 00ba208a5c036eee72d4a631b4f57c126095cb03 by @localai-bot in https://github.com/mudler/LocalAI/pull/9430
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to d4824131580b94ffa7b0e91c955e2b237c2fe16e by @localai-bot in https://github.com/mudler/LocalAI/pull/9447
chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/9451
chore: :arrow_up: Update ggml-org/whisper.cpp to fc674574ca27cac59a15e5b22a09b9d9ad62aafe by @localai-bot in https://github.com/mudler/LocalAI/pull/9450
chore: :arrow_up: Update ggml-org/llama.cpp to cf8b0dbda9ac0eac30ee33f87bc6702ead1c4664 by @localai-bot in https://github.com/mudler/LocalAI/pull/9448
chore(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.97.1 to 1.99.1 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9452
chore(deps): bump github.com/containerd/containerd from 1.7.30 to 1.7.31 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9453
chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.4.1 to 1.5.0 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9454
chore(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.32.14 to 1.32.16 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9456
chore(deps): bump github.com/coreos/go-oidc/v3 from 3.17.0 to 3.18.0 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9455
chore: :arrow_up: Update ggml-org/llama.cpp to 5a4cd6741fc33227cdacb329f355ab21f8481de2 by @localai-bot in https://github.com/mudler/LocalAI/pull/9479
chore: :arrow_up: Update leejet/stable-diffusion.cpp to c97702e1057c2fe13a7074cd9069cb9dd6edc1bf by @localai-bot in https://github.com/mudler/LocalAI/pull/9495
chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/9522
chore: :arrow_up: Update ggml-org/llama.cpp to 187a45637054881ecacf17f8e2f6f8f2ba7df1c7 by @localai-bot in https://github.com/mudler/LocalAI/pull/9520
chore: :arrow_up: Update leejet/stable-diffusion.cpp to b8bdffc19962be7e5a84bfefeb2e31bd885b571a by @localai-bot in https://github.com/mudler/LocalAI/pull/9521
chore(deps): bump postcss from 8.5.8 to 8.5.10 in /core/http/react-ui in the npm_and_yarn group across 1 directory by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9544
chore(deps): bump rustls-webpki from 0.103.10 to 0.103.13 in /backend/rust/kokoros in the cargo group across 1 directory by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9546
chore: :arrow_up: Update ggml-org/llama.cpp to 361fe72acb7b9bd79059cc177cbeda99b35b5db9 by @localai-bot in https://github.com/mudler/LocalAI/pull/9548
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to cb58a561f0c49f68b6d125cdfda037ed80433821 by @localai-bot in https://github.com/mudler/LocalAI/pull/9549
chore: :arrow_up: Update TheTom/llama-cpp-turboquant to 67559e580b10e4e47e9a6fd6218873997976886d by @localai-bot in https://github.com/mudler/LocalAI/pull/9497
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 3a945af45d45936341a45bbf7deda56776a4af26 by @localai-bot in https://github.com/mudler/LocalAI/pull/9570
chore: :arrow_up: Update TheTom/llama-cpp-turboquant to 11a241d0db78a68e0a5b99fe6f36de6683100f6a by @localai-bot in https://github.com/mudler/LocalAI/pull/9571
chore: :arrow_up: Update ggml-org/llama.cpp to dcad77cc3b0865153f486327064fb0320a57a476 by @localai-bot in https://github.com/mudler/LocalAI/pull/9572
chore: :arrow_up: Update ggml-org/llama.cpp to f53577432541bb9edc1588c4ef45c66bf07e4468 by @localai-bot in https://github.com/mudler/LocalAI/pull/9577
chore: :arrow_up: Update ggml-org/llama.cpp to 665abc609740d397d30c0d8ef4157dbf900bd1a3 by @localai-bot in https://github.com/mudler/LocalAI/pull/9584
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to d6f3e4e28fbf75e6181e6ea32e734de9ce9304fd by @localai-bot in https://github.com/mudler/LocalAI/pull/9585
chore: :arrow_up: Update leejet/stable-diffusion.cpp to a81677f59c92d90343aebca51dfed7decf0a0cb0 by @localai-bot in https://github.com/mudler/LocalAI/pull/9586
chore(deps): bump github.com/testcontainers/testcontainers-go/modules/postgres from 0.41.0 to 0.42.0 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9591
chore(deps): bump github.com/onsi/ginkgo/v2 from 2.28.1 to 2.28.2 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9593
chore(deps): bump packaging from 24.1 to 26.2 in /backend/python/coqui by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9594
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 453a027c17e4d63a7f16b871197a396240a65138 by @localai-bot in https://github.com/mudler/LocalAI/pull/9608
chore: :arrow_up: Update leejet/stable-diffusion.cpp to 3d6064b37ef4607917f8acf2ca8c8906d5087413 by @localai-bot in https://github.com/mudler/LocalAI/pull/9617
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to a8aecbf15933295af96504f9a693998322185b5c by @localai-bot in https://github.com/mudler/LocalAI/pull/9625
chore: :arrow_up: Update ggml-org/llama.cpp to beb42fffa45eded44804a1fd4916146222371581 by @localai-bot in https://github.com/mudler/LocalAI/pull/9624
deps: update quic-go to v0.59.0 (fix session ticket panic) by @egyptianbman in https://github.com/mudler/LocalAI/pull/9655
chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/9661
chore: :arrow_up: Update vllm-project/vllm cu130 wheel to 0.20.1 by @localai-bot in https://github.com/mudler/LocalAI/pull/9649
chore(deps): bump docs/themes/hugo-theme-relearn from f69a085 to 8bb66fa by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9665
chore: :arrow_up: Update ggml-org/llama.cpp to eff06702b2a52e1020ea009ebd86cb9f5acabab5 by @localai-bot in https://github.com/mudler/LocalAI/pull/9637
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 45dfd80371785731bc2ed05a76252497a4e7a282 by @localai-bot in https://github.com/mudler/LocalAI/pull/9644
chore(deps): bump actions/checkout from 4 to 6 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9663
chore: :arrow_up: Update ggml-org/llama.cpp to bbeb89d76c41bc250f16e4a6fefcc9b530d6e3f3 by @localai-bot in https://github.com/mudler/LocalAI/pull/9676
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 8b56d813a9ed04fa7b7fe2588fddd845cf64eccb by @localai-bot in https://github.com/mudler/LocalAI/pull/9677
chore: :arrow_up: Update TheTom/llama-cpp-turboquant to 69d8e4be47243e83b3d0d71e932bc7aa61c644dc by @localai-bot in https://github.com/mudler/LocalAI/pull/9638
chore: :arrow_up: Update ggml-org/whisper.cpp to 4bf733672b2871d4153158af4f621a6dd9104f4a by @localai-bot in https://github.com/mudler/LocalAI/pull/9636
chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/9700
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to b93721902b4662f9b973b1c412006081c958d085 by @localai-bot in https://github.com/mudler/LocalAI/pull/9697
chore: :arrow_up: Update ggml-org/llama.cpp to 2496f9c14965c39589f53eea31bdb6d762b1d360 by @localai-bot in https://github.com/mudler/LocalAI/pull/9698
chore: :arrow_up: Update leejet/stable-diffusion.cpp to 90e87bc846f17059771efb8aaa31e9ef0cab6f78 by @localai-bot in https://github.com/mudler/LocalAI/pull/9701
chore(deps): bump openssl from 0.10.76 to 0.10.79 in /backend/rust/kokoros in the cargo group across 1 directory by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9694
chore(deps): bump the go_modules group across 1 directory with 8 updates by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9705
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 9a26522af234f8db079ae3735f35ab6c20fe2c66 by @localai-bot in https://github.com/mudler/LocalAI/pull/9713
chore: :arrow_up: Update ggml-org/llama.cpp to 05ff59cb57860cc992fc6dcede32c696efea711c by @localai-bot in https://github.com/mudler/LocalAI/pull/9714
chore: :arrow_up: Update ggml-org/whisper.cpp to c81b2dabbc45484dee2ca6658cfe39c841df5c70 by @localai-bot in https://github.com/mudler/LocalAI/pull/9712
chore(deps): bump LocalAGI for collection rehydrate-on-init-failure fix by @localai-bot in https://github.com/mudler/LocalAI/pull/9721
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 98950267c67fd95937a54ebd6e3c66cf2679b710 by @localai-bot in https://github.com/mudler/LocalAI/pull/9725
chore: :arrow_up: Update ggml-org/llama.cpp to 9f5f0e689c9e977e5f23a27e344aa36082f44738 by @localai-bot in https://github.com/mudler/LocalAI/pull/9724
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to ab0f22b819ac57b7e7484f69c00c10fc755d5c6c by @localai-bot in https://github.com/mudler/LocalAI/pull/9734
chore: :arrow_up: Update ggml-org/llama.cpp to 00d56b11c3477b99bc18562dc1d1834f0d961778 by @localai-bot in https://github.com/mudler/LocalAI/pull/9733

Other Changes

ci: add pre-built base-grpc-builder image infrastructure (PR 1/2) by @localai-bot in https://github.com/mudler/LocalAI/pull/9737
ci: refactor llama-cpp variant Dockerfiles to consume prebuilt base-grpc images (PR 2/2) by @localai-bot in https://github.com/mudler/LocalAI/pull/9738
chore: :arrow_up: Update ggml-org/llama.cpp to 1e5ad35d560b90a8ac447d149c8f8447ae1fcaa0 by @localai-bot in https://github.com/mudler/LocalAI/pull/9739
docs(agents): update CI caching docs after the GHA-free-tier migration by @localai-bot in https://github.com/mudler/LocalAI/pull/9742
ci: split backend-jobs into single-arch and multi-arch matrices by @localai-bot in https://github.com/mudler/LocalAI/pull/9746
chore: :arrow_up: Update ggml-org/llama.cpp to 2b2babd1243c67ca811c0a5852cedf92b1a20024 by @localai-bot in https://github.com/mudler/LocalAI/pull/9747
chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 23127139cb6fa314899c3b5f4935b88b3374c56c by @localai-bot in https://github.com/mudler/LocalAI/pull/9748
chore: :arrow_up: Update ggml-org/whisper.cpp to c33c5618b72bb345df029b730b36bc0e369845a3 by @localai-bot in https://github.com/mudler/LocalAI/pull/9749
chore: :arrow_up: Update vllm-project/vllm cu130 wheel to 0.20.2 by @localai-bot in https://github.com/mudler/LocalAI/pull/9750
chore: :arrow_up: Update ggml-org/llama.cpp to 389ff61d77b5c71cec0cf92fe4e5d01ace80b797 by @localai-bot in https://github.com/mudler/LocalAI/pull/9752

New Contributors

@neurocis made their first contribution in https://github.com/mudler/LocalAI/pull/9304
@thelittlefireman made their first contribution in https://github.com/mudler/LocalAI/pull/9264
@mvanhorn made their first contribution in https://github.com/mudler/LocalAI/pull/9379
@keithmattix made their first contribution in https://github.com/mudler/LocalAI/pull/9410
@SAY-5 made their first contribution in https://github.com/mudler/LocalAI/pull/9438
@pjbrzozowski made their first contribution in https://github.com/mudler/LocalAI/pull/9427
@russell made their first contribution in https://github.com/mudler/LocalAI/pull/9446
@leinasi2014 made their first contribution in https://github.com/mudler/LocalAI/pull/9443
@sec171 made their first contribution in https://github.com/mudler/LocalAI/pull/9461
@Dennisadira made their first contribution in https://github.com/mudler/LocalAI/pull/9411
@orbisai0security made their first contribution in https://github.com/mudler/LocalAI/pull/9486
@Anai-Guo made their first contribution in https://github.com/mudler/LocalAI/pull/9526
@arbrick made their first contribution in https://github.com/mudler/LocalAI/pull/9543
@eglia made their first contribution in https://github.com/mudler/LocalAI/pull/9541
@egyptianbman made their first contribution in https://github.com/mudler/LocalAI/pull/9655
@arteven made their first contribution in https://github.com/mudler/LocalAI/pull/9674

Full Changelog: https://github.com/mudler/LocalAI/compare/v4.1.3...v4.2.0

Security Fixes

grpc-server hardening – removed unsafe sprintf() in C++ grpc server
OIDC library bump (go‑oidc/v3) from 3.17.0 → 3.18.0
Settings API now strips env‑supplied ApiKeys before persisting

View diff on GitHub

Weekly OSS security release digest.

The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.

No spam, unsubscribe anytime.

Share this release

Share on X Share on Bluesky

Track LocalAI

Get notified when new releases ship.

About LocalAI

LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

LocalAI

ReleasePort's take

Summary

Changes in this release

🎉 LocalAI 4.2.0 Release! 🚀

📌 TL;DR

🚀 New Features & Major Enhancements

🎙️ Voice Recognition

👤 Face Recognition & Antispoofing

🎬 Diarization & a smarter audio pipeline

🦙 Ollama drop-in API

🎬 Video Generation

🎨 React UI: total refresh

🔄 Backend & model lifecycle

🧪 New Backends!

⚡ vLLM at parity (and beyond)

🛰️ Distributed Mode v2

🔐 Auth & Security

🖥️ Hardware & deployment

🛠️ Under the Hood

🐞 Notable fixes

🆕 Gallery additions

🚀 The Complete Local Stack for Privacy-First AI

LocalAI

LocalAGI

LocalRecall

❤️ Thank You

✅ Full Changelog

What's Changed

Bug fixes :bug:

Exciting New Features 🎉

🧠 Models

📖 Documentation and examples

👒 Dependencies

Other Changes

New Contributors

Security Fixes

Related context

Related tools

Featured in