LocalAI

v4.2.0 Security

This release includes 3 security fixes for security teams reviewing exposed deployments.

Published 23d Model Serving & MLOps
✓ No known CVEs patched
This release patches 3 known CVEs

Topics

agents ai api audio-generation decentralized distributed image-generation libp2p llama llm mamba mcp musicgen object-detection rerank stable-diffusion text-generation tts

ReleasePort's take

Moderate signal
editorial:auto 13d

LocalAI v4.2.0 introduces voice & face biometrics, audio diarization, Ollama‑compatible API, video generation from stable-diffusion.ggml, and a redesigned multilingual UI with brandable settings.

Why it matters: Upgrade promptly to pick up the removal of an unsafe sprintf() call in grpc-server.cpp, which could cause a buffer overflow; the fix matters for every deployment that exposes the gRPC server surface.

Summary

AI summary

LocalAI adds voice & face biometrics, audio diarization, Ollama drop‑in API, video generation and a redesigned multilingual UI with brandable settings.

Changes in this release

Security Medium

Removed an unsafe sprintf() call in grpc-server.cpp to prevent a potential buffer overflow.

Source: llm_adapter@2026-05-21

Confidence: high

Security Medium

Settings API strips env-supplied ApiKeys before persisting to prevent leaks.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

Voice recognition pipeline with /v1/voice/* endpoints for speaker verification and identification.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

Face recognition supports 1:1 verify, 1:N identify, detection, analysis, and antispoofing.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

/v1/audio/diarization endpoint segments speech by speaker turn.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

faster-whisper generates word-level timestamps in transcriptions.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

Whisper transcription supports client cancellation via GGML abort callback.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

Ollama API drop-in compatibility allows existing clients to connect to LocalAI.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

stable-diffusion.ggml backend generates video with image-to-video and first-last-frame modes.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

React chat UI redesigned with Nord palette, cleaner layout, better message density.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

React UI supports multilingual interface in 5 languages via i18n.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

Admin panel allows customizable instance name, tagline, logo, and favicon.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

Interactive model config editor with autocomplete and live validation in UI.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

Backend versioning with automatic upgrade detection and auto-upgrade mechanism.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

Models can be pinned so that the reaper does not unload them.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

Per-model exclusive concurrency groups keep heavy backends from contending for the same resources.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

Universal importer expands across most backends with multi-shard GGUF support.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

11 new backends: sglang, ik-llama.cpp, TurboQuant, sam.cpp, Kokoros, qwen3tts.cpp, and others.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

vLLM achieves feature parity with llama.cpp backend.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

vLLM exposes full AsyncEngineArgs via generic YAML engine_args map.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

Tensor-parallel distributed workers let a single model span multiple nodes.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

CUDA 13 builds available for vLLM, vLLM-omni, and sglang.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

Distributed mode v2 adds orchestrator resilience and round-robin replica balancing.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

NATS backend upgrade split from install for cleaner distributed management.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

Transcription stream-done event includes segments, duration, and detected language.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

LocalVQE backend enables audio effects exploration in React UI.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

GPU support for AMD Strix Halo / Ryzen AI MAX (gfx1151).

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

NVIDIA L4T arm64 CUDA 13 support for Jetson-class boards.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

ROCm bumped to 7.x.

Source: llm_adapter@2026-05-21

Confidence: low

Bugfix Medium

PostgreSQL cascades user deletion across all owned data.

Source: llm_adapter@2026-05-21

Confidence: high

Other Medium

Version 4.2.0 released with numerous features, bug fixes, security improvements, and new contributors.

Source: granite4.1:30b@2026-05-24-audit

Confidence: low

Full changelog

🎉 LocalAI 4.2.0 Release! 🚀




LocalAI 4.2.0 is out!

This release teaches LocalAI to see and hear. New /v1/voice/* and /v1/audio/diarization endpoints, a full face-recognition pipeline with antispoofing, word-level timestamps for faster-whisper, and a client-cancellable Whisper. There is also a drop-in Ollama API, video generation in stable-diffusion.ggml, a redesigned chat with i18n and admin-configurable branding, eleven new backends, an interactive model config editor with autocomplete, and a hardened distributed mode v2. vLLM finally hits feature parity with llama.cpp and gets tensor-parallel distributed workers.


📌 TL;DR

| Feature | Summary |
|---------|---------|
| 🎙️ Voice Recognition | New /v1/voice/*. Verify, identify, embed and analyze speakers. |
| 👤 Face Recognition + Liveness | 1:1 verify, 1:N identify, detect, analyze, embed, and reject spoofed photos. |
| 🎬 Diarization | New /v1/audio/diarization endpoint, "who spoke when?" via sherpa-onnx + vibevoice.cpp. |
| 🗣️ Better Transcriptions | Word-level timestamps, client-cancellable Whisper, segments + duration + language on the stream-done event. |
| 🦙 Ollama API | Drop-in compatibility. Point your ollama client straight at LocalAI. |
| 🎬 Video Generation | stable-diffusion.ggml now generates video (i2v, first-last-frame). |
| 💬 Redesigned UI | Chat redesign, Nord palette, i18n (5 languages), admin-configurable branding. |
| ✏️ Interactive Model Editor | Autocomplete-driven config editor in the UI. |
| 📦 Universal Importer | Imports across most backends, not just llama.cpp. |
| 🚦 Concurrency Groups | Per-model exclusive groups for safe backend loading. |
| 🧪 11 New Backends | sglang, ik-llama-cpp, TurboQuant, sam.cpp, Kokoros, qwen3tts.cpp, tinygrad-multimodal, LocalVQE, vibevoice-cpp, insightface (liveness), voice-rec. |
| ⚡ vLLM @ parity | Feature parity with llama.cpp + tensor-parallel distributed workers + full engine_args. |
| 🛰️ Distributed v2 | Hardened orchestrator, round-robin replicas, scoped Upgrade All, NATS install/upgrade split. |


🚀 New Features & Major Enhancements

🎙️ Voice Recognition

LocalAI is now ears-on. New /v1/voice/* endpoints let you verify, identify, analyze and embed speakers, powered by a SpeechBrain + ONNX Python backend.

  • 1:1 Verify, "is this the same speaker?"
  • 1:N Identify, "who is talking, out of my enrolled users?"
  • Embeddings, voice fingerprints for your own pipelines
  • Analyze, age, gender, emotion attributes per segment

🔥 Pairs naturally with the new diarization endpoint for full speaker pipelines.
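
As a rough illustration of how 1:1 verification and 1:N identification are commonly built on top of speaker embeddings, the sketch below compares vectors by cosine similarity. The function names, the 0.7 threshold, and the toy three-dimensional vectors are assumptions made for illustration; they are not LocalAI's API and not necessarily the scoring the backend uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(probe, reference, threshold=0.7):
    """1:1 check: do two embeddings look like the same speaker?"""
    return cosine_similarity(probe, reference) >= threshold


def identify(probe, enrolled, threshold=0.7):
    """1:N check: best-matching enrolled name, or None if nobody clears the threshold."""
    best_name, best_score = None, threshold
    for name, embedding in enrolled.items():
        score = cosine_similarity(probe, embedding)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical enrolled users with toy embeddings.
enrolled = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
```

In practice the threshold would be tuned on real recordings, since it trades false accepts against false rejects.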

https://github.com/user-attachments/assets/3777decd-d82b-42f5-a4e1-43f2da44e6c8


👤 Face Recognition & Antispoofing

A complete face-biometrics pipeline, built on InsightFace + ONNX.

  • 1:1 Verify, match two faces
  • 1:N Identify, resolve a face against an enrolled set
  • Detection & Analysis, find faces, extract attributes (age, gender, emotion, race)
  • Embeddings, facial fingerprints for your own stack
  • 🆕 Antispoofing (liveness), reject spoofed photos and videos

✅ Samples never leave your machine. They go only to the running backend.

https://github.com/user-attachments/assets/37c1271e-b1e3-4b5d-a1b4-f8d870051da3


🎬 Diarization & a smarter audio pipeline

Audio is a first-class citizen now.

  • /v1/audio/diarization, segments speech by speaker turn (sherpa-onnx + vibevoice.cpp)
  • Word-level timestamps for faster-whisper
  • Client cancellation for Whisper via the ggml abort_callback. Stop a transcription mid-flight and free the GPU.
  • Stream-done metadata on /v1/audio/transcriptions, with segments, duration and language on the final event
  • Audio transformations UI (LocalVQE), explore audio FX directly from the React UI
  • Transcription error visibility, handler errors land in the access log and on the client
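
These notes do not show the response schema of /v1/audio/diarization. Assuming it yields a list of speaker turns with a speaker label plus start and end times in seconds (the field names `speaker`, `start`, and `end` are hypothetical), a client could summarize "who spoke when" along these lines:

```python
from collections import defaultdict


def talk_time_by_speaker(segments):
    """Sum the duration of every turn per speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


# Hypothetical diarization output, not captured from a real server.
segments = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 2.5},
    {"speaker": "SPEAKER_01", "start": 2.5, "end": 4.0},
    {"speaker": "SPEAKER_00", "start": 4.0, "end": 5.0},
]
```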

🦙 Ollama drop-in API

Point your existing Ollama client at LocalAI. Everything keeps working. Another front door, same engine.

OLLAMA_HOST=http://localhost:8080 ollama run qwen3
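
The same idea from code, as a minimal sketch: it assumes LocalAI listens on localhost:8080 (as in the command above) and that its Ollama-compatible surface accepts Ollama's `/api/generate` request shape. Which Ollama routes are actually covered is not listed in these notes, so treat the path as an assumption. The helper names are illustrative.

```python
import json
import urllib.request


def build_generate_request(base_url, model, prompt):
    """Build an Ollama-style non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, model, prompt):
    """Send the request and return the generated text (needs a running server)."""
    req = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```

Calling `generate("http://localhost:8080", "qwen3", "Hello")` would only work against a running instance that has the model installed.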

🎬 Video Generation

The stable-diffusion.ggml backend now generates video, with curated gallery entries for Wan 2.1 FLF2V 14B 720P and Wan i2v 720p, plus a new stablediffusion-ggml-development meta backend to track the cutting edge.


🎨 React UI: total refresh

A massive UI cycle landed in 4.2:

  • 💬 Chat redesign, cleaner layout, faster perceived latency, better message density
  • 🎨 Editorial refresh with the Nord palette, calmer, more focused, dark-mode-first
  • 🌍 Multilingual / i18n, English, Italiano, Español, Deutsch, 简体中文
  • 🪪 Brandable instance, admin-configurable name, tagline, and assets (logo, favicon)
  • ✏️ Interactive model config editor, autocomplete over known fields, live validation, automatic file-renaming on save
  • 🧰 Backend management UX, revamped backend list with concrete versions
  • 🛟 Better error UX, distributed backend management errors surface cleanly

💡 Self-host with your branding. The login page, sidebar, footer, and browser tab all pick up the instance name and logo.

https://github.com/user-attachments/assets/91a7a8c8-15e8-4bd7-b97b-64fe0466bbd7

https://github.com/user-attachments/assets/369e0dc7-87ba-4303-8193-24eda03fdb1f


🔄 Backend & model lifecycle

  • Backend versioning with automatic upgrade detection
  • Pin models so they survive the reaper
  • On-demand toggle per model to control auto-load
  • Concurrency groups, per-model exclusive groups so heavy backends won't trample each other
  • Universal importer, single flow that imports across most backends, with clean multi-shard GGUF handling and dedicated importers for vibevoice-cpp and whisper.cpp HF repos
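
The notes do not spell out the exact semantics of concurrency groups. Assuming the intent is that two models in the same exclusive group are never active at the same time, the idea can be sketched with one lock per group. The class, method, and group names below are hypothetical and unrelated to LocalAI's configuration keys.

```python
import threading
from collections import defaultdict


class ExclusiveGroups:
    """One non-reentrant lock per group name; holding it means 'a model is active'."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects creation of per-group locks

    def try_load(self, group):
        """Return True if the group was free and is now claimed."""
        with self._guard:
            lock = self._locks[group]
        return lock.acquire(blocking=False)

    def unload(self, group):
        """Release the group so another model in it can load."""
        self._locks[group].release()
```

A real scheduler would more likely queue or evict instead of refusing, but the exclusion rule is the same.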

https://github.com/user-attachments/assets/3d3be7ea-2601-4284-9a89-358ae99a926e

https://github.com/user-attachments/assets/f13c5ca9-f174-48c0-9aee-e3406d50e607


🧪 New Backends!

| Backend | What it brings |
|---|---|
| sglang | High-throughput LLM serving + speculative decoding (EAGLE/EAGLE3/DFLASH/MTP) |
| ik-llama.cpp | ikawrakow's llama.cpp fork |
| TurboQuant | Quant-focused llama.cpp fork |
| sam.cpp | Segment Anything detection |
| Kokoros | Rust-native Kokoro TTS |
| qwen3tts.cpp | Qwen3 TTS |
| tinygrad-multimodal (experimental) | tinygrad-powered multimodal |
| vibevoice.cpp | Diarization-grade speech |
| LocalVQE | Audio transformations / FX |
| insightface | Face antispoofing |
| voice-rec | Speaker recognition / embeddings |


⚡ vLLM at parity (and beyond)

  • vLLM parity with llama.cpp, same feature surface, same ergonomics
  • vLLM engine_args, the full AsyncEngineArgs exposed via a generic YAML map
  • Tensor-parallel distributed workers, fan a single model across nodes
  • CUDA 13 builds for vLLM, vLLM-omni and sglang
  • L4T arm64 (CUDA 13), vLLM/vLLM-omni/sglang variants for Jetson-class arm64
  • MLX backend refactored, shared helpers and enhanced functionality
  • llama.cpp split_mode for explicit multi-GPU placement
  • Speculative decoding wired through for llama.cpp, Gemma 4 thinking support added
  • Vision / mtmd marker propagated from the backend via ModelMetadata
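
To make the "generic map" idea concrete, here is a small sketch of how a free-form `engine_args` mapping could be validated and forwarded to an engine-arguments object. `tensor_parallel_size`, `max_model_len`, and `gpu_memory_utilization` are vLLM engine argument names; the `StandInEngineArgs` dataclass and `build_engine_args` helper are hypothetical stand-ins, not vLLM's class or LocalAI's loader.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class StandInEngineArgs:
    """Hypothetical stand-in for an engine-args class with a few real vLLM field names."""
    model: str
    tensor_parallel_size: int = 1
    max_model_len: Optional[int] = None
    gpu_memory_utilization: float = 0.9


def build_engine_args(model, engine_args):
    """Forward a generic key/value map, rejecting keys the target does not know."""
    known = {f.name for f in fields(StandInEngineArgs)}
    unknown = sorted(set(engine_args) - known)
    if unknown:
        raise ValueError(f"unknown engine_args: {unknown}")
    return StandInEngineArgs(model=model, **engine_args)
```

The appeal of a pass-through map is that new engine options become usable without a matching change on the LocalAI side.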

🛰️ Distributed Mode v2

Distributed mode keeps maturing. This release was a hardening pass across the orchestration loop:

  • Orchestrator resilience, with fixes for auto-upgrade routing, worker bind-wait, a RAG-init crash, and log spam
  • Round-robin across replicas of the same model
  • Upgrade All scoped to nodes that actually have the backend installed
  • NATS install / upgrade split, backend.upgrade no longer piggybacks on install
  • Cached-replica lookup honors NodeSelector, the reconciler no longer scales up empty backends
  • VRAM/RAM reporting correct on NVIDIA unified-memory hosts
  • Agent nodes, queue loops stop on teardown, dead-letter cap added
  • Autoscaling, load-model extracted from Route() and applied during autoscale
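
Round-robin balancing across replicas of the same model can be pictured as a per-model cursor that advances on every request. This is a generic sketch of the technique, not LocalAI's orchestrator code; the class name and node names are made up.

```python
import itertools


class RoundRobinBalancer:
    """Keeps one endlessly repeating iterator of replicas per model."""

    def __init__(self, replicas_by_model):
        self._cycles = {
            model: itertools.cycle(replicas)
            for model, replicas in replicas_by_model.items()
        }

    def pick(self, model):
        """Return the next replica for this model, wrapping around at the end."""
        return next(self._cycles[model])


balancer = RoundRobinBalancer({"qwen3": ["node-a", "node-b"]})
```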

🔐 Auth & Security

  • Settings API, env-supplied ApiKeys are stripped before persisting (no accidental leaks)
  • grpc-server hardening, removed unsafe sprintf() in the C++ grpc server
  • OIDC, bumped go-oidc/v3 to 3.18.0
  • Security hardening pass across the codebase
  • AI coding assistants policy, LocalAI now follows the Linux kernel's DCO/attribution guidelines (Assisted-by: trailer, no AI co-authors)
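
The Settings API fix can be pictured as a filter applied before saving: keys that came from the environment are dropped from the payload so they never land in the persisted settings file. The sketch below shows that idea only. LocalAI itself is written in Go, and the function and variable names here are hypothetical.

```python
def strip_env_api_keys(requested_keys, env_keys):
    """Return the keys that are safe to persist, preserving their order."""
    env = set(env_keys)
    return [key for key in requested_keys if key not in env]


# Hypothetical values: one key entered in the UI, one supplied via the environment.
to_persist = strip_env_api_keys(["ui-key", "env-secret"], ["env-secret"])
```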

🖥️ Hardware & deployment

  • CUDA 13 for vLLM, vLLM-omni, and sglang
  • NVIDIA L4T arm64 (CUDA 13) for Jetson-class boards
  • ROCm bumped to 7.x
  • gfx1151 (Strix Halo / Ryzen AI MAX) support, AMDGPU_TARGETS exposed as a build-arg
  • Intel GPU, latest oneapi-basekit (b70 support) across Intel images
  • arm64 CI, cpu-whisperx and cpu-faster-whisper now ship arm64 images
  • whisperx, ROCm/HIPBLAS target dropped (pinned to rocm6.4 wheels)

🛠️ Under the Hood

  • Better CLI errors with actionable guidance
  • golangci-lint baseline (new-from-merge-base) keeps drift in check
  • Coding-agent discoverability, new APIs let coding agents introspect and configure LocalAI
  • Autoparser, prefers backend-emitted chat deltas, correct logprob passthrough, strips partial reasoning tags during warm-up
  • Reasoning + tools, no more empty content from thinking models in retry loops
  • Streaming hygiene, deduped content, deduped tool calls, recovered reasoning, unique tool_call IDs in deferred flushes
  • HTTP, handler-error status now visible in the access log + transcription error surface
  • Backend monitor accepts model as a query parameter
  • Config loader, YAML backup files are ignored
  • GGUF thinking probe respects explicit reasoning config
  • Inference defaults refreshed from Unsloth
  • Embeddings on collection upload, dim changes handled gracefully
  • Python backends, JIT subprocesses use tempfile.gettempdir() instead of hardcoded /tmp
  • Draft model paths, relative paths now resolve against the models dir
  • whisper-cpp: implement streaming transcription and context cancellation
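
On the Python backends item: `tempfile.gettempdir()` consults the `TMPDIR`, `TEMP`, and `TMP` environment variables before falling back to platform defaults, so a path built from it follows the host's configuration instead of assuming `/tmp` exists and is writable. A minimal sketch, with a hypothetical helper name and directory name:

```python
import os
import tempfile


def jit_cache_dir(name):
    """Build a scratch path under the platform temp dir instead of a fixed /tmp."""
    return os.path.join(tempfile.gettempdir(), name)


path = jit_cache_dir("jit-cache-example")
```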

🐞 Notable fixes

  • Cascading user deletion on PostgreSQL, deleting a user removes all owned data
  • Importer emits all shards for multi-part GGUF models
  • Open Responses parses OpenAI-spec nested tool_choice and uses the correct setter
  • llama-cpp: server-chat.cpp included in grpc-server TU, common -> llama-common rename, turboquant common.h detection
  • ik-llama-cpp: adapted to common_grammar in sampling.h, patched clip.cpp for the new ggml_quantize_chunk signature
  • Kokoros: trait stubs (face_verify, face_analyze, audio_transcription_stream), CI publish
  • stable-diffusion.ggml: MP4 container forced in ffmpeg mux, new i2v options
  • Gallery: orphaned meta-backend uninstall, gemma-4 URIs, flux-kontext param overrides, Wan dedup, z-image-turbo load, Qwen3.5 typo override, tag-casing normalization
  • Streaming: content + tool-call dedup, reasoning recovery, unique tool-call IDs in deferred flush
  • Realtime: consume ChatDeltas when the C++ autoparser clears Response
  • Tool-calls: use SetFunctionCallNameString when forcing a specific tool
  • Faster-whisper: cast segment timestamps to int after multiplication
  • mlx-vlm: pinned to v0.4.4 to unblock CUDA builds
  • vLLM: dropped flash-attn wheel to avoid torch 2.10 ABI mismatch
  • Downloader: list supported URL schemes in DownloadFile errors
  • Backend: resolve relative draft_model paths against the models dir
  • CI: wire AMDGPU_TARGETS through the backend workflow, switch gallery-agent to sigs.k8s.io/yaml, recover rerankers + vllm-omni on aarch64, unbreak master CI for docs/kokoros/vibevoice-cpp ABI
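
The faster-whisper fix concerns where the integer cast happens relative to the unit conversion. The exact bug and target unit are not described here, so the sketch below only shows why the order matters, assuming a seconds-to-milliseconds conversion for illustration.

```python
def to_millis_truncating_first(seconds):
    """Casts before scaling, so the fractional part of the timestamp is lost."""
    return int(seconds) * 1000


def to_millis(seconds):
    """Scales first, then casts, keeping sub-second precision."""
    return int(seconds * 1000)
```

Note that `int()` truncates toward zero; `round()` may be preferable when values can land just below an integer because of floating-point error.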

🆕 Gallery additions

  • Wan 2.1 FLF2V 14B 720P (video)
  • Wan i2v 720p (image-to-video)
  • stablediffusion-ggml-development meta backend
  • chroma1-hd (diffusers)
  • Gemma 4 (+ mmproj)
  • EmbeddingGemma
  • Qwen 3.5, Qwen-ASR, OCR entries for llama.cpp
  • Qwen3-VL Reranker, Qwen3-VL Embedding (tagged)
  • A steady stream of automated gallery-agent model additions throughout the cycle 🤖

🚀 The Complete Local Stack for Privacy-First AI

LocalAI

The free, Open Source OpenAI alternative. Drop-in REST API compatible with OpenAI specs for local AI inferencing. No GPU required.

Link: https://github.com/mudler/LocalAI

LocalAGI

Local AI agent management platform. Drop-in for OpenAI's Responses API, with advanced agentic capabilities and a no-code UI.

Link: https://github.com/mudler/LocalAGI

LocalRecall

RESTful API and knowledge-base management providing persistent memory and storage for AI agents. Pairs with LocalAI and LocalAGI.

Link: https://github.com/mudler/LocalRecall


❤️ Thank You

LocalAI is a true FOSS movement, built by contributors, powered by community.

If you believe in privacy-first, self-hosted AI:

  • Star the repo
  • 💬 Contribute code, docs, translations or feedback
  • 📣 Share with others

Your support keeps this stack alive.


✅ Full Changelog


What's Changed

Bug fixes :bug:

  • fix(autoscaling): extract load model from Route() and use as well when doing autoscale by @mudler in https://github.com/mudler/LocalAI/pull/9270
  • fix(nodes): better detection if nodes goes down or model is not available by @mudler in https://github.com/mudler/LocalAI/pull/9274
  • fix: try to add whisperx and faster-whisper for more variants by @mudler in https://github.com/mudler/LocalAI/pull/9278
  • fix: thinking models with tools returning empty content (reasoning-only retry loop) by @mudler in https://github.com/mudler/LocalAI/pull/9290
  • fix(streaming): deduplicate tool call emissions during streaming by @mudler in https://github.com/mudler/LocalAI/pull/9292
  • fix(streaming): skip chat deltas for role-init elements to prevent first token duplication by @mudler in https://github.com/mudler/LocalAI/pull/9299
  • Fix load of z-image-turbo by @thelittlefireman in https://github.com/mudler/LocalAI/pull/9264
  • fix(agents): handle embedding model dim changes on collection upload by @mudler in https://github.com/mudler/LocalAI/pull/9365
  • fix(gallery): correct gemma-4 model URIs returning 404 by @mvanhorn in https://github.com/mudler/LocalAI/pull/9379
  • fix(ui): rename model config files on save to prevent duplicates by @mudler in https://github.com/mudler/LocalAI/pull/9388
  • fix(ci): switch gallery-agent to sigs.k8s.io/yaml by @mudler in https://github.com/mudler/LocalAI/pull/9397
  • fix(llama-cpp): rename linked target common -> llama-common by @mudler in https://github.com/mudler/LocalAI/pull/9408
  • fix(vision): propagate mtmd media marker from backend via ModelMetadata by @mudler in https://github.com/mudler/LocalAI/pull/9412
  • fix(turboquant): resolve common.h by detecting llama-common vs common target by @mudler in https://github.com/mudler/LocalAI/pull/9413
  • fix(rocm): add gfx1151 support and expose AMDGPU_TARGETS build-arg by @keithmattix in https://github.com/mudler/LocalAI/pull/9410
  • fix(kokoros): implement audio_transcription_stream trait stub by @mudler in https://github.com/mudler/LocalAI/pull/9422
  • fix(turboquant): drop ignore-eos patch, bump fork to b8967-627ebbc by @mudler in https://github.com/mudler/LocalAI/pull/9423
  • fix(distributed): stop queue loops on agent nodes + dead-letter cap by @mudler in https://github.com/mudler/LocalAI/pull/9433
  • fix(gallery): allow uninstalling orphaned meta backends + force reinstall by @mudler in https://github.com/mudler/LocalAI/pull/9434
  • fix(stable-diffusion.ggml): force mp4 container in ffmpeg mux by @mudler in https://github.com/mudler/LocalAI/pull/9435
  • fix(settings): strip env-supplied ApiKeys from the request before persisting by @SAY-5 in https://github.com/mudler/LocalAI/pull/9438
  • fix(api): remove duplicate /api/traces endpoint that broke React UI by @pjbrzozowski in https://github.com/mudler/LocalAI/pull/9427
  • fix(distributed): pass ExternalURI through NATS backend install by @russell in https://github.com/mudler/LocalAI/pull/9446
  • fix(ci): wire AMDGPU_TARGETS through backend build workflow by @russell in https://github.com/mudler/LocalAI/pull/9445
  • fix(config): ignore yaml backup files in model loader by @leinasi2014 in https://github.com/mudler/LocalAI/pull/9443
  • [gallery] Fix duplicate sha256 keys in Wan models by @sec171 in https://github.com/mudler/LocalAI/pull/9461
  • fix(tests): update InstallBackend call sites for new URI/Name/Alias params by @mudler in https://github.com/mudler/LocalAI/pull/9467
  • Fix: Add model parameter to neutts-air gallery definition by @localai-bot in https://github.com/mudler/LocalAI/pull/8793
  • fix(gallery-agent): process blacklist command on recently-closed PRs by @mudler in https://github.com/mudler/LocalAI/pull/9473
  • Respect explicit reasoning config during GGUF thinking probe by @leinasi2014 in https://github.com/mudler/LocalAI/pull/9463
  • fix(streaming): dedupe content, recover reasoning, unique tool_call IDs in deferred flush by @mudler in https://github.com/mudler/LocalAI/pull/9470
  • fix(backend-monitor): accept model as a query parameter by @Dennisadira in https://github.com/mudler/LocalAI/pull/9411
  • fix(kokoros): Build and publish the backend images from CI/CD by @richiejp in https://github.com/mudler/LocalAI/pull/9487
  • fix: remove unsafe sprintf() in grpc-server.cpp by @orbisai0security in https://github.com/mudler/LocalAI/pull/9486
  • fix(kokoros): implement face_verify and face_analyze trait stubs by @mudler in https://github.com/mudler/LocalAI/pull/9499
  • fix(ik-llama-cpp): adapt to common_grammar struct in sampling.h by @mudler in https://github.com/mudler/LocalAI/pull/9512
  • fix(llama-cpp): include server-chat.cpp in grpc-server translation unit by @mudler in https://github.com/mudler/LocalAI/pull/9511
  • fix(importer): emit all shards for multi-part GGUF models by @mudler in https://github.com/mudler/LocalAI/pull/9513
  • fix(openresponses): parse OpenAI-spec nested tool_choice + use correct setter by @walcz-de in https://github.com/mudler/LocalAI/pull/9509
  • fix: use SetFunctionCallNameString when forcing a specific tool (3 sites) by @Anai-Guo in https://github.com/mudler/LocalAI/pull/9526
  • fix(ik-llama-cpp): patch clip.cpp for new ggml_quantize_chunk signature by @mudler in https://github.com/mudler/LocalAI/pull/9531
  • fix(realtime): consume ChatDeltas when C++ autoparser clears Response by @richiejp in https://github.com/mudler/LocalAI/pull/9538
  • fix: add hipblaslt library by @eglia in https://github.com/mudler/LocalAI/pull/9541
  • fix(distributed): correct VRAM/RAM reporting on NVIDIA unified-memory hosts by @mudler in https://github.com/mudler/LocalAI/pull/9545
  • fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI mismatch by @richiejp in https://github.com/mudler/LocalAI/pull/9557
  • fix(mlx-vlm): pin upstream to v0.4.4 to unblock CUDA builds by @mudler in https://github.com/mudler/LocalAI/pull/9568
  • fix(gallery): normalize inconsistent tag casing/plurals across gallery models by @Anai-Guo in https://github.com/mudler/LocalAI/pull/9574
  • fix(gallery): correct Qwen3.5 typo in qwen3.5-27b-claude-4.6 model override (closes #9362) by @Anai-Guo in https://github.com/mudler/LocalAI/pull/9580
  • fix(diffusers): drop compel from requirements to unblock pip resolver by @mudler in https://github.com/mudler/LocalAI/pull/9632
  • fix(ci): fix AMDGPU_TARGETS empty-string bypass in hipblas builds by @russell in https://github.com/mudler/LocalAI/pull/9626
  • fix(distributed): honor NodeSelector in cached-replica lookup, stop empty-backend reconciler scaleups by @localai-bot in https://github.com/mudler/LocalAI/pull/9652
  • fix(distributed): orchestrator resilience — auto-upgrade routing, worker bind-wait, RAG-init crash, log spam by @localai-bot in https://github.com/mudler/LocalAI/pull/9657
  • fix(faster-whisper): cast segment timestamps to int after multiplication by @arteven in https://github.com/mudler/LocalAI/pull/9674
  • fix(python-backend): make JIT subprocesses work on hosts of any size by @richiejp in https://github.com/mudler/LocalAI/pull/9679
  • fix(distributed): scope Upgrade All to nodes that have the backend installed by @mudler in https://github.com/mudler/LocalAI/pull/9678
  • fix(backend): resolve relative draft_model paths against the models dir by @localai-bot in https://github.com/mudler/LocalAI/pull/9680
  • fix: unbreak master CI (docs, kokoros, vibevoice-cpp ABI) by @localai-bot in https://github.com/mudler/LocalAI/pull/9682
  • fix(ci): unbreak rerankers (torch bump) and vllm-omni on aarch64 by @localai-bot in https://github.com/mudler/LocalAI/pull/9688
  • fix(distributed): round-robin replicas of the same model by @localai-bot in https://github.com/mudler/LocalAI/pull/9695
  • fix(downloader): list supported URL schemes in DownloadFile error by @Anai-Guo in https://github.com/mudler/LocalAI/pull/9689
  • fix(auth): cascade user deletion across all owned data on PostgreSQL by @localai-bot in https://github.com/mudler/LocalAI/pull/9702
  • fix(http): make handler-error status visible in access log + transcription errors by @localai-bot in https://github.com/mudler/LocalAI/pull/9707
  • fix(distributed): make backend upgrade actually re-install on workers by @localai-bot in https://github.com/mudler/LocalAI/pull/9708
  • fix(distributed): split NATS backend.upgrade off install + dedup loads by @localai-bot in https://github.com/mudler/LocalAI/pull/9717
  • fix(gallery): keep auto-upgrade off non-dev backends when -development is installed by @mudler in https://github.com/mudler/LocalAI/pull/9736

Exciting New Features 🎉

  • feat(ui): Interactive model config editor with autocomplete by @richiejp in https://github.com/mudler/LocalAI/pull/9149
  • feat: track files being staged by @mudler in https://github.com/mudler/LocalAI/pull/9275
  • feat: Add Kokoros backend by @richiejp in https://github.com/mudler/LocalAI/pull/9212
  • feat(api): add ollama compatibility by @mudler in https://github.com/mudler/LocalAI/pull/9284
  • feat(sam.cpp): add sam.cpp detection backend by @mudler in https://github.com/mudler/LocalAI/pull/9288
  • feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9300
  • chore(qwen3-asr): pass prompt as context to transcribe by @mudler in https://github.com/mudler/LocalAI/pull/9301
  • feat: Add toggle mechanism to enable/disable models from loading on demand by @neurocis in https://github.com/mudler/LocalAI/pull/9304
  • feat: allow to pin models and skip from reaping by @mudler in https://github.com/mudler/LocalAI/pull/9309
  • feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9310
  • feat: backend versioning, upgrade detection and auto-upgrade by @mudler in https://github.com/mudler/LocalAI/pull/9315
  • feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9318
  • feat(qwen3tts.cpp): add new backend by @mudler in https://github.com/mudler/LocalAI/pull/9316
  • feat(ux): backend management enhancement by @mudler in https://github.com/mudler/LocalAI/pull/9325
  • feat(rocm): bump to 7.x by @mudler in https://github.com/mudler/LocalAI/pull/9323
  • feat(backends): add ik-llama-cpp by @mudler in https://github.com/mudler/LocalAI/pull/9326
  • feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9329
  • feat(vllm): parity with llama.cpp backend by @mudler in https://github.com/mudler/LocalAI/pull/9328
  • feat: refactor shared helpers and enhance MLX backend functionality by @mudler in https://github.com/mudler/LocalAI/pull/9335
  • feat: wire transcription for llama.cpp, add streaming support by @mudler in https://github.com/mudler/LocalAI/pull/9353
  • feat(backend): add turboquant llama.cpp-fork backend by @mudler in https://github.com/mudler/LocalAI/pull/9355
  • feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9356
  • feat(backend): add tinygrad multimodal backend (experimental) by @mudler in https://github.com/mudler/LocalAI/pull/9364
  • feat(backends): add sglang by @mudler in https://github.com/mudler/LocalAI/pull/9359
  • refactor(tinygrad): reuse tinygrad.apps.llm instead of vendored Transformer by @mudler in https://github.com/mudler/LocalAI/pull/9380
  • feat(stable-diffusion.ggml): add support for video generation by @mudler in https://github.com/mudler/LocalAI/pull/9420
  • feat(distributed): sync state with frontends, better backend management reporting by @mudler in https://github.com/mudler/LocalAI/pull/9426
  • feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9431
  • feat(gallery): add Wan 2.1 FLF2V 14B 720P by @mudler in https://github.com/mudler/LocalAI/pull/9440
  • feat(gallery): add wan i2v 720p by @mudler in https://github.com/mudler/LocalAI/pull/9457
  • feat: improve CLI error messages with actionable guidance by @localai-bot in https://github.com/mudler/LocalAI/pull/8880
  • chore(whisperx): drop ROCm/hipblas build target by @mudler in https://github.com/mudler/LocalAI/pull/9474
  • feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis by @mudler in https://github.com/mudler/LocalAI/pull/9480
  • feat(importer): expand importer flow to almost all backends by @mudler in https://github.com/mudler/LocalAI/pull/9466
  • feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9498
  • feat: voice recognition by @mudler in https://github.com/mudler/LocalAI/pull/9500
  • feat(insightface): add antispoofing (liveness) detection by @mudler in https://github.com/mudler/LocalAI/pull/9515
  • feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9518
  • feat: add biometrics UI by @mudler in https://github.com/mudler/LocalAI/pull/9524
  • feat: Add Sherpa ONNX backend for ASR and TTS by @richiejp in https://github.com/mudler/LocalAI/pull/8523
  • [intel GPU support] Use latest oneapi-basekit image for Intel images to support b70 by @arbrick in https://github.com/mudler/LocalAI/pull/9543
  • feat(react-ui): editorial refresh with Nord palette and polished primitives by @mudler in https://github.com/mudler/LocalAI/pull/9550
  • feat: surface distributed backend management errors by @mudler in https://github.com/mudler/LocalAI/pull/9552
  • feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang by @mudler in https://github.com/mudler/LocalAI/pull/9553
  • feat(llama-cpp): expose split_mode option for multi-GPU placement by @mudler in https://github.com/mudler/LocalAI/pull/9560
  • ci(backends): build cpu-whisperx and cpu-faster-whisper for linux/arm64 by @mudler in https://github.com/mudler/LocalAI/pull/9573
  • [intel GPU support] Use latest oneapi-basekit image for Intel images to support b70 (in more places this time) by @arbrick in https://github.com/mudler/LocalAI/pull/9578
  • feat: Log backend exit code by @richiejp in https://github.com/mudler/LocalAI/pull/9581
  • feat(distributed): support multiple replicas of one model on the same node by @mudler in https://github.com/mudler/LocalAI/pull/9583
  • feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9587
  • feat: localai assistant chat modality by @mudler in https://github.com/mudler/LocalAI/pull/9602
  • chore: add golangci-lint with new-from-merge-base baseline by @richiejp in https://github.com/mudler/LocalAI/pull/9603
  • feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9607
  • feat(vllm): expose AsyncEngineArgs via generic engine_args YAML map by @richiejp in https://github.com/mudler/LocalAI/pull/9563
  • feat(vibevoice-cpp): add purego TTS+ASR backend by @mudler in https://github.com/mudler/LocalAI/pull/9610
  • feat: react chat redesign by @mudler in https://github.com/mudler/LocalAI/pull/9616
  • feat(llama-cpp): bump to d775992 and adapt to spec params refactor by @mudler in https://github.com/mudler/LocalAI/pull/9618
  • feat(backends/python): use tempfile.gettempdir() instead of hardcoded /tmp by @Anai-Guo in https://github.com/mudler/LocalAI/pull/9629
  • feat(importers): whisper.cpp HF repos pick a quant + nest under whisper/models by @mudler in https://github.com/mudler/LocalAI/pull/9630
  • feat(branding): admin-configurable instance name, tagline, and assets by @mudler in https://github.com/mudler/LocalAI/pull/9635
  • feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9643
  • feat(react-ui): add multilingual (i18n) support by @mudler in https://github.com/mudler/LocalAI/pull/9642
  • feat(ci): allow routing apt traffic through an alternate Ubuntu mirror by @mudler in https://github.com/mudler/LocalAI/pull/9650
  • feat: add LocalVQE backend and audio transformations UI by @richiejp in https://github.com/mudler/LocalAI/pull/9640
  • feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9660
  • feat(concurrency-groups): per-model exclusive groups for backend loading by @mudler in https://github.com/mudler/LocalAI/pull/9662
  • feat(api): add /v1/audio/diarization endpoint with sherpa-onnx + vibevoice.cpp by @mudler in https://github.com/mudler/LocalAI/pull/9654
  • feat(vllm, distributed): tensor parallel distributed workers by @richiejp in https://github.com/mudler/LocalAI/pull/9612
  • feat: support word-level timestamps for faster-whisper by @eglia in https://github.com/mudler/LocalAI/pull/9621
  • feat(importers): add vibevoice-cpp importer for GGUF bundles by @localai-bot in https://github.com/mudler/LocalAI/pull/9685
  • feat(gallery): Speed up load times and clean gallery entries by @richiejp in https://github.com/mudler/LocalAI/pull/9211
  • feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9699
  • feat(sglang): wire engine_args, add cuda13 build, ship MTP gallery demos by @richiejp in https://github.com/mudler/LocalAI/pull/9686
  • feat(api/transcription): include segments + duration + language on stream done event by @localai-bot in https://github.com/mudler/LocalAI/pull/9709
  • feat(whisper): honor client cancellation via ggml abort_callback by @localai-bot in https://github.com/mudler/LocalAI/pull/9710
  • chore: Security hardening by @richiejp in https://github.com/mudler/LocalAI/pull/9719
  • ci: phase 1-3 of GHA free tier migration (path filter, multi-arch split prep, /mnt disk relief) by @localai-bot in https://github.com/mudler/LocalAI/pull/9726
  • feat(swagger): update swagger by @localai-bot in https://github.com/mudler/LocalAI/pull/9723
  • ci: pilot per-arch split + manifest merge for faster-whisper and llama-cpp-quantization by @localai-bot in https://github.com/mudler/LocalAI/pull/9727
  • ci: finish GHA free-tier migration (per-arch fan-out, image splits, retire self-hosted, fix provenance) by @localai-bot in https://github.com/mudler/LocalAI/pull/9730
  • ci: consolidate llama-cpp-darwin into the matrix-driven Darwin flow by @mudler in https://github.com/mudler/LocalAI/pull/9731
  • feat(whisper-cpp): implement streaming transcription by @localai-bot in https://github.com/mudler/LocalAI/pull/9751

🧠 Models

  • chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9399
  • chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9400
  • chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9425
  • chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9436
  • chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9464
  • chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9481
  • chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9491
  • chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9505
  • chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9555
  • chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9558
  • chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9611
  • chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9615
  • Add tags to qwen3-vl-reranker and Qwen3-VL-Embedding to the gallery by @ER-EPR in https://github.com/mudler/LocalAI/pull/9628
  • chore(model gallery): add chroma1-hd diffusers model by @Anai-Guo in https://github.com/mudler/LocalAI/pull/9646
  • chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9653
  • chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9681
  • chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9703
  • chore(model gallery): :robot: add 1 new models via gallery agent by @localai-bot in https://github.com/mudler/LocalAI/pull/9720

📖 Documentation and examples

  • docs: :arrow_up: update docs version mudler/LocalAI by @localai-bot in https://github.com/mudler/LocalAI/pull/9268
  • docs(agents): capture vllm backend lessons + runtime lib packaging by @mudler in https://github.com/mudler/LocalAI/pull/9333
  • chore(agents): Update the backend creation instructions to include Rust and extra tests by @richiejp in https://github.com/mudler/LocalAI/pull/9490

👒 Dependencies

  • chore: :arrow_up: Update ggml-org/llama.cpp to 66c4f9ded01b29d9120255be1ed8d5835bcbb51d by @localai-bot in https://github.com/mudler/LocalAI/pull/9269
  • chore(llama.cpp): bump to 'd12cc3d1ca6bba741cd77887ac9c9ee18c8415c7' by @mudler in https://github.com/mudler/LocalAI/pull/9282
  • chore: :arrow_up: Update leejet/stable-diffusion.cpp to e8323cabb0e4511ba18a50b1cb34cf1f87fc71ef by @localai-bot in https://github.com/mudler/LocalAI/pull/9281
  • chore: :arrow_up: Update ggml-org/llama.cpp to d132f22fc92f36848f7ccf2fc9987cd0b0120825 by @localai-bot in https://github.com/mudler/LocalAI/pull/9302
  • chore: :arrow_up: Update PABannier/sam3.cpp to 01832ef85fcc8eb6488f1d01cd247f07e96ff5a9 by @localai-bot in https://github.com/mudler/LocalAI/pull/9311
  • chore: :arrow_up: Update ggml-org/llama.cpp to e62fa13c2497b2cd1958cb496e9489e86bbd5182 by @localai-bot in https://github.com/mudler/LocalAI/pull/9312
  • chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/9321
  • chore: :arrow_up: Update leejet/stable-diffusion.cpp to 6b675a5ede9b0edf0a0f44191e8b79d7ef27615a by @localai-bot in https://github.com/mudler/LocalAI/pull/9320
  • chore: :arrow_up: Update ggml-org/llama.cpp to ff5ef8278615a2462b79b50abdf3cc95cfb31c6f by @localai-bot in https://github.com/mudler/LocalAI/pull/9319
  • chore: :arrow_up: Update ggml-org/llama.cpp to 1e9d771e2c2f1113a5ebdd0dc15bafe57dce64be by @localai-bot in https://github.com/mudler/LocalAI/pull/9330
  • chore(deps): bump softprops/action-gh-release from 2 to 3 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9336
  • chore(deps): bump actions/upload-pages-artifact from 4 to 5 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9337
  • chore(deps): bump github.com/testcontainers/testcontainers-go from 0.41.0 to 0.42.0 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9338
  • chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/9346
  • chore(deps): bump sentence-transformers from 5.2.3 to 5.4.0 in /backend/python/transformers by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9342
  • chore: :arrow_up: Update ggml-org/llama.cpp to e97492369888f5311e4d1f3beb325a36bbed70e9 by @localai-bot in https://github.com/mudler/LocalAI/pull/9347
  • chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 55d3c05bf7b377deaa5dc84d255d9740a345a206 by @localai-bot in https://github.com/mudler/LocalAI/pull/9348
  • chore(deps): bump github.com/google/go-containerregistry from 0.21.3 to 0.21.5 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9343
  • chore(deps): bump github.com/testcontainers/testcontainers-go/modules/nats from 0.41.0 to 0.42.0 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9341
  • chore(deps): bump github.com/swaggo/echo-swagger from 1.4.1 to 1.5.2 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9344
  • chore(deps): bump github.com/charmbracelet/glamour from 0.10.0 to 1.0.0 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9340
  • chore: :arrow_up: Update ggml-org/llama.cpp to fae3a28070fe4026f87bd6a544aba1b2d1896566 by @localai-bot in https://github.com/mudler/LocalAI/pull/9357
  • chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/9358
  • chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/9369
  • chore: :arrow_up: Update ggml-org/llama.cpp to b3d758750a268bf93f084ccfa3060fb9a203192a by @localai-bot in https://github.com/mudler/LocalAI/pull/9370
  • chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 1163af96cf6bb4a4b819f998f84c153a49768b99 by @localai-bot in https://github.com/mudler/LocalAI/pull/9368
  • chore(deps): bump the npm_and_yarn group across 1 directory with 6 updates by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9373
  • chore: :arrow_up: Update leejet/stable-diffusion.cpp to c41c5ded7af85e01b7fe442ff7950c720706d53a by @localai-bot in https://github.com/mudler/LocalAI/pull/9366
  • chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/9384
  • chore: :arrow_up: Update ikawrakow/ik_llama.cpp to eaf83865a132f66e8f49efe0e78491625942f068 by @localai-bot in https://github.com/mudler/LocalAI/pull/9382
  • chore: :arrow_up: Update leejet/stable-diffusion.cpp to a564fdf642780d1df123f1c413b19961375b8346 by @localai-bot in https://github.com/mudler/LocalAI/pull/9383
  • chore: :arrow_up: Update TheTom/llama-cpp-turboquant to 45f8a066ed5f5bb38c695cec532f6cef9f4efa9d by @mudler in https://github.com/mudler/LocalAI/pull/9385
  • chore: :arrow_up: Update ggml-org/llama.cpp to 4fbdabdc61c04d1262b581e1b8c0c3b119f688ff by @localai-bot in https://github.com/mudler/LocalAI/pull/9381
  • chore: bump inference defaults from unsloth by @github-actions[bot] in https://github.com/mudler/LocalAI/pull/9396
  • chore(deps): bump dompurify from 3.3.2 to 3.4.0 in /core/http/react-ui in the npm_and_yarn group across 1 directory by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9376
  • chore: :arrow_up: Update ggml-org/whisper.cpp to 166c20b473d5f4d04052e699f992f625ea2a2fdd by @localai-bot in https://github.com/mudler/LocalAI/pull/9403
  • chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 52efa12fdae390d1dca6ecd7ca00010fe51f651e by @localai-bot in https://github.com/mudler/LocalAI/pull/9404
  • chore: :arrow_up: Update ggml-org/llama.cpp to 4f02d4733934179386cbc15b3454be26237940bb by @localai-bot in https://github.com/mudler/LocalAI/pull/9415
  • chore: :arrow_up: Update leejet/stable-diffusion.cpp to 7d33d4b2ddeafa672761a5880ec33bdff452504d by @localai-bot in https://github.com/mudler/LocalAI/pull/9417
  • chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 8befd92ea5f702494ea9813fe42a52fb015db5fe by @localai-bot in https://github.com/mudler/LocalAI/pull/9418
  • chore: :arrow_up: Update leejet/stable-diffusion.cpp to 44cca3d626d301e2215d5e243277e8f0e65bfa78 by @localai-bot in https://github.com/mudler/LocalAI/pull/9428
  • chore: :arrow_up: Update ggml-org/llama.cpp to 4eac5b45095a4e8a1ff1cce4f6d030e0872fb4ad by @localai-bot in https://github.com/mudler/LocalAI/pull/9429
  • chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 00ba208a5c036eee72d4a631b4f57c126095cb03 by @localai-bot in https://github.com/mudler/LocalAI/pull/9430
  • chore: :arrow_up: Update ikawrakow/ik_llama.cpp to d4824131580b94ffa7b0e91c955e2b237c2fe16e by @localai-bot in https://github.com/mudler/LocalAI/pull/9447
  • chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/9451
  • chore: :arrow_up: Update ggml-org/whisper.cpp to fc674574ca27cac59a15e5b22a09b9d9ad62aafe by @localai-bot in https://github.com/mudler/LocalAI/pull/9450
  • chore: :arrow_up: Update ggml-org/llama.cpp to cf8b0dbda9ac0eac30ee33f87bc6702ead1c4664 by @localai-bot in https://github.com/mudler/LocalAI/pull/9448
  • chore(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.97.1 to 1.99.1 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9452
  • chore(deps): bump github.com/containerd/containerd from 1.7.30 to 1.7.31 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9453
  • chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.4.1 to 1.5.0 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9454
  • chore(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.32.14 to 1.32.16 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9456
  • chore(deps): bump github.com/coreos/go-oidc/v3 from 3.17.0 to 3.18.0 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9455
  • chore: :arrow_up: Update ggml-org/llama.cpp to 5a4cd6741fc33227cdacb329f355ab21f8481de2 by @localai-bot in https://github.com/mudler/LocalAI/pull/9479
  • chore: :arrow_up: Update leejet/stable-diffusion.cpp to c97702e1057c2fe13a7074cd9069cb9dd6edc1bf by @localai-bot in https://github.com/mudler/LocalAI/pull/9495
  • chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/9522
  • chore: :arrow_up: Update ggml-org/llama.cpp to 187a45637054881ecacf17f8e2f6f8f2ba7df1c7 by @localai-bot in https://github.com/mudler/LocalAI/pull/9520
  • chore: :arrow_up: Update leejet/stable-diffusion.cpp to b8bdffc19962be7e5a84bfefeb2e31bd885b571a by @localai-bot in https://github.com/mudler/LocalAI/pull/9521
  • chore(deps): bump postcss from 8.5.8 to 8.5.10 in /core/http/react-ui in the npm_and_yarn group across 1 directory by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9544
  • chore(deps): bump rustls-webpki from 0.103.10 to 0.103.13 in /backend/rust/kokoros in the cargo group across 1 directory by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9546
  • chore: :arrow_up: Update ggml-org/llama.cpp to 361fe72acb7b9bd79059cc177cbeda99b35b5db9 by @localai-bot in https://github.com/mudler/LocalAI/pull/9548
  • chore: :arrow_up: Update ikawrakow/ik_llama.cpp to cb58a561f0c49f68b6d125cdfda037ed80433821 by @localai-bot in https://github.com/mudler/LocalAI/pull/9549
  • chore: :arrow_up: Update TheTom/llama-cpp-turboquant to 67559e580b10e4e47e9a6fd6218873997976886d by @localai-bot in https://github.com/mudler/LocalAI/pull/9497
  • chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 3a945af45d45936341a45bbf7deda56776a4af26 by @localai-bot in https://github.com/mudler/LocalAI/pull/9570
  • chore: :arrow_up: Update TheTom/llama-cpp-turboquant to 11a241d0db78a68e0a5b99fe6f36de6683100f6a by @localai-bot in https://github.com/mudler/LocalAI/pull/9571
  • chore: :arrow_up: Update ggml-org/llama.cpp to dcad77cc3b0865153f486327064fb0320a57a476 by @localai-bot in https://github.com/mudler/LocalAI/pull/9572
  • chore: :arrow_up: Update ggml-org/llama.cpp to f53577432541bb9edc1588c4ef45c66bf07e4468 by @localai-bot in https://github.com/mudler/LocalAI/pull/9577
  • chore: :arrow_up: Update ggml-org/llama.cpp to 665abc609740d397d30c0d8ef4157dbf900bd1a3 by @localai-bot in https://github.com/mudler/LocalAI/pull/9584
  • chore: :arrow_up: Update ikawrakow/ik_llama.cpp to d6f3e4e28fbf75e6181e6ea32e734de9ce9304fd by @localai-bot in https://github.com/mudler/LocalAI/pull/9585
  • chore: :arrow_up: Update leejet/stable-diffusion.cpp to a81677f59c92d90343aebca51dfed7decf0a0cb0 by @localai-bot in https://github.com/mudler/LocalAI/pull/9586
  • chore(deps): bump github.com/testcontainers/testcontainers-go/modules/postgres from 0.41.0 to 0.42.0 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9591
  • chore(deps): bump github.com/onsi/ginkgo/v2 from 2.28.1 to 2.28.2 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9593
  • chore(deps): bump packaging from 24.1 to 26.2 in /backend/python/coqui by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9594
  • chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 453a027c17e4d63a7f16b871197a396240a65138 by @localai-bot in https://github.com/mudler/LocalAI/pull/9608
  • chore: :arrow_up: Update leejet/stable-diffusion.cpp to 3d6064b37ef4607917f8acf2ca8c8906d5087413 by @localai-bot in https://github.com/mudler/LocalAI/pull/9617
  • chore: :arrow_up: Update ikawrakow/ik_llama.cpp to a8aecbf15933295af96504f9a693998322185b5c by @localai-bot in https://github.com/mudler/LocalAI/pull/9625
  • chore: :arrow_up: Update ggml-org/llama.cpp to beb42fffa45eded44804a1fd4916146222371581 by @localai-bot in https://github.com/mudler/LocalAI/pull/9624
  • deps: update quic-go to v0.59.0 (fix session ticket panic) by @egyptianbman in https://github.com/mudler/LocalAI/pull/9655
  • chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/9661
  • chore: :arrow_up: Update vllm-project/vllm cu130 wheel to 0.20.1 by @localai-bot in https://github.com/mudler/LocalAI/pull/9649
  • chore(deps): bump docs/themes/hugo-theme-relearn from f69a085 to 8bb66fa by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9665
  • chore: :arrow_up: Update ggml-org/llama.cpp to eff06702b2a52e1020ea009ebd86cb9f5acabab5 by @localai-bot in https://github.com/mudler/LocalAI/pull/9637
  • chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 45dfd80371785731bc2ed05a76252497a4e7a282 by @localai-bot in https://github.com/mudler/LocalAI/pull/9644
  • chore(deps): bump actions/checkout from 4 to 6 by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9663
  • chore: :arrow_up: Update ggml-org/llama.cpp to bbeb89d76c41bc250f16e4a6fefcc9b530d6e3f3 by @localai-bot in https://github.com/mudler/LocalAI/pull/9676
  • chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 8b56d813a9ed04fa7b7fe2588fddd845cf64eccb by @localai-bot in https://github.com/mudler/LocalAI/pull/9677
  • chore: :arrow_up: Update TheTom/llama-cpp-turboquant to 69d8e4be47243e83b3d0d71e932bc7aa61c644dc by @localai-bot in https://github.com/mudler/LocalAI/pull/9638
  • chore: :arrow_up: Update ggml-org/whisper.cpp to 4bf733672b2871d4153158af4f621a6dd9104f4a by @localai-bot in https://github.com/mudler/LocalAI/pull/9636
  • chore(model-gallery): :arrow_up: update checksum by @localai-bot in https://github.com/mudler/LocalAI/pull/9700
  • chore: :arrow_up: Update ikawrakow/ik_llama.cpp to b93721902b4662f9b973b1c412006081c958d085 by @localai-bot in https://github.com/mudler/LocalAI/pull/9697
  • chore: :arrow_up: Update ggml-org/llama.cpp to 2496f9c14965c39589f53eea31bdb6d762b1d360 by @localai-bot in https://github.com/mudler/LocalAI/pull/9698
  • chore: :arrow_up: Update leejet/stable-diffusion.cpp to 90e87bc846f17059771efb8aaa31e9ef0cab6f78 by @localai-bot in https://github.com/mudler/LocalAI/pull/9701
  • chore(deps): bump openssl from 0.10.76 to 0.10.79 in /backend/rust/kokoros in the cargo group across 1 directory by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9694
  • chore(deps): bump the go_modules group across 1 directory with 8 updates by @dependabot[bot] in https://github.com/mudler/LocalAI/pull/9705
  • chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 9a26522af234f8db079ae3735f35ab6c20fe2c66 by @localai-bot in https://github.com/mudler/LocalAI/pull/9713
  • chore: :arrow_up: Update ggml-org/llama.cpp to 05ff59cb57860cc992fc6dcede32c696efea711c by @localai-bot in https://github.com/mudler/LocalAI/pull/9714
  • chore: :arrow_up: Update ggml-org/whisper.cpp to c81b2dabbc45484dee2ca6658cfe39c841df5c70 by @localai-bot in https://github.com/mudler/LocalAI/pull/9712
  • chore(deps): bump LocalAGI for collection rehydrate-on-init-failure fix by @localai-bot in https://github.com/mudler/LocalAI/pull/9721
  • chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 98950267c67fd95937a54ebd6e3c66cf2679b710 by @localai-bot in https://github.com/mudler/LocalAI/pull/9725
  • chore: :arrow_up: Update ggml-org/llama.cpp to 9f5f0e689c9e977e5f23a27e344aa36082f44738 by @localai-bot in https://github.com/mudler/LocalAI/pull/9724
  • chore: :arrow_up: Update ikawrakow/ik_llama.cpp to ab0f22b819ac57b7e7484f69c00c10fc755d5c6c by @localai-bot in https://github.com/mudler/LocalAI/pull/9734
  • chore: :arrow_up: Update ggml-org/llama.cpp to 00d56b11c3477b99bc18562dc1d1834f0d961778 by @localai-bot in https://github.com/mudler/LocalAI/pull/9733

Other Changes

  • ci: add pre-built base-grpc-builder image infrastructure (PR 1/2) by @localai-bot in https://github.com/mudler/LocalAI/pull/9737
  • ci: refactor llama-cpp variant Dockerfiles to consume prebuilt base-grpc images (PR 2/2) by @localai-bot in https://github.com/mudler/LocalAI/pull/9738
  • chore: :arrow_up: Update ggml-org/llama.cpp to 1e5ad35d560b90a8ac447d149c8f8447ae1fcaa0 by @localai-bot in https://github.com/mudler/LocalAI/pull/9739
  • docs(agents): update CI caching docs after the GHA-free-tier migration by @localai-bot in https://github.com/mudler/LocalAI/pull/9742
  • ci: split backend-jobs into single-arch and multi-arch matrices by @localai-bot in https://github.com/mudler/LocalAI/pull/9746
  • chore: :arrow_up: Update ggml-org/llama.cpp to 2b2babd1243c67ca811c0a5852cedf92b1a20024 by @localai-bot in https://github.com/mudler/LocalAI/pull/9747
  • chore: :arrow_up: Update ikawrakow/ik_llama.cpp to 23127139cb6fa314899c3b5f4935b88b3374c56c by @localai-bot in https://github.com/mudler/LocalAI/pull/9748
  • chore: :arrow_up: Update ggml-org/whisper.cpp to c33c5618b72bb345df029b730b36bc0e369845a3 by @localai-bot in https://github.com/mudler/LocalAI/pull/9749
  • chore: :arrow_up: Update vllm-project/vllm cu130 wheel to 0.20.2 by @localai-bot in https://github.com/mudler/LocalAI/pull/9750
  • chore: :arrow_up: Update ggml-org/llama.cpp to 389ff61d77b5c71cec0cf92fe4e5d01ace80b797 by @localai-bot in https://github.com/mudler/LocalAI/pull/9752

New Contributors

  • @neurocis made their first contribution in https://github.com/mudler/LocalAI/pull/9304
  • @thelittlefireman made their first contribution in https://github.com/mudler/LocalAI/pull/9264
  • @mvanhorn made their first contribution in https://github.com/mudler/LocalAI/pull/9379
  • @keithmattix made their first contribution in https://github.com/mudler/LocalAI/pull/9410
  • @SAY-5 made their first contribution in https://github.com/mudler/LocalAI/pull/9438
  • @pjbrzozowski made their first contribution in https://github.com/mudler/LocalAI/pull/9427
  • @russell made their first contribution in https://github.com/mudler/LocalAI/pull/9446
  • @leinasi2014 made their first contribution in https://github.com/mudler/LocalAI/pull/9443
  • @sec171 made their first contribution in https://github.com/mudler/LocalAI/pull/9461
  • @Dennisadira made their first contribution in https://github.com/mudler/LocalAI/pull/9411
  • @orbisai0security made their first contribution in https://github.com/mudler/LocalAI/pull/9486
  • @Anai-Guo made their first contribution in https://github.com/mudler/LocalAI/pull/9526
  • @arbrick made their first contribution in https://github.com/mudler/LocalAI/pull/9543
  • @eglia made their first contribution in https://github.com/mudler/LocalAI/pull/9541
  • @egyptianbman made their first contribution in https://github.com/mudler/LocalAI/pull/9655
  • @arteven made their first contribution in https://github.com/mudler/LocalAI/pull/9674

Full Changelog: https://github.com/mudler/LocalAI/compare/v4.1.3...v4.2.0

Security Fixes

  • grpc-server hardening – removed unsafe sprintf() in C++ grpc server
  • OIDC library bump (go‑oidc/v3) from 3.17.0 → 3.18.0
  • Settings API now strips env‑supplied ApiKeys before persisting
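
The release notes do not show the actual change to the gRPC server, so the following is only a minimal sketch of the general pattern behind the first item: replacing an unbounded sprintf() call with a bounded std::snprintf() call and checking its return value. The helper name format_slot_label, its parameters, and the "<prefix>-<id>" format are hypothetical and are not taken from LocalAI's grpc-server.cpp.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper for illustration only; not LocalAI code.
// Formats "<prefix>-<id>" into a caller-supplied fixed-size buffer.
// Returns true if the whole string fit, false if it was truncated
// or formatting failed. The buffer is never written past buf_size.
bool format_slot_label(char *buf, std::size_t buf_size,
                       const std::string &prefix, int id) {
    if (buf == nullptr || buf_size == 0) {
        return false;
    }
    // Unsafe variant this replaces (no bound on the output length):
    //   std::sprintf(buf, "%s-%d", prefix.c_str(), id);
    //
    // std::snprintf writes at most buf_size bytes, including the
    // terminating NUL, and returns the length the full output would
    // have had, which lets the caller detect truncation.
    int needed = std::snprintf(buf, buf_size, "%s-%d", prefix.c_str(), id);
    return needed >= 0 && static_cast<std::size_t>(needed) < buf_size;
}
```

With a 16-byte buffer, format_slot_label(buf, sizeof buf, "slot", 7) fills buf with "slot-7" and returns true. With a 4-byte buffer the same call returns false and leaves a truncated, still NUL-terminated string, where sprintf() would have written past the end of the buffer.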

About LocalAI

LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
