This release includes 4 security fixes relevant to security teams reviewing exposed deployments.
Summary
AI summary: Broad release that touches bug fixes, dependency updates, updating a portable install, and Vulkan.
Changes in this release
| Type | Severity | Summary | CVE | Confidence |
|---|---|---|---|---|
| Security | Medium | Restrict CORS to localhost by default to prevent drive-by API access. | - | high |
| Security | Medium | UI: Improve web search security by rejecting non-HTTP links. | - | high |
| Security | Medium | Sanitize character name in `load_character` to prevent path traversal. | - | low |
| Security | Medium | Fix path traversal in `load_template_by_name` (#7562). | - | low |
| Feature | Medium | Add `draft-mtp` as new `--spec-type` option for MTP speculative decoding support. | - | high |
| Feature | Medium | Web search results now include snippet excerpts, reducing need for `fetch_webpage` calls. | - | high |
| Feature | Medium | Drop link URLs from `fetch_webpage` output, showing plain text links instead of markdown. | - | high |
| Feature | Medium | Prettier rendering of `web_search` results in chat with spinner during call. | - | high |
| Feature | Medium | Add info message to Activate web search checkbox. | - | high |
| Feature | Medium | Show live generation speed (tokens/s) and context size while generating. | - | high |
| Feature | Medium | Add Linux aarch64 portable builds for DGX Spark support. | - | high |
| Feature | Medium | Add Check for updates button in Electron Session tab. | - | high |
| Feature | Medium | Add folder picker for models directory in Electron. | - | high |
| Feature | Medium | Add right-click context menu for copying text in Electron. | - | high |
| Feature | Medium | Add spellcheck toggle in Electron Session tab. | - | high |
| Feature | Medium | Store app data in `user_data/cache/electron` instead of OS default location. | - | high |
| Feature | Medium | Disable DNS-over-HTTPS probes in Electron. | - | high |
| Feature | Medium | Auto-detect and auto-select sibling mmproj files when loading a model. | - | high |
| Feature | Medium | Detect `mmproj-*.gguf` files in main models folder, appearing in mmproj dropdown and hidden from regular model dropdown. | - | high |
| Feature | Medium | Treat negative `--ctx-size` values as auto (0). | - | high |
| Feature | Medium | Add drag-and-drop file upload support to chat input (Gradio fork). | - | high |
| Feature | Medium | One-click installer now tracks latest release tag, not bleeding-edge main. | - | low |
| Feature | Medium | Add project icon courtesy of LMLocalizer on Reddit. | - | low |
| Feature | Medium | Reorganize right sidebar with Mode/Character/Chat style on top. | - | low |
| Feature | Medium | Hide reasoning and tools controls in chat mode, shown only in instruct/chat-instruct. | - | low |
| Feature | Medium | Fade in new messages, fix scroll-up jump on send. | - | low |
| Feature | Medium | Rename Send dummy message/reply to Insert user/assistant message. | - | low |
| Feature | Medium | Polish character dropdown in chat tab. | - | low |
| Feature | Medium | Tighten spacing between dropdowns and refresh buttons. | - | low |
| Feature | Medium | Improve looks of Session tab. | - | low |

Source for all rows: granite4.1:8b-q6_K@2026-05-20.
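The `fetch_webpage` change listed above replaces markdown links with their visible text to save tokens. The following is a minimal sketch of that idea, assuming a simple regular expression is enough for illustration. The helper name `drop_link_urls` is hypothetical and is not taken from the project's code.

```python
import re

# Matches inline markdown links such as [text](url).
# The negative lookbehind leaves images (![alt](url)) untouched.
_MD_LINK = re.compile(r"(?<!!)\[([^\]]+)\]\([^)\s]+\)")


def drop_link_urls(markdown: str) -> str:
    """Replace every [text](url) with just text (hypothetical helper)."""
    return _MD_LINK.sub(r"\1", markdown)


if __name__ == "__main__":
    print(drop_link_urls("See [the docs](https://example.com/docs) for details."))
```

This sketch does not handle URLs that themselves contain parentheses or whitespace; a real implementation would likely work on the parsed HTML rather than on rendered markdown.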
Full changelog
Changes
- MTP speculative decoding support: Add `draft-mtp` as a new `--spec-type` option. Auto-enabled when loading MTP GGUFs (e.g. Qwen 3.6 MoE MTP builds).
- Web search improvements:
  - Add snippet support to the `web_search` tool: results now include a short text excerpt that often answers the query directly, eliminating the need for a follow-up `fetch_webpage` call (#7548).
  - Drop link URLs from `fetch_webpage` output (links now appear as plain text instead of `[text](url)` markdown), significantly reducing tokens used per page.
  - Prettier rendering of `web_search` results in the chat, with a spinner during the call.
  - Add an info message to the "Activate web search" checkbox.
- Show live generation speed (tokens/s) and context size while generating (#7563).
- DGX Spark support: Add Linux aarch64 portable builds.
- Electron
  - Add "Check for updates" button in the Session tab.
  - Add a folder picker for the models directory.
  - Add right-click context menu for copying text.
  - Add a spellcheck toggle in the Session tab (#7550).
  - Store app data in `user_data/cache/electron` instead of the OS default location.
  - Disable DNS-over-HTTPS probes.
- One-click installer: Track the latest release tag instead of bleeding-edge `main`.
- Auto-detect and auto-select sibling mmproj files when loading a model (#7564).
- Detect `mmproj-*.gguf` files in the main models folder: They appear in the mmproj dropdown and are hidden from the regular model dropdown.
- Project icon: Add an icon, courtesy of LMLocalizer on Reddit.
- Treat negative `--ctx-size` values as auto (0).
- UI
  - Add drag-and-drop file upload support to the chat input (Gradio fork).
  - Reorganize the right sidebar with Mode/Character/Chat style on top.
  - Hide reasoning and tools controls in chat mode (only shown in instruct / chat-instruct).
  - Fade in new messages, fix scroll-up jump on send.
  - Rename "Send dummy message/reply" to "Insert user/assistant message".
  - Polish character dropdown in chat tab.
  - Tighten spacing between dropdowns and refresh buttons.
  - Improve the looks of the Session tab.
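The mmproj items above describe two behaviors: files matching `mmproj-*.gguf` are kept out of the regular model dropdown, and a projector file sitting next to a model is picked up automatically. The sketch below shows one way this could work. The function names and the "first match in the same folder" rule are assumptions made for illustration, not the project's actual logic.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def split_models(filenames):
    """Separate mmproj projector files from regular models (hypothetical helper)."""
    mmproj = [f for f in filenames if fnmatchcase(f.lower(), "mmproj-*.gguf")]
    regular = [f for f in filenames if f not in mmproj]
    return regular, mmproj


def find_sibling_mmproj(model_path):
    """Return the first mmproj-*.gguf in the model's folder, or None.

    Picking the alphabetically first match is an assumption of this sketch.
    """
    folder = Path(model_path).parent
    candidates = sorted(folder.glob("mmproj-*.gguf"))
    return candidates[0] if candidates else None


if __name__ == "__main__":
    regular, mmproj = split_models(["model-Q4_K_M.gguf", "mmproj-F16.gguf"])
    print("regular:", regular)
    print("mmproj:", mmproj)
```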
Security
- Restrict CORS to localhost by default to prevent drive-by API access. `--listen` and `--public-api` opt into network exposure.
- Sanitize character name in `load_character` to prevent path traversal.
- fix: prevent path traversal in `load_template_by_name` (#7562). Thanks, @Allen930311.
- UI: Improve web search security by rejecting non-HTTP links.
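Two of the fixes above follow well-known patterns: confining a user-supplied name to a base directory, and accepting only HTTP(S) links. The sketch below illustrates both patterns in general terms. The helper names `safe_child_path` and `is_http_url`, the `.yaml` suffix, and the `characters` folder are assumptions for illustration; the changelog does not show how `load_character` or `load_template_by_name` were actually patched.

```python
from pathlib import Path
from urllib.parse import urlparse


def safe_child_path(base_dir, name, suffix=".yaml"):
    """Resolve base_dir/name+suffix and refuse anything that escapes base_dir."""
    base = Path(base_dir).resolve()
    candidate = (base / f"{name}{suffix}").resolve()
    # is_relative_to requires Python 3.9 or newer.
    if not candidate.is_relative_to(base):
        raise ValueError(f"unsafe name: {name!r}")
    return candidate


def is_http_url(url):
    """Accept only http(s) URLs with a host; reject file:, javascript:, data:, etc."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


if __name__ == "__main__":
    print(safe_child_path("characters", "Assistant").name)
    print(is_http_url("https://example.com/page"))
    print(is_http_url("file:///etc/passwd"))
```

Resolving both paths before comparing them means `..` segments and absolute names are normalized first, so the containment check works on the final location rather than on the raw string.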
Bug fixes
- Fix llama-server not being killed when the parent process exits on Windows, e.g. when closing the console window or killing python.exe (#7574).
- Fix streaming output leaking across chats when switching mid-stream (#7555).
- Fix continue-mode regressions across template families.
- Fix incorrect prompts generated with continue mode. Thanks, @MeemeeLab.
- Fix thinking channel being lost across tool-call turns (#7578).
- Fix API model load silently dropping hyphenated arg keys (#7577).
- Fix chat deletion failing when `user_data/logs` is a symlink (#7579).
- Fix token count not being set in non-streaming mode.
- Keep web search blocks closed when the user closes them mid-stream.
- fix(win): set PYTHONUTF8 for non-ASCII locale Windows compatibility (#7560). Thanks, @jerry78424.
- Set TORCH_VERSION to 2.9.0 to match xformers 0.0.33's torch pin (#7581). Thanks, @AJ-Gazin.
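The changelog does not say how the hyphenated-key bug (#7577) was fixed. A common cause of this class of bug is that a request passes keys such as `ctx-size` while the receiving code looks up `ctx_size`, so the value is silently ignored. The sketch below shows a generic normalization step under that assumption. `normalize_arg_keys` is a hypothetical name and does not come from the project.

```python
def normalize_arg_keys(args):
    """Map keys like 'ctx-size' or '--ctx-size' to 'ctx_size' (hypothetical helper)."""
    normalized = {}
    for key, value in args.items():
        # Strip leading dashes, then convert remaining hyphens to underscores.
        normalized[key.lstrip("-").replace("-", "_")] = value
    return normalized


if __name__ == "__main__":
    print(normalize_arg_keys({"ctx-size": 8192, "gpu_layers": 20}))
```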
Dependency updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/commit/e947228222147356bc7e64154d3439e142481632
- Update ik_llama.cpp to https://github.com/ikawrakow/ik_llama.cpp/commit/40254a51daf485b2b644bcb82a84278d95745ee5
- Update ExLlamaV3 to 0.0.34
Portable builds
TextGen is now a desktop app for local LLMs. Download, unzip, double-click.
[!NOTE]
NVIDIA GPU: If `nvidia-smi` reports CUDA Version >= 13.1, use the cuda13.1 build. Otherwise, use cuda12.4.

ik_llama.cpp is a llama.cpp fork with new quant types. If unsure, use the llama.cpp column.
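The build selection rule in the note above can be automated. The sketch below parses the `CUDA Version: X.Y` field that `nvidia-smi` prints in its header and applies the same threshold. The function name `pick_cuda_build` is hypothetical, and the returned strings simply mirror the build labels used in the note.

```python
import re


def pick_cuda_build(nvidia_smi_output):
    """Return 'cuda13.1' or 'cuda12.4' based on the 'CUDA Version' field."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        raise ValueError("could not find 'CUDA Version' in nvidia-smi output")
    # Compare as an integer tuple so that 13.10 sorts above 13.2.
    version = (int(match.group(1)), int(match.group(2)))
    return "cuda13.1" if version >= (13, 1) else "cuda12.4"


if __name__ == "__main__":
    sample = "| NVIDIA-SMI 570.00   Driver Version: 570.00   CUDA Version: 12.8 |"
    print(pick_cuda_build(sample))
```

To use it on a real machine, pass in the text captured from running `nvidia-smi`, for example via `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`.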
Windows
| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (936 MB) | Download (1.24 GB) |
| NVIDIA (CUDA 13.1) | Download (840 MB) | Download (1.33 GB) |
| AMD/Intel (Vulkan) | Download (336 MB) | — |
| AMD (ROCm 7.2) | Download (617 MB) | — |
| CPU only | Download (319 MB) | Download (335 MB) |
Linux
| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (893 MB) | Download (1.21 GB) |
| NVIDIA (CUDA 13.1) | Download (826 MB) | Download (1.33 GB) |
| NVIDIA ARM64 (CUDA 13.1) | Download (910 MB) | — |
| AMD/Intel (Vulkan) | Download (324 MB) | — |
| AMD (ROCm 7.2) | Download (409 MB) | — |
| CPU only | Download (307 MB) | Download (338 MB) |
macOS
macOS note: You need to run `xattr -cr /path/to/your/textgen-folder` on the extracted folder before launching. See https://github.com/oobabooga/textgen/issues/7558.
| Architecture | llama.cpp |
|---|---|
| Apple Silicon (arm64) | Download (272 MB) |
| Intel (x86_64) | Download (284 MB) |
Updating a portable install:
- Download and extract the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
Starting with 4.0, you can also move user_data one folder up, next to the install folder. It will be detected automatically, making updates easier:
textgen-4.6/
textgen-4.7/
user_data/ <-- shared by both installs
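The shared-folder layout above implies a lookup that checks next to the install folder as well as inside it. The following sketch shows one possible lookup order. The function name `find_user_data` is hypothetical, and preferring the shared folder over the one inside the install is an assumption, since the release notes only say the shared folder is detected automatically.

```python
from pathlib import Path


def find_user_data(install_dir):
    """Return the shared user_data next to the install folder if present,
    otherwise the user_data inside the install folder (assumed precedence)."""
    install = Path(install_dir)
    shared = install.parent / "user_data"
    if shared.is_dir():
        return shared
    return install / "user_data"


if __name__ == "__main__":
    print(find_user_data("textgen-4.7"))
```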
About Text Generation Web UI
The original local LLM interface. Text, vision, tool-calling, training, and more. 100% offline.