Text Generation Web UI

v4.9 Security

This release includes 3 security fixes relevant to teams reviewing exposed deployments.

Published 2 months ago in LLM Frameworks


✓ No known CVEs patched


This release includes 3 security fixes; none has an assigned CVE identifier.

Affected surfaces

auth, rce_ssrf

Summary

AI summary

A broad release covering security fixes, new features, bug fixes, dependency updates, and portable builds (including Vulkan).

Changes in this release

| Type | Severity | Summary | CVE |
|---|---|---|---|
| Security | Medium | Restrict CORS to localhost by default to prevent drive-by API access. | — |
| Security | Medium | UI: Improve web search security by rejecting non-HTTP links. | — |
| Security | Medium | Sanitize character name in load_character to prevent path traversal. | — |
| Security | Medium | Fix path traversal in load_template_by_name (#7562). | — |
| Feature | Medium | Add draft-mtp as new --spec-type option for MTP speculative decoding support. | — |
| Feature | Medium | Web search results now include snippet excerpts, reducing need for fetch_webpage calls. | — |
| Feature | Medium | Drop link URLs from fetch_webpage output, showing plain text links instead of markdown. | — |
| Feature | Medium | Prettier rendering of web_search results in chat with spinner during call. | — |
| Feature | Medium | Add info message to Activate web search checkbox. | — |
| Feature | Medium | Show live generation speed (tokens/s) and context size while generating. | — |
| Feature | Medium | Add Linux aarch64 portable builds for DGX Spark support. | — |
| Feature | Medium | Add Check for updates button in Electron Session tab. | — |
| Feature | Medium | Add folder picker for models directory in Electron. | — |
| Feature | Medium | Add right-click context menu for copying text in Electron. | — |
| Feature | Medium | Add spellcheck toggle in Electron Session tab. | — |
| Feature | Medium | Store app data in user_data/cache/electron instead of OS default location. | — |
| Feature | Medium | Disable DNS-over-HTTPS probes in Electron. | — |
| Feature | Medium | Auto-detect and auto-select sibling mmproj files when loading a model. | — |
| Feature | Medium | Detect mmproj-*.gguf files in main models folder, appearing in mmproj dropdown and hidden from regular model dropdown. | — |
| Feature | Medium | Treat negative --ctx-size values as auto (0). | — |
| Feature | Medium | Add drag-and-drop file upload support to chat input (Gradio fork). | — |
| Feature | Medium | One-click installer now tracks latest release tag, not bleeding-edge main. | — |
| Feature | Medium | Add project icon courtesy of LMLocalizer on Reddit. | — |
| Feature | Medium | Reorganize right sidebar with Mode/Character/Chat style on top. | — |
| Feature | Medium | Hide reasoning and tools controls in chat mode, shown only in instruct/chat-instruct. | — |
| Feature | Medium | Fade in new messages, fix scroll-up jump on send. | — |
| Feature | Medium | Rename Send dummy message/reply to Insert user/assistant message. | — |
| Feature | Medium | Polish character dropdown in chat tab. | — |
| Feature | Medium | Tighten spacing between dropdowns and refresh buttons. | — |
| Feature | Medium | Improve looks of Session tab. | — |

Full changelog

Changes

MTP speculative decoding support: Add draft-mtp as a new --spec-type option. Auto-enabled when loading MTP GGUFs (e.g. Qwen 3.6 MoE MTP builds).
Web search improvements:
- Add snippet support to the web_search tool: results now include a short text excerpt that often answers the query directly, eliminating the need for a follow-up fetch_webpage call (#7548).
- Drop link URLs from fetch_webpage output (links now appear as plain text instead of [text](url) markdown), significantly reducing tokens used per page.
- Prettier rendering of web_search results in the chat, with a spinner during the call.
- Add an info message to the "Activate web search" checkbox.
Show live generation speed (tokens/s) and context size while generating (#7563).
DGX Spark support: Add Linux aarch64 portable builds.
Electron
- Add "Check for updates" button in the Session tab.
- Add a folder picker for the models directory.
- Add right-click context menu for copying text.
- Add a spellcheck toggle in the Session tab (#7550).
- Store app data in user_data/cache/electron instead of the OS default location.
- Disable DNS-over-HTTPS probes.
One-click installer: Track the latest release tag instead of bleeding-edge main.
Auto-detect and auto-select sibling mmproj files when loading a model (#7564).
Detect mmproj-*.gguf files in the main models folder: They appear in the mmproj dropdown and are hidden from the regular model dropdown.
Project icon: Add an icon, courtesy of LMLocalizer on Reddit.
Treat negative --ctx-size values as auto (0).
UI
- Add drag-and-drop file upload support to the chat input (Gradio fork).
- Reorganize the right sidebar with Mode/Character/Chat style on top.
- Hide reasoning and tools controls in chat mode (only shown in instruct / chat-instruct).
- Fade in new messages, fix scroll-up jump on send.
- Rename "Send dummy message/reply" to "Insert user/assistant message".
- Polish character dropdown in chat tab.
- Tighten spacing between dropdowns and refresh buttons.
- Improve the looks of the Session tab.
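
The mmproj-*.gguf detection described in the list above can be illustrated with a short sketch. This is not the project's code: the function name `split_model_files` and the case-sensitive match are assumptions made for illustration, and only the `mmproj-*.gguf` pattern comes from the release notes.

```python
from fnmatch import fnmatchcase

# Pattern taken from the release notes; everything else here is hypothetical.
MMPROJ_PATTERN = "mmproj-*.gguf"


def split_model_files(filenames):
    """Split file names into (regular GGUF models, mmproj files).

    Names matching mmproj-*.gguf go to the mmproj list and are kept out of
    the regular model list. Non-GGUF files are ignored.
    """
    mmproj = [name for name in filenames if fnmatchcase(name, MMPROJ_PATTERN)]
    models = [
        name
        for name in filenames
        if name.endswith(".gguf") and not fnmatchcase(name, MMPROJ_PATTERN)
    ]
    return models, mmproj


if __name__ == "__main__":
    files = ["mmproj-F16.gguf", "Qwen-Q4_K_M.gguf", "notes.txt"]
    print(split_model_files(files))
```

A real implementation would also need to decide how to pair a model with a sibling mmproj file when several are present; the release notes do not describe that rule, so it is left out here.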

Security

Restrict CORS to localhost by default to prevent drive-by API access. --listen and --public-api opt into network exposure.
Sanitize character name in load_character to prevent path traversal.
fix: prevent path traversal in load_template_by_name (#7562). Thanks, @Allen930311.
UI: Improve web search security by rejecting non-HTTP links.
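
For readers auditing similar code, the three kinds of check named above (localhost-only origins, path traversal prevention, and rejecting non-HTTP links) might look roughly like the following. This is a minimal sketch using only the Python standard library, not the project's actual fix: the helper names, the host allowlist, and the `.yaml` suffix are all assumptions.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical allowlist; the project's real list of local hosts may differ.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_local_origin(origin: str) -> bool:
    """Return True if a CORS Origin header value names the local machine."""
    return urlparse(origin).hostname in LOCAL_HOSTS


def is_http_url(url: str) -> bool:
    """Accept only http(s) links that carry a host; reject file:, javascript:, etc."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def safe_child_path(base_dir: Path, name: str, suffix: str = ".yaml") -> Path:
    """Build base_dir/<name><suffix> and refuse results that escape base_dir."""
    base = base_dir.resolve()
    candidate = (base / f"{name}{suffix}").resolve()
    # After resolving ".." segments and symlinks, the path must stay under base.
    if not candidate.is_relative_to(base):
        raise ValueError(f"unsafe name: {name!r}")
    return candidate
```

`Path.is_relative_to` requires Python 3.9 or newer. Resolving both paths before comparing them is what catches names such as `../../etc/passwd` as well as absolute paths passed in as a name.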

Bug fixes

Fix llama-server not being killed when the parent process exits on Windows, e.g. when closing the console window or killing python.exe (#7574).
Fix streaming output leaking across chats when switching mid-stream (#7555).
Fix continue-mode regressions across template families.
Fix incorrect prompts generated with continue mode. Thanks, @MeemeeLab.
Fix thinking channel being lost across tool-call turns (#7578).
Fix API model load silently dropping hyphenated arg keys (#7577).
Fix chat deletion failing when user_data/logs is a symlink (#7579).
Fix token count not being set in non-streaming mode.
Keep web search blocks closed when the user closes them mid-stream.
fix(win): set PYTHONUTF8 for non-ASCII locale Windows compatibility (#7560). Thanks, @jerry78424.
Set TORCH_VERSION to 2.9.0 to match xformers 0.0.33's torch pin (#7581). Thanks, @AJ-Gazin.

Dependency updates

Update llama.cpp to https://github.com/ggml-org/llama.cpp/commit/e947228222147356bc7e64154d3439e142481632
Update ik_llama.cpp to https://github.com/ikawrakow/ik_llama.cpp/commit/40254a51daf485b2b644bcb82a84278d95745ee5
Update ExLlamaV3 to 0.0.34

Portable builds

TextGen is now a desktop app for local LLMs. Download, unzip, double-click.

Note (NVIDIA GPU): If nvidia-smi reports CUDA Version >= 13.1, use the cuda13.1 build. Otherwise, use cuda12.4.
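
The build selection rule above can be written as a small helper. This is only a sketch: `pick_cuda_build` is a hypothetical name, and it assumes you have already read the "CUDA Version" value from the nvidia-smi output as a string such as "13.1".

```python
def pick_cuda_build(reported_cuda_version: str) -> str:
    """Map the CUDA version reported by nvidia-smi to a portable build name."""
    parts = reported_cuda_version.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    # Compare as integer tuples so that "13.10" sorts above "13.2".
    return "cuda13.1" if (major, minor) >= (13, 1) else "cuda12.4"


if __name__ == "__main__":
    for version in ("12.8", "13.0", "13.1"):
        print(version, "->", pick_cuda_build(version))
```

Comparing the version as a tuple of integers avoids the string-comparison mistake where "13.10" would be treated as older than "13.2".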

ik_llama.cpp is a llama.cpp fork with new quant types. If unsure, use the llama.cpp column.

Windows

| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (936 MB) | Download (1.24 GB) |
| NVIDIA (CUDA 13.1) | Download (840 MB) | Download (1.33 GB) |
| AMD/Intel (Vulkan) | Download (336 MB) | — |
| AMD (ROCm 7.2) | Download (617 MB) | — |
| CPU only | Download (319 MB) | Download (335 MB) |

Linux

| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (893 MB) | Download (1.21 GB) |
| NVIDIA (CUDA 13.1) | Download (826 MB) | Download (1.33 GB) |
| NVIDIA ARM64 (CUDA 13.1) | Download (910 MB) | — |
| AMD/Intel (Vulkan) | Download (324 MB) | — |
| AMD (ROCm 7.2) | Download (409 MB) | — |
| CPU only | Download (307 MB) | Download (338 MB) |

macOS

macOS note: You need to run xattr -cr /path/to/your/textgen-folder on the extracted folder before launching. See https://github.com/oobabooga/textgen/issues/7558.

| Architecture | llama.cpp |
|---|---|
| Apple Silicon (arm64) | Download (272 MB) |
| Intel (x86_64) | Download (284 MB) |

Updating a portable install:

1. Download and extract the latest version.
2. Replace the new user_data folder with the one from your existing install. All your settings and models carry over.

Starting with 4.0, you can also move user_data one folder up, next to the install folder. It will be detected automatically, making updates easier:

textgen-4.6/
textgen-4.7/
user_data/    <-- shared by both installs
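
The shared-folder layout above implies a lookup along the following lines. This is a hedged sketch, not the project's detection logic: `resolve_user_data` is a hypothetical name, and giving the shared folder precedence over a local one is an assumption, since the release notes only say the folder next to the install is detected automatically.

```python
from pathlib import Path


def resolve_user_data(install_dir: Path) -> Path:
    """Return the user_data folder an install would use under this sketch's rules."""
    shared = install_dir.parent / "user_data"  # one folder up, next to the install
    local = install_dir / "user_data"          # inside the install folder
    # Assumption: prefer the shared folder when it exists, else fall back to local.
    return shared if shared.is_dir() else local


if __name__ == "__main__":
    print(resolve_user_data(Path("textgen-4.7").resolve()))
```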



About Text Generation Web UI

The original local LLM interface. Text, vision, tool-calling, training, and more. 100% offline.
