Text Generation Web UI
LLM Frameworks
A desktop app for running local LLMs offline with chat, vision, tool‑calling, and API compatibility
Features
- Local chat & instruction‑following (LLM) interface with OpenAI/Anthropic API drop‑in support
- Multimodal vision input for image understanding
- Tool‑calling capability to invoke custom Python functions (e.g., web search, math)
- Training tab for LoRA fine‑tuning on chat or raw text datasets
- Image generation tab with diffusers models and quantization options
Security Response History
1 CVE

| CVE | Severity | Disclosed | Patched (this tool) | vs Ecosystem Median |
|---|---|---|---|---|
| CVE-2023-4863 (KEV) | High (CVSS 8.8) | 2023-09-13 | 2026-01-08 | 2y 4mo / median 2y 4mo |
Recent releases
- Use the cuda13.1 build if `nvidia-smi` reports CUDA Version >= 13.1; otherwise use cuda12.4
- ik_llama.cpp is a llama.cpp fork with new quant types; if unsure, use the llama.cpp builds
- Portable builds now support Windows, Linux, macOS with specific GPU/ROCm/CPU variants
- Redesigned chat composer: taller input area with paperclip and action buttons pinned to bottom (Gemini/DeepSeek style)
- Smooth scroll animation when sending a new message
- Electron improvements – persist window bounds, add --no-electron flag, disable spellcheck in chat input
Full changelog
Changes
- Redesigned chat composer: Taller input area with the paperclip and message-action buttons pinned to the bottom, similar to Gemini and DeepSeek.
- Smooth scroll animation when sending a new message: Inspired by Gemini's chat UI.
- Electron improvements:
  - Persist window bounds and maximize state across launches.
  - Add a `--no-electron` flag to skip the desktop window and use the web UI in the browser instead.
  - Disable spellcheck in the chat input.
- API: Add support for list-format content in tool and assistant messages.
- Add more space below the last chat/chat-instruct message so its action buttons have breathing room.
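As a rough illustration of the list-format content mentioned in the API item above: in the OpenAI-style chat format, a message's `content` may be a list of typed parts instead of a plain string. The sketch below shows a hypothetical tool message in that shape and one way a server or client might flatten it to text. The helper name `content_to_text` and the part shape `{"type": "text", "text": ...}` are assumptions drawn from the OpenAI convention, not code from this project.

```python
from typing import Union

Content = Union[str, list, None]


def content_to_text(content: Content) -> str:
    """Flatten string or list-format message content into plain text."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    parts = []
    for part in content:
        # Keep only text parts; other part types (e.g. images) are skipped here.
        if isinstance(part, dict) and part.get("type") == "text":
            parts.append(part.get("text", ""))
    return "".join(parts)


# Hypothetical tool-result message using list-format content.
tool_message = {
    "role": "tool",
    "tool_call_id": "call_1",
    "content": [
        {"type": "text", "text": "2 + 2 = "},
        {"type": "text", "text": "4"},
    ],
}
```

Calling `content_to_text(tool_message["content"])` joins the text parts in order, while a plain string passes through unchanged, so callers can treat both formats the same way.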
Bug fixes
- Fix speculative decoding broken by upstream llama.cpp arg renames (#7541).
- Fix truncation length reverting after model load on UI reload (#7540).
- Don't clear the chat input when sending a message with no model loaded (#7542).
- Electron:
  - Fix big character picture failing to load (#7540).
  - Fix `--listen` mode in the launcher.
  - Fix missing log colors on Windows.
Dependency updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/commit/68380ae11b564af67196afc70f10c99dbb532fa9
- Update ik_llama.cpp to https://github.com/ikawrakow/ik_llama.cpp/commit/9a26522af234f8db079ae3735f35ab6c20fe2c66
Portable builds
TextGen is now a desktop app for local LLMs. Download, unzip, double-click.
> [!NOTE]
> NVIDIA GPU: If `nvidia-smi` reports CUDA Version >= 13.1, use the cuda13.1 build. Otherwise, use cuda12.4.
>
> ik_llama.cpp is a llama.cpp fork with new quant types. If unsure, use the llama.cpp column.
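The CUDA rule in the note can be checked programmatically. The following is a minimal sketch, assuming the `nvidia-smi` header contains a `CUDA Version: X.Y` field (as current drivers print it); the function names `pick_cuda_build` and `detect_cuda_build` are hypothetical and are not part of the project.

```python
import re
import subprocess


def pick_cuda_build(nvidia_smi_output: str) -> str:
    """Return the build name suggested by the note for a given nvidia-smi output."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        raise ValueError("no CUDA version found in nvidia-smi output")
    version = (int(match.group(1)), int(match.group(2)))
    # Tuple comparison handles major and minor together: (13, 0) < (13, 1) <= (14, 0).
    return "cuda13.1" if version >= (13, 1) else "cuda12.4"


def detect_cuda_build() -> str:
    """Run nvidia-smi and apply the rule; raises if nvidia-smi is unavailable."""
    output = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=True
    ).stdout
    return pick_cuda_build(output)
```

Keeping the parsing separate from the subprocess call makes the rule testable on machines without an NVIDIA GPU.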
Windows
| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (891 MB) | Download (1.23 GB) |
| NVIDIA (CUDA 13.1) | Download (817 MB) | Download (1.33 GB) |
| AMD/Intel (Vulkan) | Download (336 MB) | — |
| AMD (ROCm 7.2) | Download (604 MB) | — |
| CPU only | Download (319 MB) | Download (334 MB) |
Linux
| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (848 MB) | Download (1.20 GB) |
| NVIDIA (CUDA 13.1) | Download (803 MB) | Download (1.33 GB) |
| AMD/Intel (Vulkan) | Download (324 MB) | — |
| AMD (ROCm 7.2) | Download (396 MB) | — |
| CPU only | Download (307 MB) | Download (334 MB) |
macOS
| Architecture | llama.cpp |
|---|---|
| Apple Silicon (arm64) | Download (271 MB) |
| Intel (x86_64) | Download (283 MB) |
Updating a portable install:
- Download and extract the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
Starting with 4.0, you can also move user_data one folder up, next to the install folder. It will be detected automatically, making updates easier:
textgen-4.6/
textgen-4.7/
user_data/ <-- shared by both installs
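The shared layout above implies a lookup across two locations. The sketch below shows one way such detection could work. It is not the project's actual logic: the function name `find_user_data` is hypothetical, and the lookup order (install folder first, then one folder up) is an assumption.

```python
from pathlib import Path
from typing import Optional


def find_user_data(install_dir: Path) -> Optional[Path]:
    """Return the first existing user_data folder for an install, or None."""
    candidates = [
        install_dir / "user_data",         # inside the install folder
        install_dir.parent / "user_data",  # one folder up, shared between installs
    ]
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None
```

With the layout shown, both `find_user_data(Path("textgen-4.6"))` and `find_user_data(Path("textgen-4.7"))` would resolve to the same sibling `user_data` folder, provided neither install contains its own copy.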
- When upgrading portable builds, launch with the new `textgen`/`textgen.bat` and adjust any scripts that called the old start scripts.
- Existing configurations using `--row-split` must be migrated to `--split-mode tensor` (or another supported mode).
- Portable build selection now requires CUDA version check: use `cuda13.1` builds if `nvidia-smi` reports CUDA ≥ 13.1, otherwise use `cuda12.4`.
- Run `textgen` / `textgen.bat` executables instead of previous start scripts; old script invocation no longer works.
- Flag `--row-split` removed, replaced by new `--split-mode` flag with a `tensor` option for multi‑GPU inference.
- Native desktop builds bundle Electron and launch as native windows (run via `textgen`/`textgen.bat`).
- Tensor parallelism support added via `--split-mode tensor` in llama.cpp, which can make multi‑GPU inference 60%+ faster.
- UI overhaul: Inter font default, Lucide SVG icons for actions, segmented chat mode control, redesigned input card, flat underline tab indicator, hairline sidebar handles.
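For launch scripts that still pass `--row-split`, the migration described above amounts to a small argument rewrite. The following is a sketch only: the helper name `migrate_split_args` is hypothetical, and defaulting to `tensor` follows the migration bullet above rather than any documented equivalence with the old flag.

```python
from typing import List


def migrate_split_args(args: List[str], mode: str = "tensor") -> List[str]:
    """Replace the removed --row-split flag with --split-mode <mode>."""
    migrated: List[str] = []
    for arg in args:
        if arg == "--row-split":
            # The old boolean flag becomes a flag plus a value.
            migrated.extend(["--split-mode", mode])
        else:
            migrated.append(arg)
    return migrated
```

For example, `migrate_split_args(["--listen", "--row-split"])` yields `["--listen", "--split-mode", "tensor"]`, and arguments that never used the old flag are returned unchanged.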
Full changelog
Changes
- Native desktop app: Portable builds now bundle Electron and open as a native window. Run `textgen`/`textgen.bat` instead of the previous start scripts. Pass `--listen` or `--nowebui` to skip the window and run the server directly.
- Major UI overhaul:
  - Replace Noto Sans with Inter as the default font.
  - Replace emoji refresh/save/delete buttons with Lucide SVG icons.
  - Turn the chat mode selector (chat / chat-instruct / instruct) into a 3-button segmented control.
  - Redesign the chat input as a single rounded card with a circular accent-colored send button.
  - Use a flat underline for the active tab indicator.
  - Replace the sidebar toggle buttons with 3px hairline handles on desktop.
- Tensor parallelism for llama.cpp: New `--split-mode` flag (replacing `--row-split`) with a `tensor` option that can make multi-GPU inference 60%+ faster. On the ik_llama.cpp backend, `tensor` and `row` fall back to `graph`.
- Replace DuckDuckGo HTML scraping in the web search tool with the ddgs library, which is more robust against DuckDuckGo's bot blocking.
- Add support for standalone `.jinja`/`.jinja2` instruction template files in the UI, in addition to the existing `.yaml` format (#7517).
Bug fixes
- Fix Stop button being ignored during tool call approval, and not interrupting between tool turns in multi-turn tool loops.
- Fix race condition in the ExLlamaV3 backend that could affect concurrent API requests.
- Fix extension settings not saving for extensions inside `user_data/extensions` (#7525).
Dependency updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/commit/09294365a9d7a2b786584d59525b034622c3ed81
- Update ik_llama.cpp to https://github.com/ikawrakow/ik_llama.cpp/commit/9f1deefa7128889fd8a947964f04262bfa724b84
- Update transformers to 5.6
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip/extract, and run.
> [!NOTE]
> NVIDIA GPU: If `nvidia-smi` reports CUDA Version >= 13.1, use the cuda13.1 build. Otherwise, use cuda12.4.
>
> ik_llama.cpp is a llama.cpp fork with new quant types. If unsure, use the llama.cpp column.
Windows
| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (891 MB) | Download (1.23 GB) |
| NVIDIA (CUDA 13.1) | Download (816 MB) | Download (1.33 GB) |
| AMD/Intel (Vulkan) | Download (336 MB) | — |
| AMD (ROCm 7.2) | Download (604 MB) | — |
| CPU only | Download (318 MB) | Download (334 MB) |
Linux
| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (848 MB) | Download (1.20 GB) |
| NVIDIA (CUDA 13.1) | Download (803 MB) | Download (1.32 GB) |
| AMD/Intel (Vulkan) | Download (324 MB) | — |
| AMD (ROCm 7.2) | Download (395 MB) | — |
| CPU only | Download (306 MB) | Download (334 MB) |
macOS
| Architecture | llama.cpp |
|---|---|
| Apple Silicon (arm64) | Download (271 MB) |
| Intel (x86_64) | Download (283 MB) |
Updating a portable install:
- Download and extract the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
Starting with 4.0, you can also move user_data one folder up, next to the install folder. It will be detected automatically, making updates easier:
textgen-4.6/
textgen-4.7/
user_data/ <-- shared by both installs
- `--listen` or `--nowebui` can be used to skip the native window and run the server directly.
- For portable builds, use the cuda13.1 build when `nvidia-smi` reports CUDA Version ≥ 13.1; otherwise use the cuda12.4 build.
- Removed previous start script invocation; run `textgen` / `textgen.bat` to launch the Electron‑wrapped native window.
- Tensor parallelism flag `--split-mode tensor`, which can make multi‑GPU inference with llama.cpp 60%+ faster.
- Major UI overhaul: default font Inter, Lucide SVG icons, segmented chat mode control, redesigned input card, flat tab indicator, hairline sidebar handles.
- Native portable builds bundle Electron; launch via `textgen`/`textgen.bat`, optional `--listen` / `--nowebui` flags.
- UI redesign: Inter font, Lucide SVG icons, segmented chat mode control, rounded input card with accent send button, flat underline active tab indicator, hairline sidebar handles.
- Tensor parallelism via the new `--split-mode tensor` flag, which can make multi‑GPU inference on llama.cpp 60%+ faster (falls back to `graph` on ik_llama.cpp).