Config change

v4.9 Mixed 2mo

Auth RCE / SSRF

MTP, Web search snippets, Electron UI

Open

v4.8 Security relevant patches CVE-2023-4863 2mo

⚠ Upgrade required

Use cuda13.1 build if `nvidia-smi` reports CUDA Version >= 13.1; otherwise use cuda12.4
ik_llama.cpp offers new quant types – choose based on preference
Portable builds now support Windows, Linux, macOS with specific GPU/ROCm/CPU variants

Notable features

Redesigned chat composer: taller input area with paperclip and action buttons pinned to bottom (Gemini/DeepSeek style)
Smooth scroll animation when sending a new message
Electron improvements – persist window bounds, add --no-electron flag, disable spellcheck in chat input

Full changelog

Changes

Redesigned chat composer: Taller input area with the paperclip and message-action buttons pinned to the bottom, similar to Gemini and DeepSeek.
Smooth scroll animation when sending a new message: Inspired by Gemini's chat UI.
Electron improvements:
- Persist window bounds and maximize state across launches.
- Add a --no-electron flag to skip the desktop window and use the web UI in the browser instead.
- Disable spellcheck in the chat input.
API: Add support for list-format content in tool and assistant messages.
Add more space below the last chat/chat-instruct message so its action buttons have breathing room.

Bug fixes

Fix speculative decoding broken by upstream llama.cpp arg renames (#7541).
Fix truncation length reverting after model load on UI reload (#7540).
Don't clear the chat input when sending a message with no model loaded (#7542).
Electron:
- Fix big character picture failing to load (#7540).
- Fix --listen mode in the launcher.
- Fix missing log colors on Windows.

Dependency updates

Update llama.cpp to https://github.com/ggml-org/llama.cpp/commit/68380ae11b564af67196afc70f10c99dbb532fa9
Update ik_llama.cpp to https://github.com/ikawrakow/ik_llama.cpp/commit/9a26522af234f8db079ae3735f35ab6c20fe2c66

Portable builds

TextGen is now a desktop app for local LLMs. Download, unzip, double-click.

[!NOTE]
NVIDIA GPU: If nvidia-smi reports CUDA Version >= 13.1, use the cuda13.1 build. Otherwise, use cuda12.4.

ik_llama.cpp is a llama.cpp fork with new quant types. If unsure, use the llama.cpp column.

Windows

| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (891 MB) | Download (1.23 GB) |
| NVIDIA (CUDA 13.1) | Download (817 MB) | Download (1.33 GB) |
| AMD/Intel (Vulkan) | Download (336 MB) | — |
| AMD (ROCm 7.2) | Download (604 MB) | — |
| CPU only | Download (319 MB) | Download (334 MB) |

Linux

| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (848 MB) | Download (1.20 GB) |
| NVIDIA (CUDA 13.1) | Download (803 MB) | Download (1.33 GB) |
| AMD/Intel (Vulkan) | Download (324 MB) | — |
| AMD (ROCm 7.2) | Download (396 MB) | — |
| CPU only | Download (307 MB) | Download (334 MB) |

macOS

| Architecture | llama.cpp |
|---|---|
| Apple Silicon (arm64) | Download (271 MB) |
| Intel (x86_64) | Download (283 MB) |

Updating a portable install:

Download and extract the latest version.
Replace the user_data folder with the one in your existing install. All your settings and models will be moved.

Starting with 4.0, you can also move user_data one folder up, next to the install folder. It will be detected automatically, making updates easier:

textgen-4.6/
textgen-4.7/
user_data/    <-- shared by both installs

View release on GitHub

v4.7.3 Breaking risk 2mo

⚠ Upgrade required

When upgrading portable builds, replace the entire executable with the new `textgen`/`textgen.bat` and adjust any scripts that called the old start script.
Existing configurations using `--row-split` must be migrated to `--split-mode tensor` (or another supported mode).
Portable build selection now requires CUDA version check: use `cuda13.1` builds if `nvidia-smi` reports CUDA ≥ 13.1, otherwise use `cuda12.4`.

Breaking changes

Run `textgen` / `textgen.bat` executables instead of previous start scripts; old script invocation no longer works.
Flag `--row-split` removed, replaced by new `--split-mode` flag with a `tensor` option for multi‑GPU inference.

Notable features

Native desktop builds bundle Electron and launch as native windows (run via `textgen`/`textgen.bat`).
Tensor parallelism support added via `--split-mode tensor` in llama.cpp, boosting multi‑GPU performance by >60%.
UI overhaul: Inter font default, Lucide SVG icons for actions, segmented chat mode control, redesigned input card, flat underline tab indicator, hairline sidebar handles.

Full changelog

Changes

Native desktop app: Portable builds now bundle Electron and open as a native window. Run textgen / textgen.bat instead of the previous start scripts. Pass --listen or --nowebui to skip the window and run the server directly.
Major UI overhaul:
- Replace Noto Sans with Inter as the default font.
- Replace emoji refresh/save/delete buttons with Lucide SVG icons.
- Turn the chat mode selector (chat / chat-instruct / instruct) into a 3-button segmented control.
- Redesign the chat input as a single rounded card with a circular accent-colored send button.
- Use a flat underline for the active tab indicator.
- Replace the sidebar toggle buttons with 3px hairline handles on desktop.
Tensor parallelism for llama.cpp: New --split-mode flag (replacing --row-split) with a tensor option that can make multi-GPU inference 60%+ faster. On the ik_llama.cpp backend, tensor and row fall back to graph.
Replace DuckDuckGo HTML scraping in the web search tool with the ddgs library, which is more robust against DuckDuckGo's bot blocking.
Add support for standalone .jinja/.jinja2 instruction template files in the UI, in addition to the existing .yaml format (#7517).

Bug fixes

Fix Stop button being ignored during tool call approval, and not interrupting between tool turns in multi-turn tool loops.
Fix race condition in the ExLlamaV3 backend that could affect concurrent API requests.
Fix extension settings not saving for extensions inside user_data/extensions (#7525).

Dependency updates

Update llama.cpp to https://github.com/ggml-org/llama.cpp/commit/09294365a9d7a2b786584d59525b034622c3ed81
Update ik_llama.cpp to https://github.com/ikawrakow/ik_llama.cpp/commit/9f1deefa7128889fd8a947964f04262bfa724b84
Update transformers to 5.6

Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip/extract, and run.

[!NOTE]
NVIDIA GPU: If nvidia-smi reports CUDA Version >= 13.1, use the cuda13.1 build. Otherwise, use cuda12.4.

ik_llama.cpp is a llama.cpp fork with new quant types. If unsure, use the llama.cpp column.

Windows

| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (891 MB) | Download (1.23 GB) |
| NVIDIA (CUDA 13.1) | Download (816 MB) | Download (1.33 GB) |
| AMD/Intel (Vulkan) | Download (336 MB) | — |
| AMD (ROCm 7.2) | Download (604 MB) | — |
| CPU only | Download (318 MB) | Download (334 MB) |

Linux

| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (848 MB) | Download (1.20 GB) |
| NVIDIA (CUDA 13.1) | Download (803 MB) | Download (1.32 GB) |
| AMD/Intel (Vulkan) | Download (324 MB) | — |
| AMD (ROCm 7.2) | Download (395 MB) | — |
| CPU only | Download (306 MB) | Download (334 MB) |

macOS

| Architecture | llama.cpp |
|---|---|
| Apple Silicon (arm64) | Download (271 MB) |
| Intel (x86_64) | Download (283 MB) |

Updating a portable install:

Download and extract the latest version.
Replace the user_data folder with the one in your existing install. All your settings and models will be moved.

Starting with 4.0, you can also move user_data one folder up, next to the install folder. It will be detected automatically, making updates easier:

textgen-4.6/
textgen-4.7/
user_data/    <-- shared by both installs

View release on GitHub

v4.7.2 Breaking risk 2mo

⚠ Upgrade required

--listen or --nowebui can be used to skip the native window and run the server directly.
Portable builds require CUDA 13.1 when `nvidia-smi` reports CUDA Version ≥ 13.1; otherwise use the CUDA 12.4 build.

Breaking changes

Removed previous start script invocation; run `textgen` / `textgen.bat` to launch the Electron‑wrapped native window.

Notable features

Tensor parallelism flag `--split-mode tensor` for up to 60%+ faster multi‑GPU inference with llama.cpp.
Major UI overhaul: default font Inter, Lucide SVG icons, segmented chat mode control, redesigned input card, flat tab indicator, hairline sidebar handles.

Full changelog

Changes

Native desktop app: Portable builds now bundle Electron and open as a native window. Run textgen / textgen.bat instead of the previous start scripts. Pass --listen or --nowebui to skip the window and run the server directly.
Major UI overhaul:
- Replace Noto Sans with Inter as the default font.
- Replace emoji refresh/save/delete buttons with Lucide SVG icons.
- Turn the chat mode selector (chat / chat-instruct / instruct) into a 3-button segmented control.
- Redesign the chat input as a single rounded card with a circular accent-colored send button.
- Use a flat underline for the active tab indicator.
- Replace the sidebar toggle buttons with 3px hairline handles on desktop.
Tensor parallelism for llama.cpp: New --split-mode flag (replacing --row-split) with a tensor option that can make multi-GPU inference 60%+ faster. On the ik_llama.cpp backend, tensor and row fall back to graph.
Replace DuckDuckGo HTML scraping in the web search tool with the ddgs library, which is more robust against DuckDuckGo's bot blocking.
Add support for standalone .jinja/.jinja2 instruction template files in the UI, in addition to the existing .yaml format (#7517).

Bug fixes

Fix Stop button being ignored during tool call approval, and not interrupting between tool turns in multi-turn tool loops.
Fix race condition in the ExLlamaV3 backend that could affect concurrent API requests.
Fix extension settings not saving for extensions inside user_data/extensions (#7525).

Dependency updates

Update llama.cpp to https://github.com/ggml-org/llama.cpp/commit/09294365a9d7a2b786584d59525b034622c3ed81
Update ik_llama.cpp to https://github.com/ikawrakow/ik_llama.cpp/commit/9f1deefa7128889fd8a947964f04262bfa724b84
Update transformers to 5.6

Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip/extract, and run.

[!NOTE]
NVIDIA GPU: If nvidia-smi reports CUDA Version >= 13.1, use the cuda13.1 build. Otherwise, use cuda12.4.

ik_llama.cpp is a llama.cpp fork with new quant types. If unsure, use the llama.cpp column.

Windows

| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (891 MB) | Download (1.23 GB) |
| NVIDIA (CUDA 13.1) | Download (816 MB) | Download (1.33 GB) |
| AMD/Intel (Vulkan) | Download (336 MB) | — |
| AMD (ROCm 7.2) | Download (604 MB) | — |
| CPU only | Download (318 MB) | Download (334 MB) |

Linux

| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (848 MB) | Download (1.20 GB) |
| NVIDIA (CUDA 13.1) | Download (803 MB) | Download (1.32 GB) |
| AMD/Intel (Vulkan) | Download (324 MB) | — |
| AMD (ROCm 7.2) | Download (395 MB) | — |
| CPU only | Download (306 MB) | Download (334 MB) |

macOS

| Architecture | llama.cpp |
|---|---|
| Apple Silicon (arm64) | Download (271 MB) |
| Intel (x86_64) | Download (283 MB) |

Updating a portable install:

Download and extract the latest version.
Replace the user_data folder with the one in your existing install. All your settings and models will be moved.

Starting with 4.0, you can also move user_data one folder up, next to the install folder. It will be detected automatically, making updates easier:

textgen-4.6/
textgen-4.7/
user_data/    <-- shared by both installs

View release on GitHub

v4.7.1 New feature 2mo

Notable features

Native portable builds bundle Electron; launch via `textgen`/`textgen.bat`, optional `--listen` / `--nowebui` flags.
UI redesign: Inter font, Lucide SVG icons, segmented chat mode control, rounded input card with accent send button, flat underline active tab indicator, hairline sidebar handles.
Tensor parallelism via new `--split-mode tensor` flag for up to 60%+ multi‑GPU speedup on llama.cpp (fallback to graph on ik_llama.cpp).

Full changelog

Changes

Native desktop app: Portable builds now bundle Electron and open as a native window. Run textgen / textgen.bat instead of the previous start scripts. Pass --listen or --nowebui to skip the window and run the server directly.
Major UI overhaul:
- Replace Noto Sans with Inter as the default font.
- Replace emoji refresh/save/delete buttons with Lucide SVG icons.
- Turn the chat mode selector (chat / chat-instruct / instruct) into a 3-button segmented control.
- Redesign the chat input as a single rounded card with a circular accent-colored send button.
- Use a flat underline for the active tab indicator.
- Replace the sidebar toggle buttons with 3px hairline handles on desktop.
Tensor parallelism for llama.cpp: New --split-mode flag (replacing --row-split) with a tensor option that can make multi-GPU inference 60%+ faster. On the ik_llama.cpp backend, tensor and row fall back to graph.
Replace DuckDuckGo HTML scraping in the web search tool with the ddgs library, which is more robust against DuckDuckGo's bot blocking.
Add support for standalone .jinja/.jinja2 instruction template files in the UI, in addition to the existing .yaml format (#7517).

Bug fixes

Fix Stop button being ignored during tool call approval, and not interrupting between tool turns in multi-turn tool loops.
Fix race condition in the ExLlamaV3 backend that could affect concurrent API requests.
Fix extension settings not saving for extensions inside user_data/extensions (#7525).

Dependency updates

Update llama.cpp to https://github.com/ggml-org/llama.cpp/commit/09294365a9d7a2b786584d59525b034622c3ed81
Update ik_llama.cpp to https://github.com/ikawrakow/ik_llama.cpp/commit/9f1deefa7128889fd8a947964f04262bfa724b84
Update transformers to 5.6

Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip/extract, and run.

[!NOTE]
NVIDIA GPU: If nvidia-smi reports CUDA Version >= 13.1, use the cuda13.1 build. Otherwise, use cuda12.4.

ik_llama.cpp is a llama.cpp fork with new quant types. If unsure, use the llama.cpp column.

Windows

| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (891 MB) | Download (1.23 GB) |
| NVIDIA (CUDA 13.1) | Download (816 MB) | Download (1.33 GB) |
| AMD/Intel (Vulkan) | Download (336 MB) | — |
| AMD (ROCm 7.2) | Download (604 MB) | — |
| CPU only | Download (318 MB) | Download (334 MB) |

Linux

| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (848 MB) | Download (1.20 GB) |
| NVIDIA (CUDA 13.1) | Download (803 MB) | Download (1.32 GB) |
| AMD/Intel (Vulkan) | Download (324 MB) | — |
| AMD (ROCm 7.2) | Download (395 MB) | — |
| CPU only | Download (306 MB) | Download (334 MB) |

macOS

| Architecture | llama.cpp |
|---|---|
| Apple Silicon (arm64) | Download (271 MB) |
| Intel (x86_64) | Download (283 MB) |

Updating a portable install:

Download and extract the latest version.
Replace the user_data folder with the one in your existing install. All your settings and models will be moved.

Starting with 4.0, you can also move user_data one folder up, next to the install folder. It will be detected automatically, making updates easier:

textgen-4.6/
textgen-4.7/
user_data/    <-- shared by both installs

View release on GitHub

Text Generation Web UI

Features

Security Response History

Recent releases

Changes

Bug fixes

Dependency updates

Portable builds

Windows

Linux

macOS

Updating a portable install:

Changes

Bug fixes

Dependency updates

Portable builds

Windows

Linux

macOS

Updating a portable install:

Changes

Bug fixes

Dependency updates

Portable builds

Windows

Linux

macOS

Updating a portable install:

Changes

Bug fixes

Dependency updates

Portable builds

Windows

Linux

macOS

Updating a portable install:

About

Install & Platforms

Similar tools

Alternative to