
Text Generation Web UI

LLM Frameworks

A desktop app for running local LLMs offline with chat, vision, tool‑calling, and API compatibility

Python Latest v4.9 · 14d ago Security brief →

Features

  • Local chat & instruction‑following (LLM) interface with OpenAI/Anthropic API drop‑in support
  • Multimodal vision input for image understanding
  • Tool‑calling capability to invoke custom Python functions (e.g., web search, math)
  • Training tab for LoRA fine‑tuning on chat or raw text datasets
  • Image generation tab with diffusers models and quantization options

Security Response History

1 CVE
| CVE | Severity | Disclosed | Patched (this tool) | vs Ecosystem Median |
|---|---|---|---|---|
| CVE-2023-4863 (KEV) | high (CVSS 8.8) | 2023-09-13 | 2026-01-08 | 2y 4mo / median 2y 4mo |
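
The "2y 4mo" patch latency can be reproduced from the two dates in the row above. The page does not say how it rounds, so the sketch below assumes the gap is rounded to the nearest month using an average month length; counting only whole calendar months would give 2y 3mo instead. The function name `patch_latency` is illustrative.

```python
from datetime import date

# Assumption: the tracker rounds to the nearest month, using the average
# month length of a 365.25-day year. The page does not document its method.
AVG_DAYS_PER_MONTH = 365.25 / 12


def patch_latency(disclosed: date, patched: date) -> str:
    """Format the disclosure-to-patch gap as 'Ny Mmo'."""
    days = (patched - disclosed).days
    months = round(days / AVG_DAYS_PER_MONTH)
    years, remainder = divmod(months, 12)
    return f"{years}y {remainder}mo"


# 2023-09-13 to 2026-01-08 is 848 days, about 27.9 average months.
print(patch_latency(date(2023, 9, 13), date(2026, 1, 8)))  # 2y 4mo
```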

Recent releases

View all 22 releases →
Config change
v4.9 Mixed
Auth RCE / SSRF

MTP, Web search snippets, Electron UI

v4.8 New feature patches CVE-2023-4863
⚠ Upgrade required
  • Use cuda13.1 build if `nvidia-smi` reports CUDA Version >= 13.1; otherwise use cuda12.4
  • ik_llama.cpp offers new quant types – choose based on preference
  • Portable builds now support Windows, Linux, macOS with specific GPU/ROCm/CPU variants
Notable features
  • Redesigned chat composer: taller input area with paperclip and action buttons pinned to bottom (Gemini/DeepSeek style)
  • Smooth scroll animation when sending a new message
  • Electron improvements – persist window bounds, add --no-electron flag, disable spellcheck in chat input
Full changelog

Changes

  • Redesigned chat composer: Taller input area with the paperclip and message-action buttons pinned to the bottom, similar to Gemini and DeepSeek.
  • Smooth scroll animation when sending a new message: Inspired by Gemini's chat UI.
  • Electron improvements:
    • Persist window bounds and maximize state across launches.
    • Add a --no-electron flag to skip the desktop window and use the web UI in the browser instead.
    • Disable spellcheck in the chat input.
  • API: Add support for list-format content in tool and assistant messages.
  • Add more space below the last chat/chat-instruct message so its action buttons have breathing room.
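
To illustrate what list-format content in tool and assistant messages looks like, here is a minimal sketch. It assumes the parts follow the OpenAI Chat Completions convention of `{"type": "text", "text": ...}` dictionaries; the changelog does not describe how TextGen handles them internally, and `flatten_content` and the `call_1` id are hypothetical names used only for this example.

```python
from typing import Any


def flatten_content(content: Any) -> str:
    """Collapse string or list-format message content into plain text.

    Assumes OpenAI-style text parts; non-text parts are skipped.
    """
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    pieces = []
    for part in content:
        if isinstance(part, str):
            pieces.append(part)
        elif isinstance(part, dict) and part.get("type") == "text":
            pieces.append(part.get("text", ""))
    return "".join(pieces)


# A tool message whose content is a list of text parts instead of one string.
tool_message = {
    "role": "tool",
    "tool_call_id": "call_1",  # hypothetical id
    "content": [
        {"type": "text", "text": "2 + 2 = "},
        {"type": "text", "text": "4"},
    ],
}

print(flatten_content(tool_message["content"]))  # 2 + 2 = 4
```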

Bug fixes

  • Fix speculative decoding broken by upstream llama.cpp arg renames (#7541).
  • Fix truncation length reverting after model load on UI reload (#7540).
  • Don't clear the chat input when sending a message with no model loaded (#7542).
  • Electron:
    • Fix big character picture failing to load (#7540).
    • Fix --listen mode in the launcher.
    • Fix missing log colors on Windows.

Dependency updates

  • Update llama.cpp to https://github.com/ggml-org/llama.cpp/commit/68380ae11b564af67196afc70f10c99dbb532fa9
  • Update ik_llama.cpp to https://github.com/ikawrakow/ik_llama.cpp/commit/9a26522af234f8db079ae3735f35ab6c20fe2c66

Portable builds

TextGen is now a desktop app for local LLMs. Download, unzip, double-click.

[!NOTE]
NVIDIA GPU: If nvidia-smi reports CUDA Version >= 13.1, use the cuda13.1 build. Otherwise, use cuda12.4.
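
The rule in this note can be sketched as a small helper that reads the `CUDA Version` field from the `nvidia-smi` banner. The function name `pick_cuda_build` is hypothetical and the driver numbers in the sample line are illustrative; only the 13.1 threshold and the two build names come from the note.

```python
import re


def pick_cuda_build(nvidia_smi_output: str) -> str:
    """Return 'cuda13.1' or 'cuda12.4' based on the reported CUDA Version."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field found in nvidia-smi output")
    # Compare as an integer tuple so the check does not depend on float parsing.
    version = (int(match.group(1)), int(match.group(2)))
    return "cuda13.1" if version >= (13, 1) else "cuda12.4"


# Illustrative banner line; real output comes from running `nvidia-smi`.
sample = "| NVIDIA-SMI 550.54.14    Driver Version: 550.54.14    CUDA Version: 12.4     |"
print(pick_cuda_build(sample))  # cuda12.4
```

On a real machine the input could come from `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`.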

ik_llama.cpp is a llama.cpp fork with new quant types. If unsure, use the llama.cpp column.

Windows

| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (891 MB) | Download (1.23 GB) |
| NVIDIA (CUDA 13.1) | Download (817 MB) | Download (1.33 GB) |
| AMD/Intel (Vulkan) | Download (336 MB) | — |
| AMD (ROCm 7.2) | Download (604 MB) | — |
| CPU only | Download (319 MB) | Download (334 MB) |

Linux

| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (848 MB) | Download (1.20 GB) |
| NVIDIA (CUDA 13.1) | Download (803 MB) | Download (1.33 GB) |
| AMD/Intel (Vulkan) | Download (324 MB) | — |
| AMD (ROCm 7.2) | Download (396 MB) | — |
| CPU only | Download (307 MB) | Download (334 MB) |

macOS

| Architecture | llama.cpp |
|---|---|
| Apple Silicon (arm64) | Download (271 MB) |
| Intel (x86_64) | Download (283 MB) |

Updating a portable install:

  1. Download and extract the latest version.
  2. Replace the user_data folder in the new install with the one from your existing install. All your settings and models carry over.
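
The two steps above could be scripted roughly as follows. This is a sketch, not the project's own updater: the helper name `carry_over_user_data` is hypothetical, and it copies rather than moves so that the old install stays usable as a backup.

```python
import shutil
from pathlib import Path


def carry_over_user_data(old_install: Path, new_install: Path) -> Path:
    """Replace new_install/user_data with a copy of old_install/user_data."""
    source = old_install / "user_data"
    target = new_install / "user_data"
    if not source.is_dir():
        raise FileNotFoundError(f"no user_data folder in {old_install}")
    if target.exists():
        # Remove the default folder that ships with the freshly extracted build.
        shutil.rmtree(target)
    shutil.copytree(source, target)
    return target


# Example call with hypothetical folder names:
# carry_over_user_data(Path("textgen-4.7"), Path("textgen-4.8"))
```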

Starting with 4.0, you can also move user_data one folder up, next to the install folder. It will be detected automatically, making updates easier:

textgen-4.6/
textgen-4.7/
user_data/    <-- shared by both installs
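
The automatic detection described above might look like the sketch below. The release notes only say the shared folder "will be detected automatically"; that the shared folder wins when both exist is an assumption, and `resolve_user_data` is a hypothetical name.

```python
from pathlib import Path


def resolve_user_data(install_dir: Path) -> Path:
    """Pick the user_data folder an install would use.

    Assumption: a user_data folder one level up, next to the install
    folder, takes priority over the one inside the install folder.
    """
    shared = install_dir.parent / "user_data"
    if shared.is_dir():
        return shared
    return install_dir / "user_data"
```

With the layout shown above, both `textgen-4.6/` and `textgen-4.7/` would resolve to the same sibling `user_data/` folder.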
v4.7.3 Breaking risk
⚠ Upgrade required
  • When upgrading portable builds, replace the entire executable with the new `textgen`/`textgen.bat` and adjust any scripts that called the old start script.
  • Existing configurations using `--row-split` must be migrated to `--split-mode tensor` (or another supported mode).
  • Portable build selection now requires CUDA version check: use `cuda13.1` builds if `nvidia-smi` reports CUDA ≥ 13.1, otherwise use `cuda12.4`.
Breaking changes
  • Run `textgen` / `textgen.bat` executables instead of previous start scripts; old script invocation no longer works.
  • Flag `--row-split` removed, replaced by new `--split-mode` flag with a `tensor` option for multi‑GPU inference.
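
For launch scripts that still pass the removed flag, the migration can be sketched as a rewrite of the argument list. This assumes `--row-split` was a bare switch with no value; `migrate_split_flags` and the model filename are hypothetical, and `tensor` is only the default here because it is the mode the notes recommend.

```python
def migrate_split_flags(args: list[str], mode: str = "tensor") -> list[str]:
    """Replace the removed --row-split switch with --split-mode <mode>."""
    migrated = []
    for arg in args:
        if arg == "--row-split":
            migrated.extend(["--split-mode", mode])
        else:
            migrated.append(arg)
    return migrated


# Hypothetical old invocation arguments.
old_args = ["--model", "model.gguf", "--row-split", "--listen"]
print(migrate_split_flags(old_args))
# ['--model', 'model.gguf', '--split-mode', 'tensor', '--listen']
```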
Notable features
  • Native desktop builds bundle Electron and launch as native windows (run via `textgen`/`textgen.bat`).
  • Tensor parallelism support added via `--split-mode tensor` in llama.cpp, which can make multi‑GPU inference 60%+ faster.
  • UI overhaul: Inter font default, Lucide SVG icons for actions, segmented chat mode control, redesigned input card, flat underline tab indicator, hairline sidebar handles.
Full changelog

Changes

  • Native desktop app: Portable builds now bundle Electron and open as a native window. Run textgen / textgen.bat instead of the previous start scripts. Pass --listen or --nowebui to skip the window and run the server directly.
  • Major UI overhaul:
    • Replace Noto Sans with Inter as the default font.
    • Replace emoji refresh/save/delete buttons with Lucide SVG icons.
    • Turn the chat mode selector (chat / chat-instruct / instruct) into a 3-button segmented control.
    • Redesign the chat input as a single rounded card with a circular accent-colored send button.
    • Use a flat underline for the active tab indicator.
    • Replace the sidebar toggle buttons with 3px hairline handles on desktop.
  • Tensor parallelism for llama.cpp: New --split-mode flag (replacing --row-split) with a tensor option that can make multi-GPU inference 60%+ faster. On the ik_llama.cpp backend, tensor and row fall back to graph.
  • Replace DuckDuckGo HTML scraping in the web search tool with the ddgs library, which is more robust against DuckDuckGo's bot blocking.
  • Add support for standalone .jinja/.jinja2 instruction template files in the UI, in addition to the existing .yaml format (#7517).

Bug fixes

  • Fix Stop button being ignored during tool call approval, and not interrupting between tool turns in multi-turn tool loops.
  • Fix race condition in the ExLlamaV3 backend that could affect concurrent API requests.
  • Fix extension settings not saving for extensions inside user_data/extensions (#7525).

Dependency updates

  • Update llama.cpp to https://github.com/ggml-org/llama.cpp/commit/09294365a9d7a2b786584d59525b034622c3ed81
  • Update ik_llama.cpp to https://github.com/ikawrakow/ik_llama.cpp/commit/9f1deefa7128889fd8a947964f04262bfa724b84
  • Update transformers to 5.6

Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip/extract, and run.

[!NOTE]
NVIDIA GPU: If nvidia-smi reports CUDA Version >= 13.1, use the cuda13.1 build. Otherwise, use cuda12.4.

ik_llama.cpp is a llama.cpp fork with new quant types. If unsure, use the llama.cpp column.

Windows

| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (891 MB) | Download (1.23 GB) |
| NVIDIA (CUDA 13.1) | Download (816 MB) | Download (1.33 GB) |
| AMD/Intel (Vulkan) | Download (336 MB) | — |
| AMD (ROCm 7.2) | Download (604 MB) | — |
| CPU only | Download (318 MB) | Download (334 MB) |

Linux

| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (848 MB) | Download (1.20 GB) |
| NVIDIA (CUDA 13.1) | Download (803 MB) | Download (1.32 GB) |
| AMD/Intel (Vulkan) | Download (324 MB) | — |
| AMD (ROCm 7.2) | Download (395 MB) | — |
| CPU only | Download (306 MB) | Download (334 MB) |

macOS

| Architecture | llama.cpp |
|---|---|
| Apple Silicon (arm64) | Download (271 MB) |
| Intel (x86_64) | Download (283 MB) |

Updating a portable install:

  1. Download and extract the latest version.
  2. Replace the user_data folder in the new install with the one from your existing install. All your settings and models carry over.

Starting with 4.0, you can also move user_data one folder up, next to the install folder. It will be detected automatically, making updates easier:

textgen-4.6/
textgen-4.7/
user_data/    <-- shared by both installs
v4.7.2 Breaking risk
⚠ Upgrade required
  • --listen or --nowebui can be used to skip the native window and run the server directly.
  • For portable builds, use the `cuda13.1` build when `nvidia-smi` reports CUDA Version ≥ 13.1; otherwise use the `cuda12.4` build.
Breaking changes
  • Removed previous start script invocation; run `textgen` / `textgen.bat` to launch the Electron‑wrapped native window.
Notable features
  • Tensor parallelism flag `--split-mode tensor`, which can make multi‑GPU inference with llama.cpp 60%+ faster.
  • Major UI overhaul: default font Inter, Lucide SVG icons, segmented chat mode control, redesigned input card, flat tab indicator, hairline sidebar handles.
Full changelog

Changes

  • Native desktop app: Portable builds now bundle Electron and open as a native window. Run textgen / textgen.bat instead of the previous start scripts. Pass --listen or --nowebui to skip the window and run the server directly.
  • Major UI overhaul:
    • Replace Noto Sans with Inter as the default font.
    • Replace emoji refresh/save/delete buttons with Lucide SVG icons.
    • Turn the chat mode selector (chat / chat-instruct / instruct) into a 3-button segmented control.
    • Redesign the chat input as a single rounded card with a circular accent-colored send button.
    • Use a flat underline for the active tab indicator.
    • Replace the sidebar toggle buttons with 3px hairline handles on desktop.
  • Tensor parallelism for llama.cpp: New --split-mode flag (replacing --row-split) with a tensor option that can make multi-GPU inference 60%+ faster. On the ik_llama.cpp backend, tensor and row fall back to graph.
  • Replace DuckDuckGo HTML scraping in the web search tool with the ddgs library, which is more robust against DuckDuckGo's bot blocking.
  • Add support for standalone .jinja/.jinja2 instruction template files in the UI, in addition to the existing .yaml format (#7517).

Bug fixes

  • Fix Stop button being ignored during tool call approval, and not interrupting between tool turns in multi-turn tool loops.
  • Fix race condition in the ExLlamaV3 backend that could affect concurrent API requests.
  • Fix extension settings not saving for extensions inside user_data/extensions (#7525).

Dependency updates

  • Update llama.cpp to https://github.com/ggml-org/llama.cpp/commit/09294365a9d7a2b786584d59525b034622c3ed81
  • Update ik_llama.cpp to https://github.com/ikawrakow/ik_llama.cpp/commit/9f1deefa7128889fd8a947964f04262bfa724b84
  • Update transformers to 5.6

Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip/extract, and run.

[!NOTE]
NVIDIA GPU: If nvidia-smi reports CUDA Version >= 13.1, use the cuda13.1 build. Otherwise, use cuda12.4.

ik_llama.cpp is a llama.cpp fork with new quant types. If unsure, use the llama.cpp column.

Windows

| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (891 MB) | Download (1.23 GB) |
| NVIDIA (CUDA 13.1) | Download (816 MB) | Download (1.33 GB) |
| AMD/Intel (Vulkan) | Download (336 MB) | — |
| AMD (ROCm 7.2) | Download (604 MB) | — |
| CPU only | Download (318 MB) | Download (334 MB) |

Linux

| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (848 MB) | Download (1.20 GB) |
| NVIDIA (CUDA 13.1) | Download (803 MB) | Download (1.32 GB) |
| AMD/Intel (Vulkan) | Download (324 MB) | — |
| AMD (ROCm 7.2) | Download (395 MB) | — |
| CPU only | Download (306 MB) | Download (334 MB) |

macOS

| Architecture | llama.cpp |
|---|---|
| Apple Silicon (arm64) | Download (271 MB) |
| Intel (x86_64) | Download (283 MB) |

Updating a portable install:

  1. Download and extract the latest version.
  2. Replace the user_data folder in the new install with the one from your existing install. All your settings and models carry over.

Starting with 4.0, you can also move user_data one folder up, next to the install folder. It will be detected automatically, making updates easier:

textgen-4.6/
textgen-4.7/
user_data/    <-- shared by both installs
v4.7.1 New feature
Notable features
  • Native portable builds bundle Electron; launch via `textgen`/`textgen.bat`, optional `--listen` / `--nowebui` flags.
  • UI redesign: Inter font, Lucide SVG icons, segmented chat mode control, rounded input card with accent send button, flat underline active tab indicator, hairline sidebar handles.
  • Tensor parallelism via new `--split-mode tensor` flag, which can make multi‑GPU inference on llama.cpp 60%+ faster (falls back to graph on ik_llama.cpp).
Full changelog

Changes

  • Native desktop app: Portable builds now bundle Electron and open as a native window. Run textgen / textgen.bat instead of the previous start scripts. Pass --listen or --nowebui to skip the window and run the server directly.
  • Major UI overhaul:
    • Replace Noto Sans with Inter as the default font.
    • Replace emoji refresh/save/delete buttons with Lucide SVG icons.
    • Turn the chat mode selector (chat / chat-instruct / instruct) into a 3-button segmented control.
    • Redesign the chat input as a single rounded card with a circular accent-colored send button.
    • Use a flat underline for the active tab indicator.
    • Replace the sidebar toggle buttons with 3px hairline handles on desktop.
  • Tensor parallelism for llama.cpp: New --split-mode flag (replacing --row-split) with a tensor option that can make multi-GPU inference 60%+ faster. On the ik_llama.cpp backend, tensor and row fall back to graph.
  • Replace DuckDuckGo HTML scraping in the web search tool with the ddgs library, which is more robust against DuckDuckGo's bot blocking.
  • Add support for standalone .jinja/.jinja2 instruction template files in the UI, in addition to the existing .yaml format (#7517).

Bug fixes

  • Fix Stop button being ignored during tool call approval, and not interrupting between tool turns in multi-turn tool loops.
  • Fix race condition in the ExLlamaV3 backend that could affect concurrent API requests.
  • Fix extension settings not saving for extensions inside user_data/extensions (#7525).

Dependency updates

  • Update llama.cpp to https://github.com/ggml-org/llama.cpp/commit/09294365a9d7a2b786584d59525b034622c3ed81
  • Update ik_llama.cpp to https://github.com/ikawrakow/ik_llama.cpp/commit/9f1deefa7128889fd8a947964f04262bfa724b84
  • Update transformers to 5.6

Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip/extract, and run.

[!NOTE]
NVIDIA GPU: If nvidia-smi reports CUDA Version >= 13.1, use the cuda13.1 build. Otherwise, use cuda12.4.

ik_llama.cpp is a llama.cpp fork with new quant types. If unsure, use the llama.cpp column.

Windows

| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (891 MB) | Download (1.23 GB) |
| NVIDIA (CUDA 13.1) | Download (816 MB) | Download (1.33 GB) |
| AMD/Intel (Vulkan) | Download (336 MB) | — |
| AMD (ROCm 7.2) | Download (604 MB) | — |
| CPU only | Download (318 MB) | Download (334 MB) |

Linux

| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (848 MB) | Download (1.20 GB) |
| NVIDIA (CUDA 13.1) | Download (803 MB) | Download (1.32 GB) |
| AMD/Intel (Vulkan) | Download (324 MB) | — |
| AMD (ROCm 7.2) | Download (395 MB) | — |
| CPU only | Download (306 MB) | Download (334 MB) |

macOS

| Architecture | llama.cpp |
|---|---|
| Apple Silicon (arm64) | Download (271 MB) |
| Intel (x86_64) | Download (283 MB) |

Updating a portable install:

  1. Download and extract the latest version.
  2. Replace the user_data folder in the new install with the one from your existing install. All your settings and models carry over.

Starting with 4.0, you can also move user_data one folder up, next to the install folder. It will be detected automatically, making updates easier:

textgen-4.6/
textgen-4.7/
user_data/    <-- shared by both installs


About

Stars
47,261
Forks
5,977
Languages
Python CSS JavaScript

Install & Platforms

Install via
pip binary
Platforms
linux macos windows

Alternative to

OpenAI Anthropic
