Skip to content

Release history

Unsloth releases

All releases

25 shown

Upgrade now
v0.1.44-beta Mixed
Breaking upgrade

MCP tools, Chat UI, Projects, Canvas, Runtime

Upgrade now
v0.1.43-beta Mixed
Dependencies Breaking upgrade

Mac, Windows, CUDA, Blackwell, Studio updates

Review required
v0.1.42-beta Breaking risk
Auth RBAC RCE / SSRF

API calls + Studio security + language support

No immediate action
v0.1.41-beta Bug fix

Studio update fix + UX fixes

Review required
v0.1.40-beta Breaking risk
Auth RBAC

MTP speculative decoding

v0.1.38-beta Bug fix

Studio chat template no longer disappears after browser refresh.

Full changelog

You can use local LLMs with tools like Claude Code and Codex by connecting them to Unsloth’s API endpoint. This lets you run models like Qwen and Gemma locally, with additional features such as self-healing tool calling, code execution, and web search. Unsloth makes it easy to deploy a fast API inference endpoint that provides:

Models loaded in Unsloth (including GGUFs) are exposed as an authenticated API via llama-server. A long API key is generated for security reasons like how OpenAI provides one. Your local models can then be used directly in your preferred AI agent, SDK, or chat client. Unsloth speaks two dialects on the same port:

  • Anthropic-compatible /v1/messages for Claude Code, OpenClaw, the Anthropic SDK, and any client that expects the Messages API.
  • OpenAI-compatible /v1/chat/completions and /v1/responses for the OpenAI SDK, OpenCode, Cursor, Continue, Cline, Open WebUI, SillyTavern, and any OpenAI-compatible tool.
  • Both support streaming, tool calling (OpenAI tools / Anthropic tools), and vision inputs.

New models

We've also got a handful of new models to run including NVIDIA Nemotron 3 Nano Omni, IBM Granite 4.1 and Mistral 3.5 Medium. We helped Mistral solve some issues with implementation in transformers and GGUFs.

Unsloth Updates

  • Stopped Studio training runs can now resume from checkpoints.
  • Chat threads now autosave and persist more reliably.
  • DPO training hangs in multi-process setups were fixed.
  • VLM GRPO support improved with MROPE updates.
  • Studio’s stop button now properly stops generation.
  • Fix chat template disappearing after browser refresh

What's Changed in Unsloth

  • Studio: use (gguf) context length before max seq length by @G07cha in https://github.com/unslothai/unsloth/pull/5111
  • chore: fix typo cleanup across tests and backend strings by @luojiyin1987 in https://github.com/unslothai/unsloth/pull/5152
  • fix: guard resolve_model_class fallback against unresolvable transformers AutoModel entries by @Etherll in https://github.com/unslothai/unsloth/pull/5155
  • Studio: kill in-flight llama-server before spawning a new one by @danielhanchen in https://github.com/unslothai/unsloth/pull/5171
  • Studio: stop currency escape from breaking inline LaTeX by @danielhanchen in https://github.com/unslothai/unsloth/pull/5170
  • Studio: probe AMD GPUs in llama-server VRAM detection by @danielhanchen in https://github.com/unslothai/unsloth/pull/5172
  • Studio: make stop button actually stop generation by @danielhanchen in https://github.com/unslothai/unsloth/pull/5069
  • Studio: add github_repo seed reader and GitHub Support Bot recipe by @danielhanchen in https://github.com/unslothai/unsloth/pull/5169
  • fix(studio): use endswith for mmproj F16 variant selection by @LeoBorcherding in https://github.com/unslothai/unsloth/pull/5184
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/unslothai/unsloth/pull/5204
  • Fix Windows install when paths contain spaces or Python 3.14 is on PATH by @Etherll in https://github.com/unslothai/unsloth/pull/5201
  • Studio: Preserve transparency in uploaded profile avatars by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5200
  • UX: single chat header error placement and selector alignment by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5173
  • Studio: Refine chat preset and group built-in presets by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5159
  • Studio: Fix image-only chat requests failing validation by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5212
  • Studio: fix 7 failing studio_unit_tests on main by @danielhanchen in https://github.com/unslothai/unsloth/pull/5216
  • Patch checkpoint reload init functions to strip unsupported args by @Datta0 in https://github.com/unslothai/unsloth/pull/5167
  • Studio: Fix clipped model selector text descenders by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5210
  • Fix DPO trainer multi process hang by @Datta0 in https://github.com/unslothai/unsloth/pull/5199
  • Studio: Pin assistant-ui core for fresh installs by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5229
  • Fix local model scanner to handle ollama cloud models by @Anish9901 in https://github.com/unslothai/unsloth/pull/5220
  • Fix Studio desktop tray installer and titlebar and bux fixes by @wasimysaid in https://github.com/unslothai/unsloth/pull/5179
  • MROPE for VLM GRPO by @Datta0 in https://github.com/unslothai/unsloth/pull/5198
  • install: overlay unsloth-zoo from git main on --local by @rolandtannous in https://github.com/unslothai/unsloth/pull/5242
  • Studio: Fix chat template disappearing after browser refresh by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5209
  • studio: add --local to setup.sh + overlay unsloth-zoo from git main by @rolandtannous in https://github.com/unslothai/unsloth/pull/5252
  • Fix/windowsprebuilt by @mmathew23 in https://github.com/unslothai/unsloth/pull/5241
  • Studio: Add dataset upload dropzone and update preserve think copy by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5253
  • Add Qwen3.6 support by @rolandtannous in https://github.com/unslothai/unsloth/pull/5257
  • Studio: Chat thread autosave persistence by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5256
  • Studio: Enable deleting fine-tuned chat models by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5234
  • Studio: Add checkpoint resume for stopped training runs by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5255
  • Studio: Polish spacing and profile input radius by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5222
  • Fix check for libcurl headers in install.sh by @LFd3v in https://github.com/unslothai/unsloth/pull/5251
  • Default Studio host to 127.0.0.1 and prompt before auto-start by @rolandtannous in https://github.com/unslothai/unsloth/pull/5267
  • Studio: forward llama-server args from unsloth studio run , activate unsloth run , and allow passing model:quant to load models by @rolandtannous in https://github.com/unslothai/unsloth/pull/5271
  • Studio: Always show API usage examples and docs links by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5270
  • Studio: Change API Keys settings to API Access by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5268
  • unsloth run: add --enable-tools/--disable-tools server-side tool policy by @rolandtannous in https://github.com/unslothai/unsloth/pull/5277
  • fix: use % 8 instead of // 8 in FP8 weight shape check by @Ricardo-M-L in https://github.com/unslothai/unsloth/pull/5243
  • Pin Studio GGUF export to llama.cpp's local convert script by @mmathew23 in https://github.com/unslothai/unsloth/pull/5275
  • fix KVCache estimates for gemma4 style sliding window models by @Datta0 in https://github.com/unslothai/unsloth/pull/5225
  • Update VRAM estimator to cater to broader model configs by @Datta0 in https://github.com/unslothai/unsloth/pull/5175
  • Fix FastSentenceTransformer loading with newer sentence-transformers by @Etherll in https://github.com/unslothai/unsloth/pull/5259
  • Studio: Preserve chat history during autosave by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5278

What's changed in Unsloth-Zoo

  • Fix fused CE grad scaling under DDP by @danielhanchen in https://github.com/unslothai/unsloth-zoo/pull/434
  • Fused CE backward: guard scaling=0, drop tensor path, use out-of-place mul by @mmathew23 in https://github.com/unslothai/unsloth-zoo/pull/610
  • Fix/gemma4moefix by @mmathew23 in https://github.com/unslothai/unsloth-zoo/pull/612
  • MROPE for VLM GRPO by @Datta0 in https://github.com/unslothai/unsloth-zoo/pull/614
  • Double-buffer GPU activations for overlapping H2D copy with backward compute by @ruixiang63 in https://github.com/unslothai/unsloth-zoo/pull/534
  • fix(temporary_patches/utils): add missing comma in all (raise_error / Unpack) by @Anai-Guo in https://github.com/unslothai/unsloth-zoo/pull/617
  • Fix qwen lora extractor for diff peft versions by @Datta0 in https://github.com/unslothai/unsloth-zoo/pull/618
  • fix: use backend device type in GGUF merge path by @andomeder in https://github.com/unslothai/unsloth-zoo/pull/615
  • Add unsloth_compiled_cache to gitignore by @Datta0 in https://github.com/unslothai/unsloth-zoo/pull/622
  • Allow local convert_hf_to_gguf.py via UNSLOTH_LLAMA_CPP_SCRIPTS_DIR by @mmathew23 in https://github.com/unslothai/unsloth-zoo/pull/621

Full Changelog: https://github.com/unslothai/unsloth/compare/v0.1.37-beta...v0.1.38-beta

v0.1.37-beta Breaking risk
Notable features
  • Collapsible sidebar
  • Chat deletion and search
  • Preserve Thinking toggle for compatible models
Full changelog

Hey guys, we revamped the entire Unsloth Studio UI and UX experience to put an emphasis on chat and training:

  • Added a collapsible sidebar based on community feedback
  • You can now delete chats and search past conversations
  • New Preserve Thinking toggle for models that support it like Qwen3.6
  • Cleaner, more consistent design with easier navigation
  • Expanded Settings page with options to change your profile picture, name, and more
  • No more entering your Hugging Face token twice
  • gpt-oss now has low, medium and high thinking toggles.
  • Now uses latest llama.cpp prebuilt, even on Linux CUDA
  • Lots of bug, consistency and stability fixes
  • Kimi-K2.6 can now be run!
  • We also added experimental API support. Guides, announcement etc will come next week.

Qwen3.6 was also also previously already supported in Unsloth Studio for running and training. You can train and run Qwen3.6-27B right now!

What's Changed

  • Only run ldconfig CUDA-linking recovery when we have permission by @danielhanchen in https://github.com/unslothai/unsloth/pull/4930
  • Fix Mistral DPO/preference training crash on non-xformers platforms (e.g. Intel XPU) by @cheehook in https://github.com/unslothai/unsloth/pull/4889
  • Fix raw text paragraph break normalization by @kiankyars in https://github.com/unslothai/unsloth/pull/4884
  • Studio: keep chat input visible and fix compare pane clipping by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4924
  • fix: check find() return value before adding offset in try_fix_tokenizer by @Ricardo-M-L in https://github.com/unslothai/unsloth/pull/4923
  • updated models template mappers. added lfm2.5vl450m to transformers 5… by @rolandtannous in https://github.com/unslothai/unsloth/pull/4939
  • Revert "updated models template mappers. added lfm2.5vl450m to transformers 5…" by @rolandtannous in https://github.com/unslothai/unsloth/pull/4945
  • Add AMD ROCm/HIP support across installer and hardware detection by @danielhanchen in https://github.com/unslothai/unsloth/pull/4720
  • Pin bitsandbytes to continuous-release_main on ROCm (4-bit decode fix) by @danielhanchen in https://github.com/unslothai/unsloth/pull/4954
  • Fix Gemma-4 GRPO catastrophic KL divergence with TRL 1.0.0+ by @danielhanchen in https://github.com/unslothai/unsloth/pull/4934
  • Add ROCm test suite (companion to #4720) by @danielhanchen in https://github.com/unslothai/unsloth/pull/4824
  • updating gemma4 script by @Manan17 in https://github.com/unslothai/unsloth/pull/4992
  • Move gemma4 script by @Manan17 in https://github.com/unslothai/unsloth/pull/4994
  • studio: fix route transition DOM duplication via AnimatePresence mode="wait" by @AdamPlatin123 in https://github.com/unslothai/unsloth/pull/4987
  • Studio: Prompt manager, message deletion, and chat UI improvements by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4938
  • Pin kernels==0.12.1 to fix training import failure by @rolandtannous in https://github.com/unslothai/unsloth/pull/5000
  • Studio: Expose openai and anthropic compatible external API end points by @danielhanchen in https://github.com/unslothai/unsloth/pull/4956
  • studio: skip training status/metrics polling when idle by @AdamPlatin123 in https://github.com/unslothai/unsloth/pull/4988
  • studio: fix api-keys access + refresh by @wasimysaid in https://github.com/unslothai/unsloth/pull/5005
  • Studio: Polish API key copy button and harden async clipboard fallback by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5006
  • fix(studio): default chart view to full training history by @Barath19 in https://github.com/unslothai/unsloth/pull/5007
  • [Studio] Show non exported models in chat UI by @Datta0 in https://github.com/unslothai/unsloth/pull/4892
  • [Studio] Install flash attn at setup time for linux by @Datta0 in https://github.com/unslothai/unsloth/pull/4979
  • fix(studio): remove 300s cap on load_checkpoint (inherits 3600s default) by @TF-MTGE in https://github.com/unslothai/unsloth/pull/4922
  • Studio: honor explicit GGUF ctx and default to 4096 when weights exceed VRAM by @danielhanchen in https://github.com/unslothai/unsloth/pull/5011
  • Studio: make GGUF disk-space preflight cache-aware by @danielhanchen in https://github.com/unslothai/unsloth/pull/5012
  • Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM by @danielhanchen in https://github.com/unslothai/unsloth/pull/5014
  • studio: show HF model download progress in training start overlay by @danielhanchen in https://github.com/unslothai/unsloth/pull/4894
  • studio: stream export worker output into the export dialog by @danielhanchen in https://github.com/unslothai/unsloth/pull/4897
  • Fix num_items_in_batch GA for Gemma4 by @Datta0 in https://github.com/unslothai/unsloth/pull/4998
  • studio: pin peft to 0.18.1 to fix export subprocess issues by @rolandtannous in https://github.com/unslothai/unsloth/pull/5015
  • Studio: live model-load progress + rate/ETA on download and load by @danielhanchen in https://github.com/unslothai/unsloth/pull/5017
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/unslothai/unsloth/pull/5004
  • Fix bitsandbytes ROCm install by using pip instead of uv by @edamamez in https://github.com/unslothai/unsloth/pull/4966
  • Studio: split model-load progress label across two rows by @danielhanchen in https://github.com/unslothai/unsloth/pull/5020
  • Studio: hard-stop at n_ctx with a 'Context limit reached' toast by @danielhanchen in https://github.com/unslothai/unsloth/pull/5021
  • [moe][gemma4] Target MoE for gemma4 by @Datta0 in https://github.com/unslothai/unsloth/pull/4913
  • Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var by @rolandtannous in https://github.com/unslothai/unsloth/pull/5024
  • Studio: support GGUF variant selection for non-suffixed repos by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5023
  • fix: prevent offline freeze by fixing stats retry and forwarding local_files_only by @DavidSolanas in https://github.com/unslothai/unsloth/pull/5016
  • Respect classification head skip list on pre-quantized 4-bit checkpoints (#5027) by @danielhanchen in https://github.com/unslothai/unsloth/pull/5034
  • fix(rocm): tighten gfx regex to ignore generic ISA lines by @danielhanchen in https://github.com/unslothai/unsloth/pull/5033
  • Fix grad-accum accepts_loss_kwargs detection for vision wrappers by @danielhanchen in https://github.com/unslothai/unsloth/pull/5036
  • grpo_compute_loss_slow called with wrong positional args by @jonahsamost in https://github.com/unslothai/unsloth/pull/4887
  • Gate trl disable_gradient_checkpointing warning on UNSLOTH_ENABLE_LOGGING by @danielhanchen in https://github.com/unslothai/unsloth/pull/5038
  • Studio: refresh Downloaded GGUF list and recurse into variant subdirs by @danielhanchen in https://github.com/unslothai/unsloth/pull/5032
  • feat: Add support for OLMo-3 model by @OnePunchMonk in https://github.com/unslothai/unsloth/pull/4678
  • feat: Add cactus QAT scheme support by @OnePunchMonk in https://github.com/unslothai/unsloth/pull/4679
  • Re-apply #4939: updated models template mappers by @rolandtannous in https://github.com/unslothai/unsloth/pull/4950
  • Studio: add folder browser modal for Custom Folders by @danielhanchen in https://github.com/unslothai/unsloth/pull/5035
  • Bump Studio installer minimum to 2026.4.5 by @danielhanchen in https://github.com/unslothai/unsloth/pull/5041
  • fix Gemma4 flash attn disable by @mmathew23 in https://github.com/unslothai/unsloth/pull/5045
  • BUG: fix _fix_chat_template for ChatML templates missing add_generation_prompt (#4150) by @kimimgo in https://github.com/unslothai/unsloth/pull/4426
  • fix: use direct registry API for PATH writes instead of SetEnvironmentVariable by @Etherll in https://github.com/unslothai/unsloth/pull/4961
  • Chat-template repair: warn-by-default, AST classification, dict support by @danielhanchen in https://github.com/unslothai/unsloth/pull/5049
  • Restrict flash attn to <=256 head dim. Consolidate attn impl checks by @Datta0 in https://github.com/unslothai/unsloth/pull/5051
  • Remove legacy venv Scripts entry from User PATH on upgrade by @danielhanchen in https://github.com/unslothai/unsloth/pull/5060
  • Fix review findings for chat-template repair (#5049) by @danielhanchen in https://github.com/unslothai/unsloth/pull/5056
  • Studio: Ollama support, recommended folders, Custom Folders UX polish by @danielhanchen in https://github.com/unslothai/unsloth/pull/5050
  • feat(studio): replace navbar with collapsible sidebar by @wasimysaid in https://github.com/unslothai/unsloth/pull/4936
  • fix audio dataset preview and finetuning by @CodeMan62 in https://github.com/unslothai/unsloth/pull/5043
  • Chat first onboarding by @wasimysaid in https://github.com/unslothai/unsloth/pull/5063
  • Fix onboarding followups by @wasimysaid in https://github.com/unslothai/unsloth/pull/5064
  • Studio: Default Gemma fallback for chat + AI assist by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5066
  • fix: multi-GPU inference crash for bnb 4-bit/8-bit models by @danielhanchen in https://github.com/unslothai/unsloth/pull/5068
  • Add Qwen3.6 inference defaults for Studio by @danielhanchen in https://github.com/unslothai/unsloth/pull/5065
  • Add qwen3.6 script by @Manan17 in https://github.com/unslothai/unsloth/pull/5084
  • Studio: forward standard OpenAI tools / tool_choice to llama-server by @rolandtannous in https://github.com/unslothai/unsloth/pull/5099
  • fix(studio/chat): stop stream when trashing a thread from sidebar by @rolandtannous in https://github.com/unslothai/unsloth/pull/5067
  • Studio: Local profile customization in settings and sync sidebar identity by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5088
  • Studio: Show LoRA live logs and update GGUF quant options by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5058
  • Studio: prefer mainstream clipboard copy over deprecated one by @G07cha in https://github.com/unslothai/unsloth/pull/5109
  • Studio: Improve chat composition, fix scroll behaviour, and refine sidebar UX by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5089
  • Studio: forward standard OpenAI tools / tool_choice on /v1/responses (Codex compat) by @rolandtannous in https://github.com/unslothai/unsloth/pull/5122
  • Studio: support images on /v1/messages (Anthropic-compat) by @rolandtannous in https://github.com/unslothai/unsloth/pull/5128
  • Coerce TRL's tuple-cached _*_available flags to bool by @danielhanchen in https://github.com/unslothai/unsloth/pull/5129
  • Studio: Smoother thread switching in chat by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5126
  • Studio: Replace assistant UI shared autoscroll with per-panel scrolling by @Imagineer99 in https://github.com/unslothai/unsloth/pull/5127
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/unslothai/unsloth/pull/5117
  • Fix tokenizer save gemma by @Datta0 in https://github.com/unslothai/unsloth/pull/5115
  • update gema4 chat templates by @Datta0 in https://github.com/unslothai/unsloth/pull/5116
  • Bump installer floor to 2026.4.7 by @danielhanchen in https://github.com/unslothai/unsloth/pull/5134
  • fix/llamacpp_prebuilt_install by @mmathew23 in https://github.com/unslothai/unsloth/pull/5135
  • Studio: fix stale test_exception_result_cached test for vision cache by @danielhanchen in https://github.com/unslothai/unsloth/pull/5145
  • fix: patch CONTROL type for special tokens in sentencepiece GGUF export by @octo-patch in https://github.com/unslothai/unsloth/pull/5080
  • fix(install): clear STUDIO_LOCAL_* env on POSIX normal install by @danielhanchen in https://github.com/unslothai/unsloth/pull/5146
  • Add tauri by @wasimysaid in https://github.com/unslothai/unsloth/pull/5144
  • Studio: detect reasoning_effort and preserve_thinking in chat templates by @danielhanchen in https://github.com/unslothai/unsloth/pull/5149

New Contributors

  • @cheehook made their first contribution in https://github.com/unslothai/unsloth/pull/4889
  • @Ricardo-M-L made their first contribution in https://github.com/unslothai/unsloth/pull/4923
  • @Barath19 made their first contribution in https://github.com/unslothai/unsloth/pull/5007
  • @TF-MTGE made their first contribution in https://github.com/unslothai/unsloth/pull/4922
  • @edamamez made their first contribution in https://github.com/unslothai/unsloth/pull/4966
  • @DavidSolanas made their first contribution in https://github.com/unslothai/unsloth/pull/5016
  • @jonahsamost made their first contribution in https://github.com/unslothai/unsloth/pull/4887
  • @kimimgo made their first contribution in https://github.com/unslothai/unsloth/pull/4426
  • @CodeMan62 made their first contribution in https://github.com/unslothai/unsloth/pull/5043
  • @G07cha made their first contribution in https://github.com/unslothai/unsloth/pull/5109
  • @octo-patch made their first contribution in https://github.com/unslothai/unsloth/pull/5080

Full Changelog: https://github.com/unslothai/unsloth/compare/v0.1.36-beta...v0.1.37-beta

v0.1.36-beta Bug fix
Notable features
  • Speculative decoding support
  • Gemma 4 training stability improvements
Full changelog

Hey everyone, we’ve updated Gemma 4 training and quants with many fixes. The bugs are universal and affected all packages and implementations and did NOT originate from Unsloth. We identified the bugs, fixed them, and Gemma 4 training now works properly only in Unsloth.

You need 8GB VRAM to train Gemma-4-E2B locally. Unsloth trains Gemma 4 ~1.5x faster with ~60% less VRAM than FA2 setups.

You can also train 26B-A4B and 31B or train via Unsloth Studio. Studio and the notebooks work for Vision, Text, Audio and inference.
For more details, guide + notebooks on training Gemma 4, view our blog: https://unsloth.ai/docs/models/gemma-4/train

Gemma 4 Training Fixes:

For fix details see our blog.

  1. Grad accumulation no longer causes losses to explode - before you might see losses of 300 to 400 - it should be 10 to 15 - Unsloth has this fixed.
  2. Index Error for 26B and 31B for inference - this will fail inference for 26B and 31B when using transformers - we fixed it.
  3. use_cache=False had gibberish for E2B, E4B - see https://github.com/huggingface/transformers/issues/45242
  4. float16 audio -1e9 overflows on float16

If you see losses higher than 13-15 (like 100 or 300) most likely gradient accumulation is not being accounted properly - we have fixed this as part of Unsloth and Unsloth Studio.

Gemma 4 Quant Re-uploads

We also updated our Gemma 4 GGUFs so you will need to re-download. Once again, the quant issues are NOT related to or originated from Unsloth:

  1. CUDA: check for buffer overlap before fusing - CRITICAL fixes <unused24> tokens https://github.com/ggml-org/llama.cpp/pull/21566
  2. kv-cache : support attention rotation for heterogeneous iSWA https://github.com/ggml-org/llama.cpp/pull/21513
  3. vocab : add byte token handling to BPE detokenizer for Gemma4 https://github.com/ggml-org/llama.cpp/pull/21488
  4. convert : set "add bos" == True for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21500
  5. common : add gemma 4 specialized parser https://github.com/ggml-org/llama.cpp/pull/21418
  6. llama-model: read final_logit_softcapping for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21390
  7. llama: add custom newline split for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21406

Unsloth Studio Updates

  • Add speculative decoding support (ngram-mod, on by default)
  • Llama.cpp binaries updated to use latest version which includes all Gemma 4 Fixes
  • Fix Qwen3.5 and Gemma 4 training issues
  • Enable exporting and saving of Gemma 4 models
  • Harden sandbox security for terminal and python tools
  • Let recipes use the model loaded in Chat
  • Fix empty chat threads on navigation (and whenever switching tabs) and stabilize new chat flow
  • Allow non-LLM recipes to run and move Data tab first in executions
  • Reuse HF cached repo casing to prevent duplicate downloads

What's Changed

  • fix(studio): lazy-import transformers in model_config to fix 5.x version switch by @rolandtannous in https://github.com/unslothai/unsloth/pull/4806
  • fix: patch PEFT for Gemma4ClippableLinear in loader checkpoint path (fixes export) by @rolandtannous in https://github.com/unslothai/unsloth/pull/4807
  • Fix/gemma4 install script by @Manan17 in https://github.com/unslothai/unsloth/pull/4815
  • Fix/llama.cppbuilding by @mmathew23 in https://github.com/unslothai/unsloth/pull/4804
  • Add tests for simplified llama.cpp install policy (from PR #4804) by @danielhanchen in https://github.com/unslothai/unsloth/pull/4817
  • Differentiate web search and URL fetch in chat tool UI by @Shine1i in https://github.com/unslothai/unsloth/pull/4802
  • Allow non-LLM recipes to run and move Data tab first in executions by @Shine1i in https://github.com/unslothai/unsloth/pull/4805
  • studio: reuse HF cached repo casing to prevent duplicate downloads by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4822
  • fix(studio): ensure first chat tool call starts in session sandbox by @neodon in https://github.com/unslothai/unsloth/pull/4810
  • fix(studio): harden sandbox security for terminal and python tools by @danielhanchen in https://github.com/unslothai/unsloth/pull/4827
  • studio: add speculative decoding support (ngram-mod, on by default) by @danielhanchen in https://github.com/unslothai/unsloth/pull/4836
  • Add Gemma 4 model sampling defaults by @danielhanchen in https://github.com/unslothai/unsloth/pull/4838
  • Add tests for cache case resolution (from PR #4822) by @danielhanchen in https://github.com/unslothai/unsloth/pull/4823
  • Bump minimum unsloth version to 2026.4.2 in install scripts by @danielhanchen in https://github.com/unslothai/unsloth/pull/4842
  • Fix/studio colab button message: Add fallback message for Colab Studio button when proxy URL fails by @LeoBorcherding in https://github.com/unslothai/unsloth/pull/4866
  • [Studio][Optimization]Add vision detection cache to is_vision_model() by @rolandtannous in https://github.com/unslothai/unsloth/pull/4853
  • Add tests for is_vision_model() caching behaviour by @danielhanchen in https://github.com/unslothai/unsloth/pull/4855
  • Remove Gemma-4 from FORCE_FLOAT32 by @danielhanchen in https://github.com/unslothai/unsloth/pull/4875
  • fix: skip redundant HfFileSystem().glob() calls in loader.py by @rolandtannous in https://github.com/unslothai/unsloth/pull/4852
  • fix(studio): custom folder scan fails to find GGUF variants when pointing directly at a model directory by @JYYYYYT in https://github.com/unslothai/unsloth/pull/4860
  • Add unit tests for loader glob skip guard (from PR #4852) by @danielhanchen in https://github.com/unslothai/unsloth/pull/4854
  • Studio: Fix empty chat threads on navigation and stabilize new chat flow by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4872
  • Bump minimum unsloth version to 2026.4.4 in install scripts by @danielhanchen in https://github.com/unslothai/unsloth/pull/4876
  • split venv_t5 into tiered 5.3.0/5.5.0 and fix trust_remote_code by @rolandtannous in https://github.com/unslothai/unsloth/pull/4878
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/unslothai/unsloth/pull/4879
  • build(deps): bump oxc-parser from 0.121.0 to 0.123.0 in /studio/backend/core/data_recipe/oxc-validator in the npm-oxc-validator group by @dependabot[bot] in https://github.com/unslothai/unsloth/pull/4776
  • Update dependabot.yml by @danielhanchen in https://github.com/unslothai/unsloth/pull/4915
  • Let recipes use the model loaded in Chat by @Shine1i in https://github.com/unslothai/unsloth/pull/4840
  • build(deps): bump the bun-frontend group across 1 directory with 16 updates by @dependabot[bot] in https://github.com/unslothai/unsloth/pull/4586

New Contributors

  • @neodon made their first contribution in https://github.com/unslothai/unsloth/pull/4810
  • @JYYYYYT made their first contribution in https://github.com/unslothai/unsloth/pull/4860

Full Changelog: https://github.com/unslothai/unsloth/compare/v0.1.35-beta...v0.1.36-beta

v0.1.35-beta New feature
Notable features
  • Gemma 4 model support
  • Tool calling +30% to +80% accuracy improvement
  • Web search content retrieval
Full changelog

Google releases Gemma 4 with four new models: E2B, E4B, 26B-A4B, 31B.

  • You can now run and train the Gemma 4 models in Unsloth. Guide / Blog: https://unsloth.ai/docs/models/gemma-4
  • Run E2B and E4B on 6GB RAM, and on phones. Run 26B-A4B and 31B on ~18GB.
  • GGUFs: https://huggingface.co/collections/unsloth/gemma-4

Updates

  • Tool calls for smaller models are now more stable and don't cut off anymore
  • Pre-compiled binaries for llama.cpp for 2 Gemma 4 fixes:
  • Pre-compiled binaries for Windows, Linux, Mac, WSL devices - CPU and GPU
  • 90% reduced HF API calls - less rate limits
  • Intel Mac works
  • All Gemma 4 models are re-converted.
  • Tool Calling more robust
  • Speculative Decoding added for non vision models (Gemma-4 is vision sadly and Qwen3.5)
  • Context length is now properly applied.
  • Tool calls for all models are now +30% to +80% more accurate.
  • Web search now actually gets web content and not just summaries
  • Number of tool calls allowed are increased to 25 from 10
  • Tool calls now terminate much better, so looping / repetitions will be reduced
  • More tool call healing and de-duplication logic to stop tool callings from leaking XML as well
  • Tested with unsloth/Qwen3.5-4B-GGUF (UD-Q4_K_XL), web search + code execution + thinking enabled.

| Metric | Before | After |
|--------|--------|-------|
| XML leaks in response | 10/10 | 0/10 |
| URL fetches used | 0 | 4/10 runs |
| Runs with correct song names | 0/10 | 2/10 |
| Avg tool calls | 5.5 | 3.8 |
| Avg response time | 12.3s | 9.8s |

Run Gemma 4 in Unsloth Studio:

What's Changed

  • studio: Polish Windows installer/setup logs by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4736
  • feat: move folder management into model selector dropdown by @Shine1i in https://github.com/unslothai/unsloth/pull/4731
  • fix: clear tool status badge immediately after tool execution by @Shine1i in https://github.com/unslothai/unsloth/pull/4733
  • refactor flex attn to prefer flash if possible by @Datta0 in https://github.com/unslothai/unsloth/pull/4734
  • Fix Windows local GGUF model loading crash by @danielhanchen in https://github.com/unslothai/unsloth/pull/4730
  • Fix OOM model styling in Studio model selectors by @LeoBorcherding in https://github.com/unslothai/unsloth/pull/4738
  • feat(studio): strip org prefix in model search to surface unsloth variants by @rolandtannous in https://github.com/unslothai/unsloth/pull/4749
  • Fix forward compatibility with transformers 5.x by @danielhanchen in https://github.com/unslothai/unsloth/pull/4752
  • Architecture-aware KV cache VRAM estimation (5-path) by @danielhanchen in https://github.com/unslothai/unsloth/pull/4757
  • Fix save_pretrained_merged for full-finetuned models by @danielhanchen in https://github.com/unslothai/unsloth/pull/4755
  • Feat/prebuiltllamacpp by @mmathew23 in https://github.com/unslothai/unsloth/pull/4741
  • Add installer test coverage for prebuilt llama.cpp changes by @danielhanchen in https://github.com/unslothai/unsloth/pull/4756
  • fix: studio web search SSL failures and empty page content by @danielhanchen in https://github.com/unslothai/unsloth/pull/4754
  • fix: add tokenizers to no-torch deps and TORCH_CONSTRAINT for arm64 macOS py313+ by @danielhanchen in https://github.com/unslothai/unsloth/pull/4748
  • fix(studio): allow context length slider to reach model's native limit by @danielhanchen in https://github.com/unslothai/unsloth/pull/4746
  • Tests for architecture-aware KV cache estimation by @danielhanchen in https://github.com/unslothai/unsloth/pull/4760
  • Fix custom llama.cpp source builds and macos metal source builds by @mmathew23 in https://github.com/unslothai/unsloth/pull/4762
  • studio: align composer/code, unify fonts, and remove tool collapse jitter by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4763
  • fix(chat): correct loading text for cached models during inference by @AdamPlatin123 in https://github.com/unslothai/unsloth/pull/4764
  • fix(security): shell injection in GGML export conversion by @mateeaaaaaaa in https://github.com/unslothai/unsloth/pull/4768
  • Add regression test for shell injection fix in GGML conversion by @danielhanchen in https://github.com/unslothai/unsloth/pull/4773
  • fix(studio): prevent small models from stalling on tool-calling tasks by @danielhanchen in https://github.com/unslothai/unsloth/pull/4769
  • Add regression tests for custom llama prebuilt installer by @danielhanchen in https://github.com/unslothai/unsloth/pull/4772
  • Feat/custom llama prebuilt by @mmathew23 in https://github.com/unslothai/unsloth/pull/4771
  • studio: fix chat font changes leaking outside chat page by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4775
  • feat(studio): display images from Python tool execution in chat UI by @danielhanchen in https://github.com/unslothai/unsloth/pull/4778
  • ui improvement by @rolandtannous in https://github.com/unslothai/unsloth/pull/4781
  • UI Changes by @danielhanchen in https://github.com/unslothai/unsloth/pull/4782
  • fix(studio): improve tool-calling re-prompt for small models by @danielhanchen in https://github.com/unslothai/unsloth/pull/4783
  • Pin Gemma-4 transformers requirement to 5.5.0 stable by @danielhanchen in https://github.com/unslothai/unsloth/pull/4784
  • Switch llama.cpp default to mainline ggml-org by @danielhanchen in https://github.com/unslothai/unsloth/pull/4785
  • Use transformers v5.5-release branch, pin to 5.5.0 by @danielhanchen in https://github.com/unslothai/unsloth/pull/4786
  • Fix: pin transformers==4.57.6 in main Studio venv by @danielhanchen in https://github.com/unslothai/unsloth/pull/4788
  • fix(studio): build llama.cpp from master for Gemma 4 support by @danielhanchen in https://github.com/unslothai/unsloth/pull/4790
  • fix name fixed name by @rolandtannous in https://github.com/unslothai/unsloth/pull/4791
  • fix(studio): prioritize curated defaults in Recommended model list by @danielhanchen in https://github.com/unslothai/unsloth/pull/4792
  • fix windows llama.cpp compile from source issue by @mmathew23 in https://github.com/unslothai/unsloth/pull/4793
  • fix(studio): pin llama.cpp to b8637 (Gemma 4 support) by @danielhanchen in https://github.com/unslothai/unsloth/pull/4796
  • fix(studio): don't set trust_remote_code for Gemma 4 training by @danielhanchen in https://github.com/unslothai/unsloth/pull/4795
  • fix(studio): revert llama.cpp default tag to latest by @danielhanchen in https://github.com/unslothai/unsloth/pull/4797
  • fix(studio): suppress fatal error when ggml-org has no prebuilt manifest by @danielhanchen in https://github.com/unslothai/unsloth/pull/4799

New Contributors

  • @AdamPlatin123 made their first contribution in https://github.com/unslothai/unsloth/pull/4764
  • @mateeaaaaaaa made their first contribution in https://github.com/unslothai/unsloth/pull/4768

Full Changelog: https://github.com/unslothai/unsloth/compare/v0.1.3-beta...v0.1.35-beta

v0.1.3-beta New feature
Notable features
  • Custom GGUF folder scanning
  • Automatic multi-GPU support
  • Tool calling XML deduplication
Full changelog

We did many new improvements and fixes to Studio!

  • Tool calls for all models are now +30% to +80% more accurate.
  • Web search now actually gets web content and not just summaries
  • Number of tool calls allowed are increased to 25 from 10
  • Tool calls now terminate much better, so looping / repetitions will be reduced
  • More tool call healing and de-duplication logic to stop tool callings from leaking XML as well
  • Tested with unsloth/Qwen3.5-4B-GGUF (UD-Q4_K_XL), web search + code execution + thinking enabled.

| Metric | Before | After |
|--------|--------|-------|
| XML leaks in response | 10/10 | 0/10 |
| URL fetches used | 0 | 4/10 runs |
| Runs with correct song names | 0/10 | 2/10 |
| Avg tool calls | 5.5 | 3.8 |
| Avg response time | 12.3s | 9.8s |

New features

  • Update button now visible
  • Install script styling all updated!
  • Added custom folders so you can use any GGUFs in any folder - for now access in Advanced Settings in Chat and Custom Folders
  • Preliminary Automatic Multi GPU support for inference and training - useful for large models that don't fit on 1 GPU - Studio auto will allocate GPU resources
  • Intel Macs should work out of the box

Much smoother and faster Studio

  • Fixed timeouts of downloads of large models - no more timeouts seen.
  • Fixed Hugging Face rate limiting - HF API calls reduced by 90%
  • Fixed bun on Windows and faster installs

To update Studio:

  1. For Linux, WSL, Mac, do: unsloth studio update
  2. For Windows native, do: irm https://unsloth.ai/install.ps1 | iex
  3. For Linux, WSL, Mac reinstalls, do: curl -fsSL https://unsloth.ai/install.sh | sh

What's Changed

  • Fix LM Studio GGUF loading on native Windows (no GPU) by @danielhanchen in https://github.com/unslothai/unsloth/pull/4665
  • studio: add HF/local model selection UI for GGUF export by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4365
  • Fix blank page on Windows due to broken .js MIME type by @rolandtannous in https://github.com/unslothai/unsloth/pull/4674
  • fix: [Studio] setup.ps1 update-flow for windows by @rolandtannous in https://github.com/unslothai/unsloth/pull/4667
  • studio: unify Windows installer/setup logging style, verbosity controls, and startup messaging by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4651
  • studio: preserve GGUF context max after apply and refresh by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4691
  • [Studio] multi gpu finetuning/inference via "balanced_low0/sequential" device_map by @Datta0 in https://github.com/unslothai/unsloth/pull/4602
  • Fix editable install scanning 6,500+ node_modules dirs by @danielhanchen in https://github.com/unslothai/unsloth/pull/4697
  • fix(studio): avoid UnicodeEncodeError on Windows cp1252 consoles by @danielhanchen in https://github.com/unslothai/unsloth/pull/4699
  • Fix/bun windows bin detection by @Etherll in https://github.com/unslothai/unsloth/pull/4703
  • fix: skip download progress polling for exported GGUF models by @rolandtannous in https://github.com/unslothai/unsloth/pull/4709
  • [Studio] Fix: replace hard timeout with inactivity timeout for model loading by @rolandtannous in https://github.com/unslothai/unsloth/pull/4707
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/unslothai/unsloth/pull/4705
  • studio: prevent false multimodal warning during model loading by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4704
  • fix(studio): open tour ReadMore links in new tab by @danielhanchen in https://github.com/unslothai/unsloth/pull/4694
  • [studio] multi gpu: revert to balanced for inference. by @Datta0 in https://github.com/unslothai/unsloth/pull/4698
  • fix: throttle and cache HuggingFace modelInfo API calls by @Shine1i in https://github.com/unslothai/unsloth/pull/4696
  • fix(studio): correct default weight_decay and learning rate by @danielhanchen in https://github.com/unslothai/unsloth/pull/4695
  • fix: auto-retry stalled HF downloads with HF_HUB_DISABLE_XET=1 by @rolandtannous in https://github.com/unslothai/unsloth/pull/4712
  • studio: add update button to navbar with guided commands and cross-platform support by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4721
  • studio: improve GGUF tool calling accuracy and reliability by @danielhanchen in https://github.com/unslothai/unsloth/pull/4700
  • studio: fix export HF model dropdown clearing on enter/click-away by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4726
  • Studio: simplify tool-call dedup and replace html2text with builtin converter by @danielhanchen in https://github.com/unslothai/unsloth/pull/4722
  • feat: custom scan folders for GGUF model discovery by @Shine1i in https://github.com/unslothai/unsloth/pull/4723
  • Bump installer minimum version pin to 2026.3.18 by @danielhanchen in https://github.com/unslothai/unsloth/pull/4729

Full Changelog: https://github.com/unslothai/unsloth/compare/v0.1.25-beta...v0.1.3-beta

v0.1.25-beta New feature
Notable features
  • 20-30% faster inference
  • Model auto-detection from LM Studio/HuggingFace
  • Training history viewer
Full changelog

Hey guys, it's only been 2 days since our last release, but we’ve got a lot more important updates:

  • Inference is now 20–30% faster. Previously, tool-calling and repeat penalty could slow inference below normal speeds. Inference tokens/s should now perform similar to llama-server / llama.cpp.
  • Now Auto-detects older or pre-existing models downloaded from LM Studio, Hugging Face, and similar sources.
  • Inference token/s speed is now calculated correctly. Previously, tokens/s included startup time, which made the displayed speed look slower than it actually was. It should now reflect 'true' inference speed.
  • CPU usage no longer spikes. Previously, inline querier identity changed every render, causing useLiveQuery to resubscribe continuously.
  • Unsloth Studio now has a shutdown x button and shuts down properly. Previously, closing it after opening from the desktop icon would not close it properly. Now, launching from the shortcut also opens the terminal, and closing that terminal fully exits Unsloth Studio. If you still have it open from a previous session you can restart your computer or run lsof -i :8888 then kill -9 <PID>.
  • Even better tool-calling and websearch with reduced errors.
  • Updated documentation with lots of new info on deleting models, uninstalling etc.
  • Cleaner, smarter install and setup logging across Windows and Linux. Output is now easier to read with consistent formatting, quieter by default for a smoother experience, and supports richer --verbose diagnostics when you want full technical detail.
    {% endupdate %}
  • You can now view your training history

What's Changed

  • Bump installer min version to 2026.3.12 by @danielhanchen in https://github.com/unslothai/unsloth/pull/4600
  • Fix Colab Studio launch and setup.ps1 box alignment by @danielhanchen in https://github.com/unslothai/unsloth/pull/4601
  • Fix Colab huggingface-hub conflict, ensurepip fallback, bump to 2026.3.14 by @danielhanchen in https://github.com/unslothai/unsloth/pull/4603
  • Update README.md by @rolandtannous in https://github.com/unslothai/unsloth/pull/4604
  • fix: skip flex_attention for models with non-zero attention_dropout by @Abhinavexists in https://github.com/unslothai/unsloth/pull/4605
  • Fix Colab setup skipping llama.cpp installation by @rolandtannous in https://github.com/unslothai/unsloth/pull/4618
  • fix: show recommended models in search results by @Shine1i in https://github.com/unslothai/unsloth/pull/4615
  • studio: align Dataset/Parameters/Training cards, fix expandable height, animate LoRA settings by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4614
  • fix: Windows installer fails on _yaml.pyd Access Denied (os error 5) by @Etherll in https://github.com/unslothai/unsloth/pull/4617
  • studio: humanize ETA display for long training runs by @RadouaneElhajali in https://github.com/unslothai/unsloth/pull/4608
  • fix: add python-json-logger to data-designer-deps by @Shine1i in https://github.com/unslothai/unsloth/pull/4627
  • [Studio] Colab fix - Allow install_python_stack to run on Colab by @rolandtannous in https://github.com/unslothai/unsloth/pull/4633
  • Fix repetition_penalty default causing 24% TPS drop in GGUF inference by @danielhanchen in https://github.com/unslothai/unsloth/pull/4634
  • fix: install.sh Mac Intel compatibility + Studio no-torch support by @danielhanchen in https://github.com/unslothai/unsloth/pull/4624
  • tests: add no-torch / Intel Mac test suite by @danielhanchen in https://github.com/unslothai/unsloth/pull/4646
  • fix: use unsloth[huggingfacenotorch] instead of --no-deps in no-torch mode by @danielhanchen in https://github.com/unslothai/unsloth/pull/4647
  • Fix Gemma3N audio training stride assertion with non-reentrant checkpointing by @danielhanchen in https://github.com/unslothai/unsloth/pull/4629
  • Fix missing num_items_in_batch in unsloth_prediction_step by @danielhanchen in https://github.com/unslothai/unsloth/pull/4616
  • Make Studio shortcuts launch in a visible terminal by @danielhanchen in https://github.com/unslothai/unsloth/pull/4638
  • studio: setup log styling by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4494
  • Fix ~1.2s TTFT penalty when tools are enabled in Studio by @danielhanchen in https://github.com/unslothai/unsloth/pull/4639
  • Fix GGUF GPU fit check to account for KV cache VRAM by @danielhanchen in https://github.com/unslothai/unsloth/pull/4623
  • feat: update app icons to rounded logo by @Shine1i in https://github.com/unslothai/unsloth/pull/4640
  • Streaming tool detection: guard late tool_calls, filter incomplete fragments by @danielhanchen in https://github.com/unslothai/unsloth/pull/4648
  • fix: install no-torch runtime deps via requirements file by @danielhanchen in https://github.com/unslothai/unsloth/pull/4649
  • Fix orphan server cleanup killing user's own llama-server by @danielhanchen in https://github.com/unslothai/unsloth/pull/4622
  • fix: add auth + UX improvements to shutdown button by @Shine1i in https://github.com/unslothai/unsloth/pull/4642
  • Fix inference failing for transformers 5.x models (trust_remote_code) by @danielhanchen in https://github.com/unslothai/unsloth/pull/4652
  • fix: no-torch install deps without pulling torch transitively by @danielhanchen in https://github.com/unslothai/unsloth/pull/4650
  • Detect always-on reasoning models and show Think button as locked-on by @danielhanchen in https://github.com/unslothai/unsloth/pull/4654
  • fix: replace navbar shutdown text button with icon-only button by @Shine1i in https://github.com/unslothai/unsloth/pull/4655
  • Fall back to parsing model name when HF API has no param count by @danielhanchen in https://github.com/unslothai/unsloth/pull/4656
  • fix: disable OCR in pymupdf4llm PDF extraction by @Shine1i in https://github.com/unslothai/unsloth/pull/4659
  • Fix HF cache default and show LM Studio models in chat/inference by @rolandtannous in https://github.com/unslothai/unsloth/pull/4653
  • Bump minimum unsloth version to 2026.3.16 in install scripts by @danielhanchen in https://github.com/unslothai/unsloth/pull/4663

New Contributors

  • @Abhinavexists made their first contribution in https://github.com/unslothai/unsloth/pull/4605
  • @RadouaneElhajali made their first contribution in https://github.com/unslothai/unsloth/pull/4608

Full Changelog: https://github.com/unslothai/unsloth/compare/v0.1.2-beta...v0.1.25-beta

v0.1.2-beta Breaking risk
Notable features
  • In-place Studio update command
  • App shortcuts for Windows/Mac/Linux
  • Pre-compiled llama.cpp binaries
Full changelog

Hey guys, this is our first release since we launched Unsloth Studio last week. From now on you can directly access all our updates through our changelog here: https://unsloth.ai/docs/new/changelog

You can now update Unsloth Studio! Just use: unsloth studio update. Please update to use all the newest fixes and features.

  • Tool calling improved. Better llama.cpp parsing, no raw tool markup in chat, faster inference, a new Tool Outputs panel, timers.
  • Windows CPU or GPU now works seamlessly. Please reinstall!
  • App shortcuts. Once installed, you can now launch in Windows, MacOS and Linux via a shortcut icon in the Start / Launch and Desktop.
  • Pre-compiled llama.cpp binaries and mamba_ssm for finetuning - 6x faster installs! Also <300MB in size for binaries.
  • 50% reduced installation sizes (-7GB or more savings), 2x faster installs and faster resolving. 50% smaller pypi sizes.
  • Colab with free T4 GPUs with Unsloth Studio now fixed! Try it here. Due to pre-compiled binaries, it's also 20x faster!
  • You can now properly use old GGUFs from Hugging Face or LM Studio
  • MacOS and CPU now have Data Recipes enabled with multi-file uploading.
  • AMD support preliminary for Linux only machines - auto detects.
  • Settings sidebar redesign. Settings are now grouped into Model, Sampling, Tools, and Preferences
  • Context length now adjustable. Keep in mind this is not needed as llama.cpp smartly uses the exact context you need via --fit on
  • Persistent system prompts and presets. Custom system prompts and chat presets now persist across reloads and page changes.
  • Multi-file upload. Data recipes now support multiple drag-and-drop uploads for PDF, DOCX, TXT, and MD, with backend extraction, saved uploads, and improved previews.
  • Better chat observability. Studio now shows llama-server timings and usage, a context-window usage bar, and richer source hover cards.
  • Better UX overall - clickable links, better LaTeX parsing, tool / code / web tooltips for default cards and much more!
  • LiteLLM - Unsloth Studio and Unsloth were NOT affected by the recent LiteLLM compromise. Nemo Data Designer used LiteLLM only up to 1.80, not the affected 1.82.7 or 1.82.8, and has since removed it entirely.
  • We now have a new one line install command, just run: Copycurl -fsSL https://unsloth.ai/install.sh | sh

Fixes:

  • Windows/setup improvements. Fixed silent Windows exits, Anaconda/conda-forge startup crashes, broken non-NVIDIA Windows installs, and missing early CUDA/stale-venv setup checks.
  • System prompts fixed. They work again for non-GGUF text and vision inference.
  • GGUF export expanded. Full fine-tunes, not just LoRA/PEFT, can now export to GGUF. Base model resolution is more reliable, and unsupported export options are disabled in the UI.
  • Chat scroll/layout fixes. Fixed scroll-position issues during generation, thinking-panel layout shift, and viewport jumps when collapsing reasoning panels.
  • Smarter port conflict detection. Studio now detects loopback conflicts, can identify the blocking process when possible, and gives clearer fallback-port messages.

Example of automatic parameter settings for context length etc:

https://github.com/user-attachments/assets/6a70a680-fccd-4d50-ad47-eb45d6827a06

What's Changed

  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/unslothai/unsloth/pull/4542
  • fix: store embedding_learning_rate on self in UnslothTrainingArguments by @GoldenGrapeGentleman in https://github.com/unslothai/unsloth/pull/4531
  • studio: persist system prompt and preset settings across navigation by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4538
  • studio: stop scroll hijack during generation and fix thinking panel layout shift by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4543
  • Fix Studio port conflict detection for loopback addresses by @danielhanchen in https://github.com/unslothai/unsloth/pull/4532
  • fix(studio): show Windows-specific reset-password command by @Shine1i in https://github.com/unslothai/unsloth/pull/4529
  • fix(studio): restore scroll lock on reasoning panel collapse by @danielhanchen in https://github.com/unslothai/unsloth/pull/4545
  • fix: always show chat tool icons by @Shine1i in https://github.com/unslothai/unsloth/pull/4525
  • fix: system prompt ignored in unsloth inference by @Shine1i in https://github.com/unslothai/unsloth/pull/4528
  • fix: handle prompt/completion datasets in slow-path BOS detection by @danielhanchen in https://github.com/unslothai/unsloth/pull/4548
  • fix: give @0xKushwaha git history credit for completion_only_loss fix by @danielhanchen in https://github.com/unslothai/unsloth/pull/4552
  • ⚠️Remove quarantined litellm for precaution -- Unsloth Studio NOT affected by @danielhanchen in https://github.com/unslothai/unsloth/pull/4553
  • fix: pin unsloth>=2026.3.11 in install scripts by @danielhanchen in https://github.com/unslothai/unsloth/pull/4556
  • Regroup chat settings sidebar into focused sections by @Shine1i in https://github.com/unslothai/unsloth/pull/4551
  • Add GRPO resume vLLM cleanup guard by @MagellaX in https://github.com/unslothai/unsloth/pull/4411
  • fix: prevent UnicodeEncodeError on Windows CP1252 consoles in studio setup by @Krishnachaitanyakc in https://github.com/unslothai/unsloth/pull/4563
  • studio: windows desktop shortcut launcher by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4558
  • Remove duplicate frontend assets from wheel (~31 MB savings) by @danielhanchen in https://github.com/unslothai/unsloth/pull/4567
  • feat(studio): training history persistence and past runs viewer by @Shine1i in https://github.com/unslothai/unsloth/pull/4501
  • fix: remove auto wandb.finish() after train() to allow post-training evaluate() by @Krishnachaitanyakc in https://github.com/unslothai/unsloth/pull/4564
  • feat: Implement Q-GaLore optimizer and custom embedding learning rate… by @OnePunchMonk in https://github.com/unslothai/unsloth/pull/4511
  • Bump Data Designer to 0.5.4 (removes litellm dependency) by @danielhanchen in https://github.com/unslothai/unsloth/pull/4569
  • feat(chat): cleaner tool UI, inline LaTeX, clickable links by @Shine1i in https://github.com/unslothai/unsloth/pull/4561
  • [Studio] Try installing causal-conv1d from prebuilt wheels if avialable by @Datta0 in https://github.com/unslothai/unsloth/pull/4547
  • Feature/add dependabot and codeql security checks by @pkloehn1 in https://github.com/unslothai/unsloth/pull/4479
  • build(deps): bump the actions group with 2 updates by @dependabot[bot] in https://github.com/unslothai/unsloth/pull/4570
  • build(deps): bump oxc-parser from 0.116.0 to 0.121.0 in /studio/backend/core/data_recipe/oxc-validator in the npm-oxc-validator group by @dependabot[bot] in https://github.com/unslothai/unsloth/pull/4571
  • Remove advanced CodeQL workflow (conflicts with default setup) by @danielhanchen in https://github.com/unslothai/unsloth/pull/4584
  • Add macOS and Linux desktop shortcuts to install.sh by @danielhanchen in https://github.com/unslothai/unsloth/pull/4568
  • perf(studio): upgrade to Vite 8 + auto-install bun for faster frontend builds by @Etherll in https://github.com/unslothai/unsloth/pull/4522
  • feat(tokenizer): add get_tokenizer_info() diagnostic helper by @cz-03 in https://github.com/unslothai/unsloth/pull/4436
  • Add ROCm (AMD GPU) support to studio setup by @danielhanchen in https://github.com/unslothai/unsloth/pull/4585
  • Consolidate dual venvs and separate install from update by @rolandtannous in https://github.com/unslothai/unsloth/pull/4530
  • studio: stabilize reasoning panel scroll behavior and prevent composer overlap by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4587
  • Use prebuilt llama.cpp for unsloth studio setup by @mmathew23 in https://github.com/unslothai/unsloth/pull/4562
  • fix(studio): add -ngl flag for GPU offloading in llama-server by @danielhanchen in https://github.com/unslothai/unsloth/pull/4588
  • fix(studio): add pip nvidia CUDA libs to LD_LIBRARY_PATH for llama-server by @danielhanchen in https://github.com/unslothai/unsloth/pull/4590
  • fix(studio): validate bun install and retry from official source on failure by @danielhanchen in https://github.com/unslothai/unsloth/pull/4589
  • fix(studio): clear bun cache on failure and retry before falling back to npm by @danielhanchen in https://github.com/unslothai/unsloth/pull/4594
  • Pin torch>=2.4,<2.11.0 in Studio installers by @danielhanchen in https://github.com/unslothai/unsloth/pull/4595
  • fix(studio): source-build fallback prefers Unsloth's tested tag over upstream latest by @danielhanchen in https://github.com/unslothai/unsloth/pull/4593
  • fix(studio): add bun cache validation to Windows setup.ps1 by @danielhanchen in https://github.com/unslothai/unsloth/pull/4596
  • feat: multi-source model discovery (HF default, legacy cache, LM Studio) by @rolandtannous in https://github.com/unslothai/unsloth/pull/4591
  • Add unsloth to User PATH on Windows after install by @danielhanchen in https://github.com/unslothai/unsloth/pull/4597
  • Add PID file tracking and unsloth studio stop command by @danielhanchen in https://github.com/unslothai/unsloth/pull/4598
  • feat(studio): editable context length with Apply/Reset for GGUF settings by @danielhanchen in https://github.com/unslothai/unsloth/pull/4592

New Contributors

  • @MagellaX made their first contribution in https://github.com/unslothai/unsloth/pull/4411
  • @Krishnachaitanyakc made their first contribution in https://github.com/unslothai/unsloth/pull/4563
  • @OnePunchMonk made their first contribution in https://github.com/unslothai/unsloth/pull/4511
  • @pkloehn1 made their first contribution in https://github.com/unslothai/unsloth/pull/4479
  • @dependabot[bot] made their first contribution in https://github.com/unslothai/unsloth/pull/4570
  • @cz-03 made their first contribution in https://github.com/unslothai/unsloth/pull/4436

Full Changelog: https://github.com/unslothai/unsloth/compare/b8475...v0.1.2-beta

b8475 Feature
Notable features
  • Install-ready llama.cpp bundles for Unsloth Studio
Changelog

Install-ready Unsloth Studio llama.cpp bundles for b8475.

b8457 Feature
Notable features
  • Install-ready llama.cpp bundles for streamlined setup
Changelog

Install-ready Unsloth Studio llama.cpp bundles for b8457.

v0.1.0-beta New feature
Notable features
  • Unsloth Studio web UI launch
  • 500+ model support
  • 70% VRAM reduction vs standard training
Full changelog

Hey guys, we're super excited to launch Unsloth Studio (Beta), a new open-source web UI to train and run LLMs.

Blog + everything you need to know: https://unsloth.ai/docs/new/studio

  • Run models locally on Mac, Windows, Linux
  • Compare and battle models side-by-side
  • Train 500+ models 2x faster with 70% less VRAM
  • Supports GGUF, vision, audio, embedding models
  • Self-healing Tool calling / web search + code execution
  • Auto-create datasets from PDF, CSV, DOCX
  • Export models to GGUF, safetensor and more formats

MacOS, Linux, WSL:

For MacOS, ensure you have cmake installed. If not, run brew install cmake.

curl -fsSL https://unsloth.ai/install.sh | sh

Then to launch every time:

source unsloth_studio/bin/activate
unsloth studio -H 0.0.0.0 -p 8888

Windows:

Run in Windows Powershell:

irm https://unsloth.ai/install.ps1 | iex

Then to launch every time:

.\unsloth_studio\Scripts\activate
unsloth studio -H 0.0.0.0 -p 8888

Docker

Use our Docker image unsloth/unsloth container. Run:

docker run -d -e JUPYTER_PASSWORD="mypassword" \
  -p 8888:8888 -p 8000:8000 -p 2222:22 \
  -v $(pwd)/work:/workspace/work \
  --gpus all \
  unsloth/unsloth

https://github.com/user-attachments/assets/4f48e6ed-5ef9-42d8-8404-64a4d8b36846

What's Changed

  • Update CODEOWNERS for studio and cli by @danielhanchen in https://github.com/unslothai/unsloth/pull/4266
  • [Feature] Support Sequence Classification by @danielhanchen in https://github.com/unslothai/unsloth/pull/4264
  • [Feature] VLMs support for GRPO by @danielhanchen in https://github.com/unslothai/unsloth/pull/4265
  • [Fix] Respect llm_int8_skip_modules for VLM by @danielhanchen in https://github.com/unslothai/unsloth/pull/4249
  • ROCM support by @danielhanchen in https://github.com/unslothai/unsloth/pull/4271
  • Remove Blackwell flex attention disable workaround from studio by @danielhanchen in https://github.com/unslothai/unsloth/pull/4273
  • ROCM support by @danielhanchen in https://github.com/unslothai/unsloth/pull/4272
  • fix: prevent ai-assist model config RCE via untrusted Hugging Face repos by @danielhanchen in https://github.com/unslothai/unsloth/pull/4274
  • fix(seed): disable remote code execution in seed inspect dataset loads by @danielhanchen in https://github.com/unslothai/unsloth/pull/4275
  • Update CODEOWNERS by @danielhanchen in https://github.com/unslothai/unsloth/pull/4279
  • fix: install data-designer plugin non-editable for Colab compatibility by @LeoBorcherding in https://github.com/unslothai/unsloth/pull/4268
  • Arch/mixtral by @danielhanchen in https://github.com/unslothai/unsloth/pull/4283
  • Improve documentation on how to export model from Colab by @danielhanchen in https://github.com/unslothai/unsloth/pull/4284
  • feat: Add Mixtral model support by @danielhanchen in https://github.com/unslothai/unsloth/pull/4285
  • Initial changes: Refactor Attention by @danielhanchen in https://github.com/unslothai/unsloth/pull/4286
  • patch vlm trainer to resize images by @danielhanchen in https://github.com/unslothai/unsloth/pull/4287
  • [WIP] add support for mixtral by @danielhanchen in https://github.com/unslothai/unsloth/pull/4288
  • studio: speed up setup -- uv for installs (8x), Ninja for llama.cpp (1.7x) by @danielhanchen in https://github.com/unslothai/unsloth/pull/4289
  • fix: remove old comments by @Shine1i in https://github.com/unslothai/unsloth/pull/4292
  • PR: Windows Setup Improvements by @rolandtannous in https://github.com/unslothai/unsloth/pull/4299
  • miscallenous studio by @Shine1i in https://github.com/unslothai/unsloth/pull/4293
  • Fix: Compare Mode Deadlock, Cancel Event Poisoning & IPC Optimization by @rolandtannous in https://github.com/unslothai/unsloth/pull/4303
  • studio: fix GGUF inference -- reasoning tokens, max_tokens, server flags, GPU allocation by @danielhanchen in https://github.com/unslothai/unsloth/pull/4290
  • chat only with gguf for mac devices by @Manan17 in https://github.com/unslothai/unsloth/pull/4300
  • studio: add max steps and epochs toggle switch by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4296
  • Fix/colab plugin editable install by @LeoBorcherding in https://github.com/unslothai/unsloth/pull/4281
  • Graceful shutdown on Windows (signal handlers for Ctrl+C) by @rolandtannous in https://github.com/unslothai/unsloth/pull/4306
  • studio: simplify auth UX to password-only login by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4305
  • studio: preserve save_steps when toggling to epochs mode by @Imagineer99 in https://github.com/unslothai/unsloth/pull/4308
  • Fix studio frontend build producing empty Tailwind CSS by @danielhanchen in https://github.com/unslothai/unsloth/pull/4311
  • Fix setup.sh crash on Mac with empty gitignore array by @danielhanchen in https://github.com/unslothai/unsloth/pull/4313
  • [Feature] studio: user can upload eval dataset by @Manan17 in https://github.com/unslothai/unsloth/pull/4307
  • fix: Ctrl+C not terminating backend on Linux by @rolandtannous in https://github.com/unslothai/unsloth/pull/4316
  • Add download progress bar for non-GGUF models in Chat by @danielhanchen in https://github.com/unslothai/unsloth/pull/4314
  • Apply use_reentrant removal to all TRL trainer configs by @danielhanchen in https://github.com/unslothai/unsloth/pull/4321
  • Fix VLM GRPO matmul shape mismatch in _get_per_token_logps_and_entropies by @danielhanchen in https://github.com/unslothai/unsloth/pull/4301
  • Improve AI Assist: Update default model, model output parsing, logging, and dataset mapping UX by @rolandtannous in https://github.com/unslothai/unsloth/pull/4323
  • studio: per-model inference defaults, GGUF slider fix, reasoning toggle by @danielhanchen in https://github.com/unslothai/unsloth/pull/4325
  • fix: Resolve CUDA toolkit mismatch on multi-CUDA Windows systems by @rolandtannous in https://github.com/unslothai/unsloth/pull/4324
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/unslothai/unsloth/pull/4332
  • Fix/colab comment edits by @LeoBorcherding in https://github.com/unslothai/unsloth/pull/4317
  • fix: add Qwen3.5 version gate in loader dispatch by @danielhanchen in https://github.com/unslothai/unsloth/pull/4335
  • Fix xformers Blackwell guard: broader coverage and root cause docs by @danielhanchen in https://github.com/unslothai/unsloth/pull/4338
  • studio: improve Colab notebook, redesign ready popup, and clean up install output by @LeoBorcherding in https://github.com/unslothai/unsloth/pull/4339
  • Add check to disable xformers on newer GPUs by @pluesclues in https://github.com/unslothai/unsloth/pull/4342
  • studio: training progress, CUDA lib path, dataset_num_proc fix by @danielhanchen in https://github.com/unslothai/unsloth/pull/4336
  • studio: fix stale GGUF metadata, update helper model, auth improvements by @danielhanchen in https://github.com/unslothai/unsloth/pull/4346
  • studio: show "Off" for repetition penalty = 1 by @danielhanchen in https://github.com/unslothai/unsloth/pull/4349
  • studio: update Creative/Precise presets, show "Off" for disabled samplers by @danielhanchen in https://github.com/unslothai/unsloth/pull/4350
  • studio: fix slow cancellation of GGUF generation by @danielhanchen in https://github.com/unslothai/unsloth/pull/4352
  • Fix: Remove unused warmupToastShown variable (TS6133) by @rolandtannous in https://github.com/unslothai/unsloth/pull/4353
  • Studio: SVG preview, fix streaming and model selector bugs by @danielhanchen in https://github.com/unslothai/unsloth/pull/4354
  • fix: comment out debug print statements by @rolandtannous in https://github.com/unslothai/unsloth/pull/4357
  • fix(llm_assist): disable thinking mode for helper model JSON output by @rolandtannous in https://github.com/unslothai/unsloth/pull/4358
  • studio: improve onboarding UX, tooltips, and training defaults by @danielhanchen in https://github.com/unslothai/unsloth/pull/4355

New Contributors

  • @LeoBorcherding made their first contribution in https://github.com/unslothai/unsloth/pull/4268
  • @Shine1i made their first contribution in https://github.com/unslothai/unsloth/pull/4292
  • @Manan17 made their first contribution in https://github.com/unslothai/unsloth/pull/4300
  • @Imagineer99 made their first contribution in https://github.com/unslothai/unsloth/pull/4296

Full Changelog: https://github.com/unslothai/unsloth/commits/March-2026

February-2026 New feature
Notable features
  • 12x faster MoE training with 35% less VRAM
  • 1.8-3.3x faster embedding model training
  • 7x longer context RL training
Full changelog

Our first release of 2026! This year we’ve got a lot of exciting things coming and to kick things off, we’re introducing faster MoE training, embedding model support, and ultra long context for Reinforcement Learning. We’ll also be launching our brand new UI very soon.

We’d like to thank all of you for 50K stars on GitHub! ⭐

We’ve also added support for many new models that you can now run and fine-tune locally, including DeepSeek-OCR 2, GLM-4.7-Flash, Kimi-2.5, and more.

🚀 Faster MoE training

You can now train MoE models 12× faster with 35% less VRAM and 6x longer context via our new Triton and math kernels (no accuracy loss). gpt-oss-20b works on 12.8GB VRAM. Qwen3-30B-A3B (16-bit LoRA) uses 63GB.

Unsloth supports fast training for gpt-oss, Qwen3 (30B, 235B, VL, Coder), DeepSeek R1/V3 arch and GLM (4.7, Flash) models.

Faster MoE Blog

🔎 Embedding models now train 2× faster

We collaborated with Hugging Face to enable 1.8-3.3x faster embedding, BERT and classifier model training with 20% less VRAM, 2x longer context & no accuracy loss vs. FA2 setups.

Embedding model Blog

💡 Ultra Long Context RL is here

We’re introducing new batching algorithms to enable ~7x longer context (can be more than 12x) RL training with no accuracy or speed degradation vs. other optimized setups that use FA3, kernels & chunked losses.

Unsloth now trains gpt-oss QLoRA with 380K context on a single 192GB NVIDIA B200 GPU

Long Context RL Blog

🔮 New models

🎉 Extra Updates

  1. As part of our MoE release, we also made Gemma-3 now use Flex-Attention by default, and this works in float16 settings as well (there were infinities which we solved a while back). Gemma-3 now uses O(N) memory and not O(N^2) memory, and trains >3x faster (scales even better with context length). Previous Unsloth versions would OOM.
  2. Vision fine-tuning now accepts mixed data of only images and text data!
  3. trl==0.27.1 and transformers==5.1.0 are supported well - previous coverage was 30% of all our 120 notebooks, but now we have >80% coverage - we plan to make it 100% over the next few days.
  4. And many many other bug fixes and other updates!

📖 New Guides

  • </> How To Use Claude Code + Codex with local LLMs: Guide
  • 👾 Train & deploy to LM Studio for local inference: Guide
  • 🎨 Run Diffusion image models with Unsloth GGUFs: Guide

[!TIP]
Update Unsloth via pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo
If you want PyTorch 2.9: pip install --upgrade unsloth unsloth_zoo

February is shaping up to be an amazing month for LLM releases, and we hope you’re just as excited as we are. 😊

What's Changed

  • [FIX] [Transformers] VLM input embeds fix for gradients by @Datta0 in https://github.com/unslothai/unsloth/pull/3715
  • [fbgemm] Silence tma fbgemm by @Datta0 in https://github.com/unslothai/unsloth/pull/3735
  • [hf_hub] Token login by @Datta0 in https://github.com/unslothai/unsloth/pull/3739
  • Do not overwrite slots by @Datta0 in https://github.com/unslothai/unsloth/pull/3752
  • Fix VLM + DDP checkpointing by @djsaunde in https://github.com/unslothai/unsloth/pull/3751
  • Enable 4-bit quantization on AMD Radeon GPUs by @sstamenk in https://github.com/unslothai/unsloth/pull/3748
  • Nightly by @danielhanchen in https://github.com/unslothai/unsloth/pull/3753
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/unslothai/unsloth/pull/3760
  • Nightly by @danielhanchen in https://github.com/unslothai/unsloth/pull/3767
  • Add missing import of inspect by @sstamenk in https://github.com/unslothai/unsloth/pull/3778
  • Clarify NotImplementedError for fast_inference with full_finetuning by @Fizza-Mukhtar in https://github.com/unslothai/unsloth/pull/3768
  • Update FUNDING.yml by @danielhanchen in https://github.com/unslothai/unsloth/pull/3792
  • fix(trainer): import psutil to prevent NameError in _prepare_dataset by @alkinun in https://github.com/unslothai/unsloth/pull/3780
  • fastrope fix for zero strided tensors by @f14-bertolotti in https://github.com/unslothai/unsloth/pull/3782
  • Fix crash when trl.experimental.openenv is unavailable by @Fizza-Mukhtar in https://github.com/unslothai/unsloth/pull/3787
  • Fix Boolean value of Tensor ambiguity error in mistral.py by @yurekami in https://github.com/unslothai/unsloth/pull/3790
  • fix: add support for init_lora_weights="corda" in get_peft_model by @majiayu000 in https://github.com/unslothai/unsloth/pull/3794
  • Fix correctness bugs in rl.py, rl_replacements.py, and vision.py by @danielhanchen in https://github.com/unslothai/unsloth/pull/3811
  • Fix correctness bugs across multiple model files by @danielhanchen in https://github.com/unslothai/unsloth/pull/3813
  • Fix 3D tensor support for bitsandbytes 8-bit matmul in forward pass by @Fizza-Mukhtar in https://github.com/unslothai/unsloth/pull/3806
  • FIX: weight tying for LoRA embeddings and lm_head by @oKatanaaa in https://github.com/unslothai/unsloth/pull/3711
  • Fix Gemma3 QAT training instability with int8-int4 scheme by @danielhanchen in https://github.com/unslothai/unsloth/pull/3818
  • Add helpful error messages for fast_generate when fast_inference=False by @danielhanchen in https://github.com/unslothai/unsloth/pull/3820
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3821
  • Make llama.cpp CURL dependency optional when building from source by @Fizza-Mukhtar in https://github.com/unslothai/unsloth/pull/3822
  • remove redundant code of has_block by @ykaitao in https://github.com/unslothai/unsloth/pull/3832
  • rl.py fixes: buffer reset, safer attribute access, typo fix by @danielhanchen in https://github.com/unslothai/unsloth/pull/3834
  • Respect user quantization_config by @danielhanchen in https://github.com/unslothai/unsloth/pull/3835
  • Fix vLLM PDL bug on Blackwell GPUs (B200/B100) by @danielhanchen in https://github.com/unslothai/unsloth/pull/3841
  • Sync chat_template from tokenizer to vLLM by @danielhanchen in https://github.com/unslothai/unsloth/pull/3842
  • remove unused variable BlockDiagonalCausalMask by @ykaitao in https://github.com/unslothai/unsloth/pull/3836
  • Replace GitHub API check with vLLM version check for PDL fix by @danielhanchen in https://github.com/unslothai/unsloth/pull/3849
  • GRPO: restore model mode after generate (stacked on #3754) by @danielhanchen in https://github.com/unslothai/unsloth/pull/3851
  • Fix model training state restoration in GRPO trainer by @numb3r33 in https://github.com/unslothai/unsloth/pull/3754
  • Unify Version usage and fix TRL version handling by @danielhanchen in https://github.com/unslothai/unsloth/pull/3843
  • [ModelScope] Disable stats when modelscope is being used by @Datta0 in https://github.com/unslothai/unsloth/pull/3857
  • Fix FBGEMM/CUTLASS errors on SM100 (Blackwell) GPUs by @danielhanchen in https://github.com/unslothai/unsloth/pull/3863
  • Feature/raw text dataprep by @Vangmay in https://github.com/unslothai/unsloth/pull/3612
  • Fix Kaggle telemetry misclassification when COLAB_ keys exist by @hnxnq7 in https://github.com/unslothai/unsloth/pull/3869
  • reduce code duplication by _offload_frozen_module_for_training by @ykaitao in https://github.com/unslothai/unsloth/pull/3865
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/unslothai/unsloth/pull/3881
  • wrong number of dimensions by @f14-bertolotti in https://github.com/unslothai/unsloth/pull/3880
  • Disable gradient checkpointing when explicitly off for vision by @ducviet00 in https://github.com/unslothai/unsloth/pull/3879
  • [trl] use non lora model as base for RL by @Datta0 in https://github.com/unslothai/unsloth/pull/3895
  • Chunk Across Batch and Context length for logprob calculations for grpo by @pluesclues in https://github.com/unslothai/unsloth/pull/3628
  • add weight-only int8 QAT scheme and update tests for torchao 0.15.0 by @electroglyph in https://github.com/unslothai/unsloth/pull/3859
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/unslothai/unsloth/pull/3905
  • Fix vllm ipykernel patch by @pluesclues in https://github.com/unslothai/unsloth/pull/3907
  • Handle Transformers 5 vLLM import errors by @danielhanchen in https://github.com/unslothai/unsloth/pull/3908
  • add FastSentenceTransformer for easily finetuning SentenceTransformer models by @electroglyph in https://github.com/unslothai/unsloth/pull/3719
  • Guard torch.compile on ROCm when triton_key is missing by @hnxnq7 in https://github.com/unslothai/unsloth/pull/3923
  • Grpo compile settings update by @pluesclues in https://github.com/unslothai/unsloth/pull/3927
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/unslothai/unsloth/pull/3937
  • chore: Update outdated GitHub Actions version by @pgoslatara in https://github.com/unslothai/unsloth/pull/3936
  • [trl] vllm trl topk fixup by @Datta0 in https://github.com/unslothai/unsloth/pull/3935
  • [fix] qwen3-guard tokenizer by @Datta0 in https://github.com/unslothai/unsloth/pull/3959
  • fix for intel devices torch compile configs by @leizhenyuan in https://github.com/unslothai/unsloth/pull/3952
  • Use standard gradient checkpointing for small sequence lengths by @danielhanchen in https://github.com/unslothai/unsloth/pull/3867
  • reduce code duplication by @ykaitao in https://github.com/unslothai/unsloth/pull/3877
  • Fix TRL 0.27.0 GRPO compatibility and PEFT model handling by @danielhanchen in https://github.com/unslothai/unsloth/pull/3969
  • Fix Vision GRPO string prompts and OpenEnv async compatibility by @danielhanchen in https://github.com/unslothai/unsloth/pull/3964
  • Fix num_train_epochs=None causing TypeError in GRPOConfig by @danielhanchen in https://github.com/unslothai/unsloth/pull/3972
  • Add TRL truncation regression and metadata loss fixes (Fixes 1 and 3) by @danielhanchen in https://github.com/unslothai/unsloth/pull/3971
  • Add vLLM + torch < 2.9.0 + SM100 compatibility check by @danielhanchen in https://github.com/unslothai/unsloth/pull/3973
  • Fix torchvision compatibility check for source builds and future torch versions by @danielhanchen in https://github.com/unslothai/unsloth/pull/3978
  • Trl 0.27.0 update by @pluesclues in https://github.com/unslothai/unsloth/pull/3965
  • Prefer flex attention when available by @danielhanchen in https://github.com/unslothai/unsloth/pull/3979
  • Fix GPT-OSS BlockMask error during inference by @danielhanchen in https://github.com/unslothai/unsloth/pull/3982
  • Silence third-party deprecation warnings and fix socket leak by @danielhanchen in https://github.com/unslothai/unsloth/pull/3983
  • Silence non-actionable TRL trainer import failures by @danielhanchen in https://github.com/unslothai/unsloth/pull/3980
  • Add PyTorch 2.10 and xformers 0.0.34 support by @danielhanchen in https://github.com/unslothai/unsloth/pull/3985
  • [MoE] Improve moe kernels for unsloth fine tuning by @Datta0 in https://github.com/unslothai/unsloth/pull/3812
  • Fix RuntimeError not caught when torchcodec fails to load by @danielhanchen in https://github.com/unslothai/unsloth/pull/3987
  • Fix cutlass inductor options for PyTorch < 2.8.0 by @danielhanchen in https://github.com/unslothai/unsloth/pull/3988
  • Disable torchcodec in transformers when FFmpeg is missing by @danielhanchen in https://github.com/unslothai/unsloth/pull/3989
  • Update rl_replacements.py to filter through correct trl version by @pluesclues in https://github.com/unslothai/unsloth/pull/3990
  • Fix multiprocessing crash on Windows/macOS and unify num_proc logic by @danielhanchen in https://github.com/unslothai/unsloth/pull/3999
  • Fix triton 3.6.0 + torch 2.9.x torch.compile crash (missing cluster_dims) by @danielhanchen in https://github.com/unslothai/unsloth/pull/4001
  • Add push_to_hub_gguf support for FastSentenceTransformer by @Etherll in https://github.com/unslothai/unsloth/pull/4002
  • [Feature] seperate gguf file path by @RektPunk in https://github.com/unslothai/unsloth/pull/3934
  • Refactor Ollama template wiring and harden packing helpers by @mmangkad in https://github.com/unslothai/unsloth/pull/3890
  • Fix multi-GPU loading for quantized models in distributed training by @Fizza-Mukhtar in https://github.com/unslothai/unsloth/pull/3917
  • Fix broken documentation links, typos, and formatting in README by @danielhanchen in https://github.com/unslothai/unsloth/pull/4003
  • fix: inputs_embeds ignored when input_ids is not None in _fast_prepare_inputs_for_generation by @siddhudonda in https://github.com/unslothai/unsloth/pull/3814
  • Fix notebook compatibility for transformers 4.57.6 and TRL 0.22-0.27 by @danielhanchen in https://github.com/unslothai/unsloth/pull/3998
  • Fix VLM model + text-only dataset ValueError in TRL 0.22.x by @danielhanchen in https://github.com/unslothai/unsloth/pull/4004
  • Fix trl.experimental thin wrapper compilation and OOM from peft_config overwrite by @danielhanchen in https://github.com/unslothai/unsloth/pull/4006
  • Fix dtype mismatch in fp16 + 4-bit/8-bit LoRA training by @danielhanchen in https://github.com/unslothai/unsloth/pull/4005
  • Silence TRL's batch_size=1 padding-free warning in compiled trainer source by @danielhanchen in https://github.com/unslothai/unsloth/pull/4007
  • Silence peft target_parameters RuntimeWarning for MoE models by @danielhanchen in https://github.com/unslothai/unsloth/pull/4008
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/unslothai/unsloth/pull/4009
  • Suppress vLLM v1 executor sleep/wake log messages by @danielhanchen in https://github.com/unslothai/unsloth/pull/4011
  • Inject model reference for dynamic token_type_ids detection in SFTTrainer by @danielhanchen in https://github.com/unslothai/unsloth/pull/4012
  • Fix EmbeddingGemma float16 NaN via FORCE_FLOAT32 for gemma3_text by @danielhanchen in https://github.com/unslothai/unsloth/pull/4014
  • Fix #3397: Prevent trainer tokenization hang with safe num_proc by @Fizza-Mukhtar in https://github.com/unslothai/unsloth/pull/4013
  • add llama.cpp prefix to gguf conversion help messages by @rolandtannous in https://github.com/unslothai/unsloth/pull/4016
  • [Misc] Fixes by @Datta0 in https://github.com/unslothai/unsloth/pull/4015
  • FP8: Load model on-the-fly in vLLM by @andrewor14 in https://github.com/unslothai/unsloth/pull/3717
  • Fix Gemma3 4B training on transformers 5.x (token_type_ids) by @danielhanchen in https://github.com/unslothai/unsloth/pull/4017
  • Fix warmup_ratio deprecation for transformers >= 5.0 by @danielhanchen in https://github.com/unslothai/unsloth/pull/4019
  • Misc fixes by @Datta0 in https://github.com/unslothai/unsloth/pull/4018

Unsloth Zoo Changes

  • Fix training crash when using DoRA + 4-bit quantization by @Etherll in https://github.com/unslothai/unsloth-zoo/pull/394
  • fix for #392, transformers 5 by @electroglyph in https://github.com/unslothai/unsloth-zoo/pull/393
  • fix: adds missing import for torch.distributed by @namekian-mystifier in https://github.com/unslothai/unsloth-zoo/pull/422
  • Fix dtype mismatch in full finetuning + float16 inference by @danielhanchen in https://github.com/unslothai/unsloth-zoo/pull/424
  • Fix undefined variable 'e' in Version() function by @danielhanchen in https://github.com/unslothai/unsloth-zoo/pull/425
  • Fix correctness bugs in logging_utils.py and loss_utils.py by @danielhanchen in https://github.com/unslothai/unsloth-zoo/pull/426
  • Fix execute_with_time_limit start_method bug by @danielhanchen in https://github.com/unslothai/unsloth-zoo/pull/428
  • Fix OpenEnv PYTHONPATH auto-detection for compatibility by @danielhanchen in https://github.com/unslothai/unsloth-zoo/pull/429
  • Fix VARIANT_KWARG_KEYS import for peft >= 0.18.0 by @danielhanchen in https://github.com/unslothai/unsloth-zoo/pull/430
  • Fix ZeroDivisionError in fused cross entropy when GPU memory exhausted by @GabrielArpini in https://github.com/unslothai/unsloth-zoo/pull/432
  • Only enable gradient checkpointing when requested by @danielhanchen in https://github.com/unslothai/unsloth-zoo/pull/433
  • Removing import check in compiler.py by @Vidit-Ostwal in https://github.com/unslothai/unsloth-zoo/pull/431

Unsloth Notebooks changes

  • Add Gemma phone deployment notebook by @glee2429 in https://github.com/unslothai/notebooks/pull/146
  • Use stable executorch 1.0.0 and optimum-executorch v0.1.0 by @danielhanchen in https://github.com/unslothai/notebooks/pull/151
  • Update 2048 RL notebook with training results by @danielhanchen in https://github.com/unslothai/notebooks/pull/152
  • Update 2048 RL notebook with extended training results by @danielhanchen in https://github.com/unslothai/notebooks/pull/153
  • new GRPO update notebooks by @pluesclues in https://github.com/unslothai/notebooks/pull/155
  • gemma3 1b changes by @pluesclues in https://github.com/unslothai/notebooks/pull/156
  • nemo gym multi environment notebook by @cmunley1 in https://github.com/unslothai/notebooks/pull/158
  • Add LFM2.5 notebooks by @mlabonne in https://github.com/unslothai/notebooks/pull/159
  • Revert "Add LFM2.5 notebooks" by @danielhanchen in https://github.com/unslothai/notebooks/pull/161
  • Restore UNSLOTH_VLLM_STANDBY in Kaggle Gemma3 Vision GRPO by @danielhanchen in https://github.com/unslothai/notebooks/pull/163
  • Grpo update gemma notebooks correctly and news lines for notebooks by @pluesclues in https://github.com/unslothai/notebooks/pull/157
  • Add LFM2.5 notebooks (reopen #159) by @danielhanchen in https://github.com/unslothai/notebooks/pull/164
  • GLM 4.7 Flash finetuning notebook by @Datta0 in https://github.com/unslothai/notebooks/pull/166
  • Embedding models notebooks by @Etherll in https://github.com/unslothai/notebooks/pull/160
  • add Qwen3_Embedding_0.6B notebook by @Etherll in https://github.com/unslothai/notebooks/pull/167
  • [UPDATE] Update openenv notebooks to use the latest implementation by @burtenshaw in https://github.com/unslothai/notebooks/pull/165
  • Fix Vision GRPO chat template and Orpheus column removal by @danielhanchen in https://github.com/unslothai/notebooks/pull/171
  • update nemo gym notebooks by @cmunley1 in https://github.com/unslothai/notebooks/pull/169
  • Fix Vision GRPO notebooks and Orpheus TTS compatibility by @danielhanchen in https://github.com/unslothai/notebooks/pull/172
  • Add AMD known issues note by @hnxnq7 in https://github.com/unslothai/notebooks/pull/168
  • Update Dockerfile_DGX_Spark by @XEL-Maker in https://github.com/unslothai/notebooks/pull/162
  • Revert PR #165 - OpenEnv notebooks by @danielhanchen in https://github.com/unslothai/notebooks/pull/179
  • Fix update_all_notebooks.py script improvements by @danielhanchen in https://github.com/unslothai/notebooks/pull/176
  • Makign qwen 2.5 7b compatible with old trl versions. by @pluesclues in https://github.com/unslothai/notebooks/pull/177
  • Fix Ministral VL installation cells by @danielhanchen in https://github.com/unslothai/notebooks/pull/181
  • Improve update_all_notebooks.py: format preservation, cross-platform fixes, parallelization by @danielhanchen in https://github.com/unslothai/notebooks/pull/183
  • Refactor update_all_notebooks.py: reorder sections, CRLF handling, README categories by @danielhanchen in https://github.com/unslothai/notebooks/pull/184
  • Separate OCR into its own README section by @danielhanchen in https://github.com/unslothai/notebooks/pull/185
  • [MoE] notebooks for Colab by @Datta0 in https://github.com/unslothai/notebooks/pull/187

New Contributors

  • @sstamenk made their first contribution in https://github.com/unslothai/unsloth/pull/3748
  • @Fizza-Mukhtar made their first contribution in https://github.com/unslothai/unsloth/pull/3768
  • @alkinun made their first contribution in https://github.com/unslothai/unsloth/pull/3780
  • @f14-bertolotti made their first contribution in https://github.com/unslothai/unsloth/pull/3782
  • @yurekami made their first contribution in https://github.com/unslothai/unsloth/pull/3790
  • @majiayu000 made their first contribution in https://github.com/unslothai/unsloth/pull/3794
  • @ykaitao made their first contribution in https://github.com/unslothai/unsloth/pull/3832
  • @numb3r33 made their first contribution in https://github.com/unslothai/unsloth/pull/3754
  • @Vangmay made their first contribution in https://github.com/unslothai/unsloth/pull/3612
  • @hnxnq7 made their first contribution in https://github.com/unslothai/unsloth/pull/3869
  • @ducviet00 made their first contribution in https://github.com/unslothai/unsloth/pull/3879
  • @electroglyph made their first contribution in https://github.com/unslothai/unsloth/pull/3859
  • @pgoslatara made their first contribution in https://github.com/unslothai/unsloth/pull/3936
  • @RektPunk made their first contribution in https://github.com/unslothai/unsloth/pull/3934
  • @mmangkad made their first contribution in https://github.com/unslothai/unsloth/pull/3890
  • @siddhudonda made their first contribution in https://github.com/unslothai/unsloth/pull/3814

Full Changelog: https://github.com/unslothai/unsloth/compare/December-2025...February-2026

December-2025 New feature
Notable features
  • 3x faster training with padding-free packing
  • 500K context length support
  • PyTorch phone deployment guide
Full changelog

Thanks for all the love and support this year! We're wishing you all a lovely Christmas. Please update Unsloth & our Docker to use the latest updates! 🦥

  • Introducing 3x faster training & 30% less VRAM. New Triton kernels, padding-free & packing. Blog
  • 500K Context training and reinforcement learning is now possible on a single 80GB GPU. BlogNotebook
  • Fine-tune then Deploy LLMs on your Phone with PyTorch and Unsloth. TweetRead Guide
  • 🤗 Transformers v5 is now supported! It's not enabled by default due to possible instability issues.
  • Preliminary multi-GPU support: DDP Guide (not representative of the official release early next year)
  • More: Sudoku RL nbPaddle-OCR nbNew NVIDIA blog
  • Lots of bug fixes! See further below.

:crystal_ball: New Models + Guides

[!TIP]
Update Unsloth via pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo
If you want PyTorch 2.9: pip install --upgrade unsloth unsloth_zoo

Bug Fixes and Enhancements

  1. Supports rollout_func allowing multi turn RL to work
  2. Supports vllm>=0.12.0 and efficient GRPO for it
  3. Supports transformers>=5.0.0, first shown via our Ministral notebooks
  4. Fix HuggingFace token logins not working for private repos
  5. Fixes TorchAO and QAT not working during saving
  6. Fixed DeepSeek OCR finetuning not loading finetuned models
  7. Improved vision utilities for vision VLM finetuning

What's Changed

  • Fix llama tokenizer padding_side when using model.generate in inference mode by @dmsuehir in https://github.com/unslothai/unsloth/pull/3644
  • Fix indefinite article usage in comments and docstrings by @mk0walsk in https://github.com/unslothai/unsloth/pull/3648
  • fix rope_theta -> rope_parameters['rope_theta'] by @mmathew23 in https://github.com/unslothai/unsloth/pull/3651
  • Fix broken link for advanced pip installation in README by @gitpullpull in https://github.com/unslothai/unsloth/pull/3652
  • Fix: prevent load_in_fp8 kwarg from reaching Qwen3MoeForCausalLM constructor (Fix #3649) by @bhuvanprakash in https://github.com/unslothai/unsloth/pull/3654
  • make unsloth_tiled_mlp a from_pretrained arg by @mmathew23 in https://github.com/unslothai/unsloth/pull/3655
  • FIX set defualt [128, 128] insted of none by @ved1beta in https://github.com/unslothai/unsloth/pull/3658
  • Fix: Pass gradient_checkpointing parameter to model.for_training() by @sbhavani in https://github.com/unslothai/unsloth/pull/3659
  • [FIX] Vllm guided decoding params by @Datta0 in https://github.com/unslothai/unsloth/pull/3662
  • Vllm guided decoding by @Datta0 in https://github.com/unslothai/unsloth/pull/3663
  • Nightly by @danielhanchen in https://github.com/unslothai/unsloth/pull/3664
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/unslothai/unsloth/pull/3666
  • Update transformers version constraint in pyproject.toml by @noah1510 in https://github.com/unslothai/unsloth/pull/3689
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/unslothai/unsloth/pull/3694
  • Remove reload_weights rpc call from grpo trainer by @Datta0 in https://github.com/unslothai/unsloth/pull/3673
  • [Fix] [TRL] load_lora for multi line llm.chat/generate by @Datta0 in https://github.com/unslothai/unsloth/pull/3696
  • Nightly by @danielhanchen in https://github.com/unslothai/unsloth/pull/3698
  • SFT sample packing by @djsaunde in https://github.com/unslothai/unsloth/pull/3566
  • Auto-enable padding-free SFT by @djsaunde in https://github.com/unslothai/unsloth/pull/3672
  • [FIX] fbgemm version check by @Datta0 in https://github.com/unslothai/unsloth/pull/3704
  • Nightly by @danielhanchen in https://github.com/unslothai/unsloth/pull/3706
  • update TRL filter by @djsaunde in https://github.com/unslothai/unsloth/pull/3707
  • [intel] skip xpu fbgemm fp8 by @leizhenyuan in https://github.com/unslothai/unsloth/pull/3625
  • Mistral packing, train on completions only, simplifications by @djsaunde in https://github.com/unslothai/unsloth/pull/3709
  • Update torchao save by @metascroy in https://github.com/unslothai/unsloth/pull/3679
  • Nightly by @danielhanchen in https://github.com/unslothai/unsloth/pull/3720
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/unslothai/unsloth/pull/3731
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3734
  • Update FUNDING.yml by @danielhanchen in https://github.com/unslothai/unsloth/pull/3736
  • Nightly by @danielhanchen in https://github.com/unslothai/unsloth/pull/3737
  • Fix Deepseek OCR Lora Model Load by @mmathew23 in https://github.com/unslothai/unsloth/pull/3738

Unsloth Zoo Changes

  • updates for vLLM compativility with lora by @danielhanchen in https://github.com/unslothai/unsloth-zoo/pull/359
  • Nightly by @danielhanchen in https://github.com/unslothai/unsloth-zoo/pull/355
  • Add logging to tiled mlp and fix target chunk size calculation by @mmathew23 in https://github.com/unslothai/unsloth-zoo/pull/361
  • Remove include_buffers from init_empty_weights by @pluesclues in https://github.com/unslothai/unsloth-zoo/pull/363
  • packed seq lengths token count correction by @djsaunde in https://github.com/unslothai/unsloth-zoo/pull/348
  • Configure ce target gb by @mmathew23 in https://github.com/unslothai/unsloth-zoo/pull/365
  • [FIX] vLLM LoRA extra vocab by @Datta0 in https://github.com/unslothai/unsloth-zoo/pull/367
  • Nightly by @danielhanchen in https://github.com/unslothai/unsloth-zoo/pull/368
  • [FIX] vLLM local lora tensor loading by @Datta0 in https://github.com/unslothai/unsloth-zoo/pull/370
  • vllm lora_dir rename and make embedding padding optional by @danielhanchen in https://github.com/unslothai/unsloth-zoo/pull/373
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth-zoo/pull/375
  • Update e to error by @ChetanKrishna07 in https://github.com/unslothai/unsloth-zoo/pull/374
  • Vision utils decode image improvement by @mmathew23 in https://github.com/unslothai/unsloth-zoo/pull/372
  • [FIX] [DDP] Fix compile for distributed training by @Datta0 in https://github.com/unslothai/unsloth-zoo/pull/379
  • Nightly by @danielhanchen in https://github.com/unslothai/unsloth-zoo/pull/382
  • update compiler for XLMRobertaModel by @electroglyph in https://github.com/unslothai/unsloth-zoo/pull/383
  • Fix Deepseek OCR Lora Model Load by @mmathew23 in https://github.com/unslothai/unsloth-zoo/pull/386
  • fix for non-generation models in transformers 5 by @electroglyph in https://github.com/unslothai/unsloth-zoo/pull/388

New Contributors

  • @dmsuehir made their first contribution in https://github.com/unslothai/unsloth/pull/3644
  • @gitpullpull made their first contribution in https://github.com/unslothai/unsloth/pull/3652
  • @bhuvanprakash made their first contribution in https://github.com/unslothai/unsloth/pull/3654
  • @ved1beta made their first contribution in https://github.com/unslothai/unsloth/pull/3658
  • @sbhavani made their first contribution in https://github.com/unslothai/unsloth/pull/3659
  • @noah1510 made their first contribution in https://github.com/unslothai/unsloth/pull/3689
  • @ChetanKrishna07 made their first contribution in https://github.com/unslothai/unsloth-zoo/pull/374
  • @electroglyph made their first contribution in https://github.com/unslothai/unsloth-zoo/pull/383

Full Changelog: https://github.com/unslothai/unsloth/compare/November-2025...December-2025

November-2025 New feature
Notable features
  • FP8 RL training with 1.4x speedup
  • DeepSeek-OCR fine-tuning support
  • Qwen3-VL model support
Full changelog

We’re getting close to our final release of 2025! Thanks so much for sticking with us this year. We’ve got lots of new features so please update Unsloth & our Docker to use the latest updates! 🦥

  • Introducing FP8 Reinforcement Learning in Unsloth! Train on any FP8 supported GPU and get 1.4x faster with 60% less VRAM: Read our Blog/Guide • Notebooks: Qwen3-8B FP8 GRPO and Llama-3.2-1B FP8 GRPO

  • You may notice Unsloth now uses much less VRAM than before, enabling even longer context. We’re also implementing faster training very soon and we’ll share all the details in an upcoming blog.

  • DeepSeek-OCR fine-tuning is here! We fine-tuned DeepSeek-OCR, improving its language understanding by 89%. Read our BlogFree notebook

  • Qwen3-VL models supported including GGUFs to run locally: Blogpost + fixesGGUFs

  • We analyzed RL training-inference mismatch for FP16 vs. BF16 and concluded that Unsloth does not have this issue: Analysis and Results

  • We’ve partnered with Docker to let you run LLMs locally with zero setup. Docker GGUFs are now powered by Unsloth Dynamic.
    Example: docker model run hf.co/unsloth/gpt-oss-20b-GGUF:F16 Read guide

  • Baidu ERNIE models are now supported. Notebooks coming soon.

  • Unsloth now supports SGLang. Read our guide

  • We wrote guides for LoRA Hot Swapping and vLLM Engine Arguments

  • Run Kimi-K2-Thinking the most powerful open model locally. Kimi-K2 Guide

  • Lots of bug fixes! See further below.

[!TIP]
Update Unsloth via pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo
If you want PyTorch 2.9: pip install --upgrade unsloth unsloth_zoo

Bug Fixes and Enhancements

  1. Supports trl>=0.25.0 and vllm>=0.11.2 and transformers>=4.57.1
  2. Fixed gpt-oss GRPO, RL excessive re-compilations on torch>=2.9.0
  3. Fixes Sleep mode and reduces memory usage by 5 to 15% further for RL, GRPO
  4. Fix propagation of trust_remote_code = True
  5. Fix Unsloth offloaded gradient checkpointing not offloading on 1st step - reduces VRAM by >20%
  6. Add logits.detach() to GRPO to solve double backwards on some pathways
  7. Add int64 kernels & fixed RoPE embeddings to allow super ultra long context training
  8. Fixed 📓 OpenEnv gpt-oss RL notebook
  9. DGX Spark docker image fixed

What's Changed

  • Grpo gradient accumulation edits by @pluesclues in https://github.com/unslothai/unsloth/pull/3390
  • Nightly by @danielhanchen in https://github.com/unslothai/unsloth/pull/3532
  • Handle TRL version compatibility in rl_replacements.py by @pluesclues in https://github.com/unslothai/unsloth/pull/3540
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3546
  • Sleep trl patch by @Datta0 in https://github.com/unslothai/unsloth/pull/3517
  • Detach logits before returning from function by @pluesclues in https://github.com/unslothai/unsloth/pull/3554
  • Fix typos in comment by @mk0walsk in https://github.com/unslothai/unsloth/pull/3557
  • Formatting & bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3563
  • DeepseekOCR: add trust_remote_code kwarg by @mmathew23 in https://github.com/unslothai/unsloth/pull/3564
  • pre-commit CI config by @djsaunde in https://github.com/unslothai/unsloth/pull/3565
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/unslothai/unsloth/pull/3576
  • Resize rope embeddings for long sequence training by @mmathew23 in https://github.com/unslothai/unsloth/pull/3586
  • Patch in tiled mlp by @mmathew23 in https://github.com/unslothai/unsloth/pull/3584
  • Support for out-of-source quantizers by @Giuseppe5 in https://github.com/unslothai/unsloth/pull/3534
  • Fix: prevent rope_embedding AssertionError by checking kv_seq_len before reuse by @jarrycyx in https://github.com/unslothai/unsloth/pull/3578
  • Extend TorchAOConfig to support mobile usecases by @metascroy in https://github.com/unslothai/unsloth/pull/3587
  • fix qwen3 vl gradient accumulation by @mmathew23 in https://github.com/unslothai/unsloth/pull/3598
  • Do not force set beta to 0 for DAPO by @Datta0 in https://github.com/unslothai/unsloth/pull/3604
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/unslothai/unsloth/pull/3606
  • Fix broken links and typo in README by @mk0walsk in https://github.com/unslothai/unsloth/pull/3611
  • remove pre-commit workflow (covered by pre-commit app) by @djsaunde in https://github.com/unslothai/unsloth/pull/3618
  • Add an int64 path for mlp kernels by @mmathew23 in https://github.com/unslothai/unsloth/pull/3614
  • Remove grpo requirement bs=num_generations by @mmathew23 in https://github.com/unslothai/unsloth/pull/3609
  • Enable FP8 + RL training for bf16 models by @andrewor14 in https://github.com/unslothai/unsloth/pull/3440
  • Fix/save torchao model loading logic by @rolandtannous in https://github.com/unslothai/unsloth/pull/3621
  • Fix LlamaModel_fast_forward signature to match HF Transformers (Support inputs_embeds) by @MercuryYen in https://github.com/unslothai/unsloth/pull/3623
  • Add 128x128 PerBlock FP8 + RL by @andrewor14 in https://github.com/unslothai/unsloth/pull/3629
  • Add trust_remote_code parameter to tokenizer by @Etherll in https://github.com/unslothai/unsloth/pull/3631
  • [intel] change windows to remove windows-triton for intel xpu by @leizhenyuan in https://github.com/unslothai/unsloth/pull/3168

Unsloth Zoo Changes

  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth-zoo/pull/327
  • Fix GRPO by @danielhanchen in https://github.com/unslothai/unsloth-zoo/pull/328
  • fix gpt oss memory calculation for intel device by @leizhenyuan in https://github.com/unslothai/unsloth-zoo/pull/330
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth-zoo/pull/331
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth-zoo/pull/332
  • fixed unbound local error tokenizer-model from cache by @rolandtannous in https://github.com/unslothai/unsloth-zoo/pull/333
  • Now it works on a uv venv by @kittawere in https://github.com/unslothai/unsloth-zoo/pull/336
  • Gemma3n fix by @mmathew23 in https://github.com/unslothai/unsloth-zoo/pull/338
  • [Intel] remove triton windows for intel by @leizhenyuan in https://github.com/unslothai/unsloth-zoo/pull/243
  • FP8 training enhancements by @Datta0 in https://github.com/unslothai/unsloth-zoo/pull/337
  • GRPO gradient accumulation steps update and DAPO support by @pluesclues in https://github.com/unslothai/unsloth-zoo/pull/308
  • Fix/video collate by @mmathew23 in https://github.com/unslothai/unsloth-zoo/pull/342
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth-zoo/pull/344
  • FP8, Standby and vLLM updates by @Datta0 in https://github.com/unslothai/unsloth-zoo/pull/340
  • Put importance sampling into no grad by @pluesclues in https://github.com/unslothai/unsloth-zoo/pull/343
  • Detach hidden states to avoid gradient carry by @pluesclues in https://github.com/unslothai/unsloth-zoo/pull/345
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth-zoo/pull/347
  • MoE: Cast routing_weights dtype correctly by @mmathew23 in https://github.com/unslothai/unsloth-zoo/pull/349
  • return local model in determine_base_model_source with any quantization by @noah1510 in https://github.com/unslothai/unsloth-zoo/pull/334
  • Enable FP8 + RL training by @andrewor14 in https://github.com/unslothai/unsloth-zoo/pull/351
  • Tiled MLP Implementation by @mmathew23 in https://github.com/unslothai/unsloth-zoo/pull/350
  • Fix gradient checkpointing layer caller kwargs by @mmathew23 in https://github.com/unslothai/unsloth-zoo/pull/353
  • vLLM weight scale FP8 and standby override by @Datta0 in https://github.com/unslothai/unsloth-zoo/pull/354
  • Fix docstring removing regex to support empty parentheses by @noisycat3 in https://github.com/unslothai/unsloth-zoo/pull/360

Unsloth Notebooks Changes

  • Feat/qwen3 vl by @Erland366 in https://github.com/unslothai/notebooks/pull/119
  • Feat/double footer fix by @Erland366 in https://github.com/unslothai/notebooks/pull/121
  • Add GGUF section for Qwen3-VL by @Etherll in https://github.com/unslothai/notebooks/pull/123
  • Fix TypeError in unsloth_push_to_hub_gguf() when pushing GGUF model to Hugging Face by @samanta-sc in https://github.com/unslothai/notebooks/pull/125
  • fix TorchAOConfig' object has no attribute 'base_config' error by @rolandtannous in https://github.com/unslothai/notebooks/pull/129
  • Updated Dockerfile for DGX Spark by @sameersegal in https://github.com/unslothai/notebooks/pull/133
  • gemma3-270m: reduce batch size for sample packing by @djsaunde in https://github.com/unslothai/notebooks/pull/135
  • fix dataset formatting and mapping for Magistral reasoning by @rolandtannous in https://github.com/unslothai/notebooks/pull/136
  • fix magistral inference by @rolandtannous in https://github.com/unslothai/notebooks/pull/138

Full Changelog: https://github.com/unslothai/unsloth/compare/October-2025...November-2025

What's Changed

  • Grpo gradient accumulation edits by @pluesclues in https://github.com/unslothai/unsloth/pull/3390
  • Nightly by @danielhanchen in https://github.com/unslothai/unsloth/pull/3532
  • Handle TRL version compatibility in rl_replacements.py by @pluesclues in https://github.com/unslothai/unsloth/pull/3540
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3546
  • Sleep trl patch by @Datta0 in https://github.com/unslothai/unsloth/pull/3517
  • Detach logits before returning from function by @pluesclues in https://github.com/unslothai/unsloth/pull/3554
  • Fix typos in comment by @mk0walsk in https://github.com/unslothai/unsloth/pull/3557
  • Formatting & bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3563
  • DeepseekOCR: add trust_remote_code kwarg by @mmathew23 in https://github.com/unslothai/unsloth/pull/3564
  • pre-commit CI config by @djsaunde in https://github.com/unslothai/unsloth/pull/3565
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/unslothai/unsloth/pull/3576
  • Resize rope embeddings for long sequence training by @mmathew23 in https://github.com/unslothai/unsloth/pull/3586
  • Patch in tiled mlp by @mmathew23 in https://github.com/unslothai/unsloth/pull/3584
  • Support for out-of-source quantizers by @Giuseppe5 in https://github.com/unslothai/unsloth/pull/3534
  • Fix: prevent rope_embedding AssertionError by checking kv_seq_len before reuse by @jarrycyx in https://github.com/unslothai/unsloth/pull/3578
  • Extend TorchAOConfig to support mobile usecases by @metascroy in https://github.com/unslothai/unsloth/pull/3587
  • fix qwen3 vl gradient accumulation by @mmathew23 in https://github.com/unslothai/unsloth/pull/3598
  • Do not force set beta to 0 for DAPO by @Datta0 in https://github.com/unslothai/unsloth/pull/3604
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/unslothai/unsloth/pull/3606
  • Fix broken links and typo in README by @mk0walsk in https://github.com/unslothai/unsloth/pull/3611
  • remove pre-commit workflow (covered by pre-commit app) by @djsaunde in https://github.com/unslothai/unsloth/pull/3618
  • Add an int64 path for mlp kernels by @mmathew23 in https://github.com/unslothai/unsloth/pull/3614
  • Remove grpo requirement bs=num_generations by @mmathew23 in https://github.com/unslothai/unsloth/pull/3609
  • Enable FP8 + RL training for bf16 models by @andrewor14 in https://github.com/unslothai/unsloth/pull/3440
  • Fix/save torchao model loading logic by @rolandtannous in https://github.com/unslothai/unsloth/pull/3621
  • Fix LlamaModel_fast_forward signature to match HF Transformers (Support inputs_embeds) by @MercuryYen in https://github.com/unslothai/unsloth/pull/3623
  • Add 128x128 PerBlock FP8 + RL by @andrewor14 in https://github.com/unslothai/unsloth/pull/3629
  • Add trust_remote_code parameter to tokenizer by @Etherll in https://github.com/unslothai/unsloth/pull/3631
  • [intel] change windows to remove windows-triton for intel xpu by @leizhenyuan in https://github.com/unslothai/unsloth/pull/3168
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/unslothai/unsloth/pull/3634
  • Float8 GRPO, RL by @danielhanchen in https://github.com/unslothai/unsloth/pull/3640

New Contributors

  • @mk0walsk made their first contribution in https://github.com/unslothai/unsloth/pull/3557
  • @pre-commit-ci[bot] made their first contribution in https://github.com/unslothai/unsloth/pull/3576
  • @Giuseppe5 made their first contribution in https://github.com/unslothai/unsloth/pull/3534
  • @jarrycyx made their first contribution in https://github.com/unslothai/unsloth/pull/3578
  • @MercuryYen made their first contribution in https://github.com/unslothai/unsloth/pull/3623

Full Changelog: https://github.com/unslothai/unsloth/compare/October-2025...November-2025

October-2025 New feature
Notable features
  • Unsloth Docker image on Docker Hub
  • Quantization-Aware Training (QAT) with 70% accuracy recovery
  • Qwen3-VL and Granite-4.0 support
Full changelog

Hey everyone, please update Unsloth to use the latest updates! 🦥

New model updates

New features

  • Introducing Quantization-Aware Training: We collabed with Pytorch for QAT, recovering as much 70% accuracy. Read blog
  • Unsloth supports OpenEnv to allow for open RL environments. Blog coming soon • Notebook
  • New customer support agent notebook to enable real-time analysis & solving of customer interactions. You'll also learn how to train models using data from Google Sheets.
  • Support for Python 3.13, PyTorch 2.9 and the latest Hugging Face TRL and transformers are now fixed.
  • Save to TorchAO supported as well:
from torchao.quantization import Int4WeightOnlyConfig
model.save_pretrained_torchao("model", tokenizer, torchao_config = Int4WeightOnlyConfig())

[!TIP]
Update Unsloth via pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo
If you want PyTorch 2.9: pip install --upgrade unsloth unsloth_zoo

RL Improvements

  1. Fixed Standby consuming more VRAM than usual. Auto selects the maximum 80% to 95% of GPU utilization if import os; os.environ["UNSLOTH_VLLM_STANDBY"] = "1" is used.
  2. Fixed GRPO training hangs with better environment timers - works on DGX Spark and all other GPUs.
  3. Fixes GRPO RuntimeError: shape '[1, 887, 1, 128]' is invalid for input of size 3633152 for all models

RL Environment functions

  1. New execute_with_time_limit function to force functions to execute within a time limit. E.g. with a 2 second time limit, use:
from unsloth import execute_with_time_limit
@execute_with_time_limit(2)
def execute_strategy(strategy, game):
    return _execute_strategy(strategy, game)
try:
    execute_strategy(strategy, game)
except TimeoutError as e:
    print(f"Timed out with error = {str(e)}")
  1. To check if only Python standard modules are used in a function, use check_python_modules.
  2. Use create_locked_down_function to create a function without leakage of global variables.
  3. Use Benchmarker ie from unsloth import Benchmarker to benchmark functions accurately. It wipes the L1 to L3 cache approximately to reduce chances of benchmark cheating.
  4. Use launch_openenv to launch a continuous reloaded OpenEnv environment process (to stop it from closing down) ie from unsloth import launch_openenv It will auto find a port that is not used.

Bug fixes

  1. GPT-OSS BF16 The GPTOSSRouter works with load_in_4bit = True AttributeError: 'GptOssTopKRouter' object has no attribute 'weight'
  2. Mistral training fixed - sentencepiece proto issue fixed (any protobuf version works)
  3. Fix evaluation ie UNSLOTH_RETURN_LOGITS="1" works. Fixes https://github.com/unslothai/unsloth/issues/3126 https://github.com/unslothai/unsloth/issues/3071
  4. Fixes Output 0 of UnslothFusedLossBackward is a view and is being modified inplace. for Gemma 3 and transformers>=4.57.1
  5. If you see ImportError: cannot import name '_Ink' from 'PIL._typing' (/usr/local/lib/python3.12/dist-packages/PIL/_typing.py) please update and use our new notebooks

Don't forget to also join our Reddit: r/unsloth 🥰

What's Changed

  • Fix loading as 8bit by @Etherll in https://github.com/unslothai/unsloth/pull/3384
  • Nightly by @danielhanchen in https://github.com/unslothai/unsloth/pull/3392
  • Nightly by @danielhanchen in https://github.com/unslothai/unsloth/pull/3394
  • Update int8-int4 QAT config to use Int8DynamicActivationIntxWeightConfig by @metascroy in https://github.com/unslothai/unsloth/pull/3391
  • Gemma 3 bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3410
  • Transformers Fix v4.57 rename from PretrainedConfig to PreTrainedConfig by @mmathew23 in https://github.com/unslothai/unsloth/pull/3445
  • improve qat by @Etherll in https://github.com/unslothai/unsloth/pull/3446
  • Fix eval metric issue by @pluesclues in https://github.com/unslothai/unsloth/pull/3420
  • [Part2] Reinstate llama.cpp Compatibility and GGUF Conversion with Multiple Quantizations and Automated Ollama Modelfile Creation by @rolandtannous in https://github.com/unslothai/unsloth/pull/3356
  • vLLM FP8 quantized support for SFT/GRPO by @Datta0 in https://github.com/unslothai/unsloth/pull/3414
  • Fix by @danielhanchen in https://github.com/unslothai/unsloth/pull/3466
  • AMD fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3467
  • Fix transformers 4.57.1 by @danielhanchen in https://github.com/unslothai/unsloth/pull/3473
  • GRPO bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3474
  • EOL LF (unix line endings) normalization by @djsaunde in https://github.com/unslothai/unsloth/pull/3478
  • Fix out of resources issue for llama3.2 sft on amd gpu by @wangxunx in https://github.com/unslothai/unsloth/pull/3455
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3483
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3484
  • Patch sleep mode properly for trl by @Datta0 in https://github.com/unslothai/unsloth/pull/3492
  • Sleep trl patch by @Datta0 in https://github.com/unslothai/unsloth/pull/3494
  • fix cross entropy loss issue for small vocab size on amd gpu by @wangxunx in https://github.com/unslothai/unsloth/pull/3503
  • Gemma 3n fix by @mmathew23 in https://github.com/unslothai/unsloth/pull/3499
  • enable intel for torch2.8 by @leizhenyuan in https://github.com/unslothai/unsloth/pull/3381
  • add code for intel qlora by @leizhenyuan in https://github.com/unslothai/unsloth/pull/3370
  • fix for intel memory calculation by @leizhenyuan in https://github.com/unslothai/unsloth/pull/3513
  • [intel] enable support 2.9 for intel xpu by @leizhenyuan in https://github.com/unslothai/unsloth/pull/3514
  • FP8 training enhancements by @Datta0 in https://github.com/unslothai/unsloth/pull/3496

New Contributors

  • @metascroy made their first contribution in https://github.com/unslothai/unsloth/pull/3391
  • @djsaunde made their first contribution in https://github.com/unslothai/unsloth/pull/3478
  • @wangxunx made their first contribution in https://github.com/unslothai/unsloth/pull/3455

Full Changelog: https://github.com/unslothai/unsloth/compare/September-2025-v3...October-2025

September-2025-v3 Breaking risk
Notable features
  • gpt-oss RL with 3x faster inference
  • Custom matrix multiplication kernels
  • Reward-hacking mitigation
Full changelog

We’re introducing gpt-oss RL support and the fastest RL inference and lowest VRAM use vs. any implementation. Blog: https://docs.unsloth.ai/new/gpt-oss-reinforcement-learning

  • Unsloth now offers the fastest inference (~3x faster), lowest VRAM (50% less) and most context (8x longer) for gpt-oss RL vs. any implementation - with no accuracy loss.
  • Since RL on gpt-oss isn't yet vLLM compatible, we rewrote Transformers inference code to enable faster inference
  • gpt-oss-20b GSPO free Colab notebook
  • This notebook automatically creates faster matrix multiplication kernels and uses a new Unsloth reward function. We also show how to counteract reward-hacking which is one of RL's biggest challenges.
  • We previously released Vision RL with GSPO support
  • ⚠️ Reminder to NOT use Flash Attention 3 for gpt-oss as it'll make your training loss wrong.
  • DeepSeek-V3.1-Terminus is here and you can run locally via our GGUF
    Read how our 3-bit GGUF beats Claude-4-Opus (thinking) on Aider Polyglot here
  • Magistral 1.2 is here and you can run it locally here or fine-tune it for free by using our Kaggle notebook
  • Fine-tuning the new Qwen3 models including Qwen3-VL, Qwen3-Omni and Qwen3-Next should work in Unsloth if you install the latest transformers. The models are big however so ensure you have enough VRAM.
  • BERT is now fixed! Feel free to use our BERT fine-tuning notebook

Don't forget to also join our Reddit: r/unsloth 🥰

What's Changed

  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3329
  • Fix QAT + LoRA fast path, add tests by @andrewor14 in https://github.com/unslothai/unsloth/pull/3307
  • Use gemma3n embedder patch + adjust FORCE_FLOAT32 match logic by @mmathew23 in https://github.com/unslothai/unsloth/pull/3332
  • Synthetic Data updates by @mmathew23 in https://github.com/unslothai/unsloth/pull/3333
  • Fix loading issues for BERT by @Etherll in https://github.com/unslothai/unsloth/pull/3339
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3335
  • peft_config before model_config by @mmathew23 in https://github.com/unslothai/unsloth/pull/3342
  • specify different tokenizer_path/name by @mmathew23 in https://github.com/unslothai/unsloth/pull/3343
  • correct python support statement by @laz-001 in https://github.com/unslothai/unsloth/pull/3374
  • GPT OSS RL by @danielhanchen in https://github.com/unslothai/unsloth/pull/3362

New Contributors

  • @laz-001 made their first contribution in https://github.com/unslothai/unsloth/pull/3374

Full Changelog: https://github.com/unslothai/unsloth/compare/September-2025-v2...September-2025-v3

September-2025-v2 New feature
Notable features
  • Vision/multimodal RL support
  • GSPO algorithm implementation
  • Unsloth Standby for RL memory efficiency
Full changelog

We're excited to support Vision models for RL and even more memory efficient + faster RL!

Unsloth now supports vision/multimodal RL with Gemma 3, Qwen2.5-VL and other vision models. Due to Unsloth's unique weight sharing and custom kernels, Unsloth makes VLM RL 1.5–2× faster, uses 90% less VRAM, and enables 10× longer context lengths than FA2 setups, with no accuracy loss. Qwen2.5-VL GSPO notebook
Gemma 3 (4B) Vision GSPO notebook

Full details in our blogpost: https://docs.unsloth.ai/new/vision-reinforcement-learning-vlm-rl

  • This update also introduces Qwen's GSPO algorithm.

  • Our new vision RL support also comes now even faster & more memory efficient! Our new kernels & algos allows faster RL for text and vision LLMs with 50% less VRAM & 10× more context.

  • Introducing a new RL feature called 'Standby'. Before, RL requires GPU splitting between training & inference. With Unsloth Standby, you no longer have to & 'Unsloth Standby' uniquely limits speed degradation compared to other implementations and sometimes makes training even faster! Read our Blog

  • We released Aider Polyglot benchmarks for our DeepSeek-V3.1 Dynamic GGUFs and Unsloth quants perform consistently better than others. Blog

Don't forget to also join our Reddit: r/unsloth 🥰

What's Changed

  • GPT OSS Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3231
  • tests for mxfp4 and quantized models merge fix unsloth zoo pr 254 by @rolandtannous in https://github.com/unslothai/unsloth/pull/3223
  • Update mistral.py, showed flag to not call cut cross entropy by @pluesclues in https://github.com/unslothai/unsloth/pull/3233
  • Remove old version constraint in dependency list by @timkpaine in https://github.com/unslothai/unsloth/pull/3237
  • chore: Fix Typos by @DefiWimar7 in https://github.com/unslothai/unsloth/pull/3246
  • Fix incorrect function call in test_qwen3_grpo.py by @stevenxdavis in https://github.com/unslothai/unsloth/pull/3212
  • [Intel] make intel device support ROPE by @leizhenyuan in https://github.com/unslothai/unsloth/pull/3164
  • Support saving locally in model.save_pretrained_torchao by @jerryzh168 in https://github.com/unslothai/unsloth/pull/3263
  • fixed save_pretrained_torchao and associated tests by @rolandtannous in https://github.com/unslothai/unsloth/pull/3264
  • patch sftrainer to disable _is_vlm by @mmathew23 in https://github.com/unslothai/unsloth/pull/3265
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3266
  • Filter vllm executor log by @Datta0 in https://github.com/unslothai/unsloth/pull/3268
  • llama vision inference fix by @mmathew23 in https://github.com/unslothai/unsloth/pull/3270
  • Add TorchAO quantization tests with FP16 models and serialization workarounds by @rolandtannous in https://github.com/unslothai/unsloth/pull/3269
  • GptAttention turn training off during inference by @mmathew23 in https://github.com/unslothai/unsloth/pull/3289
  • Add support for QAT full fine-tuning by @andrewor14 in https://github.com/unslothai/unsloth/pull/3238
  • simplify unsloth_base_fast_generate by @mmathew23 in https://github.com/unslothai/unsloth/pull/3291
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3295
  • [ROCm] add hip device path by @billishyahao in https://github.com/unslothai/unsloth/pull/3301
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3322
  • Add support for modules_to_save in FastModel.get_peft_model by @l1ghtsource in https://github.com/unslothai/unsloth/pull/3317
  • Fast Inference with vLLM for VLMs by @Datta0 in https://github.com/unslothai/unsloth/pull/2975
  • TRL Updated version of VLM GRPO update along with GSPO by @pluesclues in https://github.com/unslothai/unsloth/pull/3132

New Contributors

  • @timkpaine made their first contribution in https://github.com/unslothai/unsloth/pull/3237
  • @stevenxdavis made their first contribution in https://github.com/unslothai/unsloth/pull/3212
  • @l1ghtsource made their first contribution in https://github.com/unslothai/unsloth/pull/3317

Full Changelog: https://github.com/unslothai/unsloth/compare/August-2025-v2...September-2025-v2

August-2025-v2 New feature
Notable features
  • Flex Attention for 8x longer context
  • Export QLoRA to llama.cpp/vLLM/HF
  • MXFP4 inference swiglu fix
Full changelog

We’re excited to introduce Unsloth Flex Attention support for OpenAI gpt-oss training that enables >8× longer context lengths, >50% less VRAM usage and >1.5× faster training compared to all implementations including those using Flash Attention 3 (FA3). Unsloth Flex Attention makes it possible to train with a 60K context length on just 80GB of VRAM for BF16 LoRA. Also:

  • You can now export/save your QLoRA fine-tuned gpt-oss model to llama.cpp, vLLM, or HF.
  • We fixed gpt-oss training losses going to infinity on float16 GPUs (like T4 Colab)
  • We fixed gpt-oss implementation issues, most notably ensuring that swiglu_limit = 7.0 is properly applied during MXFP4 inference in transformers
  • Unsloth Flex Attention scales with context, longer sequences yield bigger savings in both VRAM and training time

Full details in our blogpost: https://docs.unsloth.ai/basics/long-context-gpt-oss-training

What's Changed

  • Add Qwen3 Instruct / Thinking chat templates by @Etherll in https://github.com/unslothai/unsloth/pull/3110
  • Add Qwen3 4B to mapper.py by @Etherll in https://github.com/unslothai/unsloth/pull/3120
  • Nightly by @danielhanchen in https://github.com/unslothai/unsloth/pull/3148
  • Fix GPT OSS by @danielhanchen in https://github.com/unslothai/unsloth/pull/3154
  • Nightly by @danielhanchen in https://github.com/unslothai/unsloth/pull/3169
  • Update Blackwell install instructions for latest vLLM release by @qingy1337 in https://github.com/unslothai/unsloth/pull/3175
  • Fix potential generator exhaustion bug in model loading file detection by @rolandtannous in https://github.com/unslothai/unsloth/pull/3167
  • Fix vision model GGUF quantization_method error type by @rolandtannous in https://github.com/unslothai/unsloth/pull/3173
  • Replace back ticks with single quotes by @rnowling in https://github.com/unslothai/unsloth/pull/3157
  • Fix original_push_to_hub fallback by @Thiraput01 in https://github.com/unslothai/unsloth/pull/3115
  • Add support for QAT + LoRA by @andrewor14 in https://github.com/unslothai/unsloth/pull/2976
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3180
  • Torch 2.8 by @danielhanchen in https://github.com/unslothai/unsloth/pull/3186
  • Fix extras transformers typo in pyproject.toml by @parth2510 in https://github.com/unslothai/unsloth/pull/3187
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3195
  • allow torch.float32 dtype in FastLanguageModel by @mmathew23 in https://github.com/unslothai/unsloth/pull/3204
  • fix is casual for qwen3 by @leizhenyuan in https://github.com/unslothai/unsloth/pull/3213
  • Support model.save_pretrained_torchao by @jerryzh168 in https://github.com/unslothai/unsloth/pull/3111
  • Fix gemma-3n by @mmathew23 in https://github.com/unslothai/unsloth/pull/3219
  • Handle transformers move to dtype from torch_dtype by @mmathew23 in https://github.com/unslothai/unsloth/pull/3225
  • chore: Fix Typos by @DefiWimar7 in https://github.com/unslothai/unsloth/pull/3224

New Contributors

  • @rnowling made their first contribution in https://github.com/unslothai/unsloth/pull/3157
  • @Thiraput01 made their first contribution in https://github.com/unslothai/unsloth/pull/3115
  • @andrewor14 made their first contribution in https://github.com/unslothai/unsloth/pull/2976
  • @parth2510 made their first contribution in https://github.com/unslothai/unsloth/pull/3187
  • @jerryzh168 made their first contribution in https://github.com/unslothai/unsloth/pull/3111
  • @DefiWimar7 made their first contribution in https://github.com/unslothai/unsloth/pull/3224

Full Changelog: https://github.com/unslothai/unsloth/compare/August-2025...August-2025-v2

August-2025 New feature
Notable features
  • gpt-oss training on 14GB VRAM (Colab compatible)
  • 1.5x faster training, 50% less VRAM
  • Blackwell RTX 50 support
Full changelog

gpt-oss is here! ✨

Finetune gpt-oss for free with our Unsloth Colab notebook!

  • We’ve managed to make gpt-oss train on just 14GB of VRAM, making it possible to work on free Colab due to our linear conversions. For more details, Read our Guide/Blogpost
  • Fine-tuning gpt-oss is 1.5x faster and uses 50% less VRAM with Unsloth. gpt-oss-120b model fits on 65GB of VRAM.
  • Model uploads: 20b GGUF120b GGUFAll uploads

:sloth: Unsloth updates

  • We’ve made algorithmic updates to Unsloth so every model now trains faster and with less VRAM, no matter which.
  • Unsloth now works on RTX 50 and Blackwell GPUs. Read our guide.
  • Official Unsloth Docker image coming very soon!
  • You can now run Unsloth models directly via Docker: docker model pull hf.co/unsloth/gpt-oss-20b-GGUF

:stars: Qwen3-Coder + Qwen3-2507

Qwen made July, 2025 updates called 'Qwen3-2507' and launched their SOTA coding models!

:crystal_ball: New models + Support:

Run these new models:

Unsloth also now supports running + training for:

Don't forget to also join our Reddit: r/unsloth 🥰

What's Changed

  • Fix argument mismatch in GRPO _get_per_token_logps lambda function by @rolandtannous in https://github.com/unslothai/unsloth/pull/2929
  • patch falcon h1 inference by @mmathew23 in https://github.com/unslothai/unsloth/pull/2932
  • Fix falcon H1 dropout issue by @Datta0 in https://github.com/unslothai/unsloth/pull/2938
  • fix: change lora_dropout from int to float for type consistency by @muzzlol in https://github.com/unslothai/unsloth/pull/2949
  • GRPO fix dataloader_num_workers value error in GRPOTrainer by @rolandtannous in https://github.com/unslothai/unsloth/pull/2944
  • GRPO Fix - Support vllm pre-dequantized quantization states in fast_dequantize kernel by @rolandtannous in https://github.com/unslothai/unsloth/pull/2943
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/2982
  • Update unsloth-cli.py by @qgallouedec in https://github.com/unslothai/unsloth/pull/2985
  • use fastmodel falcon h1 by @mmathew23 in https://github.com/unslothai/unsloth/pull/2987
  • Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model merge error by @rolandtannous in https://github.com/unslothai/unsloth/pull/2986
  • Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model merge error" by @danielhanchen in https://github.com/unslothai/unsloth/pull/2988
  • Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized … by @danielhanchen in https://github.com/unslothai/unsloth/pull/2990
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/2998
  • Update README.md by @qgallouedec in https://github.com/unslothai/unsloth/pull/2991
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/3017
  • [bugs] fix for casual mask by @leizhenyuan in https://github.com/unslothai/unsloth/pull/3011
  • [intel] add for intel path for llama.py by @leizhenyuan in https://github.com/unslothai/unsloth/pull/3012
  • Fix Gemma 2 by @danielhanchen in https://github.com/unslothai/unsloth/pull/3024
  • falcon h1 force float32 when dtype is torch.float16 by @mmathew23 in https://github.com/unslothai/unsloth/pull/3026
  • Fix torch compile issues by @danielhanchen in https://github.com/unslothai/unsloth/pull/3028
  • Fix Llama and Gemma inference by @Erland366 in https://github.com/unslothai/unsloth/pull/3034
  • Fixup multi GPU workload. by @Datta0 in https://github.com/unslothai/unsloth/pull/3049
  • Bug Fixes and Enhancements for Model Loading by @Etherll in https://github.com/unslothai/unsloth/pull/3052
  • Add gemma-3n chat template to chat_templates.py by @Etherll in https://github.com/unslothai/unsloth/pull/3051
  • Fix: Added specific check for Gemma so models like BERT properly init… by @Sekinal in https://github.com/unslothai/unsloth/pull/3055
  • fixup rope sync for everything by @Datta0 in https://github.com/unslothai/unsloth/pull/3061
  • get_per_token_logps_and_entropies: return tuple instead of dict by @mmathew23 in https://github.com/unslothai/unsloth/pull/3080
  • Docs: Add WSL Installation Guide for Blackwell / RTX 5090 GPU by @dongbin-lunark in https://github.com/unslothai/unsloth/pull/3079
  • GPT-OSS support by @mmathew23 in https://github.com/unslothai/unsloth/pull/3099
  • Nightly by @danielhanchen in https://github.com/unslothai/unsloth/pull/3102
  • gpt-oss manually call temporary patch by @mmathew23 in https://github.com/unslothai/unsloth/pull/3104

New Contributors

  • @muzzlol made their first contribution in https://github.com/unslothai/unsloth/pull/2949
  • @Sekinal made their first contribution in https://github.com/unslothai/unsloth/pull/3055
  • @dongbin-lunark made their first contribution in https://github.com/unslothai/unsloth/pull/3079

Full Changelog: https://github.com/unslothai/unsloth/compare/July-2025...August-2025

July-2025 Bug fix
Notable features
  • 10-25% VRAM reduction across all models
  • GRPO works with latest TRL main
  • Qwen 2.5 and GLM fixes
Full changelog

More VRAM reduction, faster & bug fixes

Please update Unsloth! pip install --upgrade --force-reinstall --no-deps --no-cache-dir unsloth unsloth_zoo

  1. Gemma 3N Vision now works and is fixed! Please re-download all model checkpoints (Unsloth will auto do it) Try Kaggle Notebook! There is also a challenge with a prize pool of $100,000!
  2. Gemma 3 text and vision are all fixed for T4, and is much faster. Losses of 6 to 7 are now fixed - it should be 1 to 2.
  3. 10 to 25% less VRAM consumption for all models. Also faster compiling and less errors. Unsloth is now more stable!
  4. Downloads stuck at 90% to 95% fixed!
  5. Qwen 2.5, Qwen 2, GLM all fixed as well.
  6. GRPO now works with latest main TRL
  7. Main TRL, PEFT, Transformers all work
  8. Forced upgrading transformers is now fixed.
  9. Falcon H1 finetuning should work great! Notebooks incoming
  10. Devstral 1.1 and MedGemma 27B, 4B support with vision
  11. Many many many more bug fixes - this release of Unsloth should be much more stable and error tolerant!

Please update Unsloth! pip install --upgrade --force-reinstall --no-deps --no-cache-dir unsloth unsloth_zoo

What's Changed

  • Gemma 3N by @danielhanchen in https://github.com/unslothai/unsloth/pull/2809
  • Add instructions for installing unsloth on RTX 5090 by @jeromeku in https://github.com/unslothai/unsloth/pull/2812
  • Add falcon h1 by @dhiaEddineRhaiem in https://github.com/unslothai/unsloth/pull/2650
  • Granite4 support by @mmathew23 in https://github.com/unslothai/unsloth/pull/2799
  • import undefined transformers_version for falcon model by @mmathew23 in https://github.com/unslothai/unsloth/pull/2822
  • Fix LoftQ with FastBaseModel by @mehmetoguzderin in https://github.com/unslothai/unsloth/pull/2826
  • Create stale.yml by @danielhanchen in https://github.com/unslothai/unsloth/pull/2832
  • Create stale.yml by @danielhanchen in https://github.com/unslothai/unsloth/pull/2836
  • Added conda/mamba section to blackwell installation readme by @rolandtannous in https://github.com/unslothai/unsloth/pull/2817
  • Gemma 3N bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/2842
  • Fix loftq None config for FastBaseModel by @mmathew23 in https://github.com/unslothai/unsloth/pull/2848
  • Convert torch.bfloat16, torch.float16, etc. to vLLM valid dtypes by @rishabh135 in https://github.com/unslothai/unsloth/pull/2811
  • [Feature] enable unsloth on amd gpu by @billishyahao in https://github.com/unslothai/unsloth/pull/2520
  • Fix Gemma 3N by @danielhanchen in https://github.com/unslothai/unsloth/pull/2854
  • fix quantized model parameter count method by @rolandtannous in https://github.com/unslothai/unsloth/pull/2855
  • Update CSM for faster inference (no compile) by @mmathew23 in https://github.com/unslothai/unsloth/pull/2865
  • Fix UnslothTrainingArguments not patching trl.Config properly by @Erland366 in https://github.com/unslothai/unsloth/pull/2873
  • Fix unnecessary warning for transformers >= 4.53.0 by @mmathew23 in https://github.com/unslothai/unsloth/pull/2867
  • Update README.md by @danielhanchen in https://github.com/unslothai/unsloth/pull/2885
  • Many bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/2908
  • silenty skip falcon h1 import if transformers_version < 4.53.0 by @mmathew23 in https://github.com/unslothai/unsloth/pull/2912
  • Dynamically adjust get_per_token_logps [trl main upgrade] by @Datta0 in https://github.com/unslothai/unsloth/pull/2911
  • [Intel] add intel gpu with vllm support by @leizhenyuan in https://github.com/unslothai/unsloth/pull/2903
  • [bugs] fix for casual mask by @leizhenyuan in https://github.com/unslothai/unsloth/pull/2868
  • Explicitly check if xformers exists for attention by @Datta0 in https://github.com/unslothai/unsloth/pull/2889
  • Falcon H1: if mlp doesn't exist in layer module check for feed_forward by @mmathew23 in https://github.com/unslothai/unsloth/pull/2913
  • Move inputs to right devices. by @Datta0 in https://github.com/unslothai/unsloth/pull/2919
  • Many bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/2927

New Contributors

  • @dhiaEddineRhaiem made their first contribution in https://github.com/unslothai/unsloth/pull/2650
  • @mehmetoguzderin made their first contribution in https://github.com/unslothai/unsloth/pull/2826
  • @rishabh135 made their first contribution in https://github.com/unslothai/unsloth/pull/2811
  • @billishyahao made their first contribution in https://github.com/unslothai/unsloth/pull/2520

Full Changelog: https://github.com/unslothai/unsloth/compare/June-2025...July-2025

June-2025 New feature
Notable features
  • Gemma 3n text-image-video-audio support
  • TTS model fine-tuning (Orpheus, Whisper)
  • DeepSeek-R1 GRPO support
Full changelog

✨ Gemma 3n now available

  • Google's new Gemma 3n multimodal models that support text, image, video & audio. Guide
  • Gemma 3n finetuning notebook + audio, vision, text inference Colab notebook
  • Gemma 3n collection in dynamic GGUF, safetensor 4-bit etc formats: Gemma-3n

🎵 Text-to-Speech (TTS) Fine-tuning

  • Train TTS/STT models like Sesame-CSM, Orpheus-TTS and OpenAI's Whisper locally! Guide
  • Clone voices, learn new emotions, tones & styles with 1.5x faster training and -50% VRAM. Notebooks

[!TIP]
Update Unsloth via pip install --upgrade --force-reinstall unsloth unsloth_zoo

🧠 DeepSeek-R1-0528 Support with Dynamic 1-bit GGUFs

  • Fine-tune DeepSeek-R1-0528-Qwen3 with GRPO! Our new reward function increases multilingual response rates by 40%+ Notebook
  • Dynamic 1-bit GGUFs shrink the full 715GB model to just 175GB (-80% size)

📈 Dynamic 2.0 GGUFs

  • New quantization method that achieves SOTA performance. More info
  • Sets new benchmarks for 5-shot MMLU and KL Divergence and selectively quantizes layers for optimal accuracy

⚡ Advanced Qwen3 GRPO notebook

  • Proximity scoring for more better reward functions. Advanced GRPO notebook
  • New Prefinetuning/priming to skip GRPO format learning

🎯 Magistral Conversational Reasoning

  • Fine-tune Magistral-24B for advanced conversational reasoning. Notebook

👁️ Gemma3 Vision Support

  • Fine-tune Gemma3 vision models for multimodal tasks Notebook

Documentation & Guides

  • Reinforcement Learning Guide: Complete guide on RL for LLMs covering GRPO, RLHF, DPO. Guide
  • LoRA Hyperparameters Guide: Master optimal learning rates, epochs, LoRA rank & alpha settings. Guide

What's Changed

  • Nightly by @danielhanchen in https://github.com/unslothai/unsloth/pull/2448
  • Added k_norm & q_norm to merged Qwen3 layers by @cblomert in https://github.com/unslothai/unsloth/pull/2452
  • MoE Kernel by @jeromeku in https://github.com/unslothai/unsloth/pull/2465
  • Blackwell Support by @johnnynunez in https://github.com/unslothai/unsloth/pull/2458
  • Added missing code of conduct by @rolandtannous in https://github.com/unslothai/unsloth/pull/2416
  • Fix readme example by @yuanzhedong in https://github.com/unslothai/unsloth/pull/2492
  • the pixtral vision notebook fails during inference by @mmathew23 in https://github.com/unslothai/unsloth/pull/2466
  • [1/N] Enable intel GPU for unsloth by @leizhenyuan in https://github.com/unslothai/unsloth/pull/2350
  • [2/N] Enable intel GPU for unsloth by @leizhenyuan in https://github.com/unslothai/unsloth/pull/2388
  • vLLM Windows CUDA support [tested] by @fenglui in https://github.com/unslothai/unsloth/pull/2158
  • Add Sesame CSM by @mmathew23 in https://github.com/unslothai/unsloth/pull/2527
  • Add Qwen-3 chat template and Ollama template support by @kiankyars in https://github.com/unslothai/unsloth/pull/2537
  • Fix typos by @omahs in https://github.com/unslothai/unsloth/pull/2540
  • Add use_rslora reference to LoraConfig inititalisation by @jkumz in https://github.com/unslothai/unsloth/pull/2539
  • TTS by @danielhanchen in https://github.com/unslothai/unsloth/pull/2545
  • Quick fix on the CompileConfig error by @Erland366 in https://github.com/unslothai/unsloth/pull/2554
  • Fix trust remote code by @Etherll in https://github.com/unslothai/unsloth/pull/2357
  • fix issue with qwen3 template double quote escapes by @davedgd in https://github.com/unslothai/unsloth/pull/2563
  • Display the model name in RoPE scaling unsupported error by @emmanuel-ferdman in https://github.com/unslothai/unsloth/pull/2564
  • Fix Whisper, ModernBERT by @danielhanchen in https://github.com/unslothai/unsloth/pull/2565
  • fix: improved error handling when llama.cpp build fails #2358 by @Hansehart in https://github.com/unslothai/unsloth/pull/2603
  • Remove dataset_text_field from SFTConfig by @qgallouedec in https://github.com/unslothai/unsloth/pull/2609
  • Upgrade trl fix by @Datta0 in https://github.com/unslothai/unsloth/pull/2544
  • Check the skip_prepare_dataset before accessing dataset fields. #2496 by @Premik in https://github.com/unslothai/unsloth/pull/2633
  • Llama4 MoE Grouped GEMM by @jeromeku in https://github.com/unslothai/unsloth/pull/2639
  • Latest TRL, GRPO + Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/2645
  • Fix SFTtraining for new trl by @mmathew23 in https://github.com/unslothai/unsloth/pull/2647
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/2651
  • Fix quant model param fetch regex by @Datta0 in https://github.com/unslothai/unsloth/pull/2662
  • Fix batched generation for prompts of different lengths by @RunFMe in https://github.com/unslothai/unsloth/pull/2216
  • reroute merge logic language models + comprehensive tests + eval kits by @rolandtannous in https://github.com/unslothai/unsloth/pull/2673
  • unsloth checkpointing fix for latest transformers==4.52.x by @mmathew23 in https://github.com/unslothai/unsloth/pull/2674
  • patch sft_trainer to favor max_seq_length over max_length in config by @mmathew23 in https://github.com/unslothai/unsloth/pull/2669
  • Update prepare 4d causal attention call by @mmathew23 in https://github.com/unslothai/unsloth/pull/2678
  • Ignore None Values when building vllm subprocess_command by @Salpingopharyngeus in https://github.com/unslothai/unsloth/pull/2680
  • add support for torch270 with Intel GPU by @leizhenyuan in https://github.com/unslothai/unsloth/pull/2709
  • Making protobuf version more flexible by @user799595 in https://github.com/unslothai/unsloth/pull/2637
  • tests for additional merge fix unsloth zoo pr 163 by @rolandtannous in https://github.com/unslothai/unsloth/pull/2719
  • Reward modeling update (There seems to be another patch) by @pluesclues in https://github.com/unslothai/unsloth/pull/2710
  • Fix Typos in Documentation and Comments by @leopardracer in https://github.com/unslothai/unsloth/pull/2721
  • Fix renaming on other model than Llama by @Erland366 in https://github.com/unslothai/unsloth/pull/2762
  • Enable vLLM to share memory space by @Datta0 in https://github.com/unslothai/unsloth/pull/2712
  • Fix TRL 1.8.2 by @marcandrelarochelle in https://github.com/unslothai/unsloth/pull/2774
  • Fix AttributeError in GRPO trainer for models without llm attribute by @rolandtannous in https://github.com/unslothai/unsloth/pull/2780
  • Additional tests for unsloth-zoo PR#174 by @rolandtannous in https://github.com/unslothai/unsloth/pull/2779
  • Update pyproject.toml by @amrothemich in https://github.com/unslothai/unsloth/pull/2778
  • Fix for grpo_compute_loss_slow by @simpissa in https://github.com/unslothai/unsloth/pull/2702
  • Fix GRPO by @danielhanchen in https://github.com/unslothai/unsloth/pull/2787
  • Docs: Fix typo and improve MoE docstrings by @kilavvy in https://github.com/unslothai/unsloth/pull/2784
  • [5/N] Enable intel GPU for unsloth by @leizhenyuan in https://github.com/unslothai/unsloth/pull/2768
  • Sequence Classification Bug Fixes by @pluesclues in https://github.com/unslothai/unsloth/pull/2793
  • intel 5/N fix patch by @mmathew23 in https://github.com/unslothai/unsloth/pull/2792
  • [3/N] Enable intel GPU for unsloth by @leizhenyuan in https://github.com/unslothai/unsloth/pull/2620
  • [4/N] Enable intel GPU for unsloth by @mmathew23 in https://github.com/unslothai/unsloth/pull/2801
  • [intel] use DeviceProperties instead of torch.xxx.deviceproperties by @leizhenyuan in https://github.com/unslothai/unsloth/pull/2803
  • Fix grpo sleep regex and indentation by @Datta0 in https://github.com/unslothai/unsloth/pull/2804
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/2805
  • Bug fixes by @danielhanchen in https://github.com/unslothai/unsloth/pull/2807

New Contributors

  • @cblomert made their first contribution in https://github.com/unslothai/unsloth/pull/2452
  • @johnnynunez made their first contribution in https://github.com/unslothai/unsloth/pull/2458
  • @rolandtannous made their first contribution in https://github.com/unslothai/unsloth/pull/2416
  • @yuanzhedong made their first contribution in https://github.com/unslothai/unsloth/pull/2492
  • @mmathew23 made their first contribution in https://github.com/unslothai/unsloth/pull/2466
  • @leizhenyuan made their first contribution in https://github.com/unslothai/unsloth/pull/2350
  • @fenglui made their first contribution in https://github.com/unslothai/unsloth/pull/2158
  • @kiankyars made their first contribution in https://github.com/unslothai/unsloth/pull/2537
  • @omahs made their first contribution in https://github.com/unslothai/unsloth/pull/2540
  • @jkumz made their first contribution in https://github.com/unslothai/unsloth/pull/2539
  • @davedgd made their first contribution in https://github.com/unslothai/unsloth/pull/2563
  • @emmanuel-ferdman made their first contribution in https://github.com/unslothai/unsloth/pull/2564
  • @qgallouedec made their first contribution in https://github.com/unslothai/unsloth/pull/2609
  • @Premik made their first contribution in https://github.com/unslothai/unsloth/pull/2633
  • @RunFMe made their first contribution in https://github.com/unslothai/unsloth/pull/2216
  • @Salpingopharyngeus made their first contribution in https://github.com/unslothai/unsloth/pull/2680
  • @user799595 made their first contribution in https://github.com/unslothai/unsloth/pull/2637
  • @pluesclues made their first contribution in https://github.com/unslothai/unsloth/pull/2710
  • @leopardracer made their first contribution in https://github.com/unslothai/unsloth/pull/2721
  • @marcandrelarochelle made their first contribution in https://github.com/unslothai/unsloth/pull/2774
  • @amrothemich made their first contribution in https://github.com/unslothai/unsloth/pull/2778
  • @simpissa made their first contribution in https://github.com/unslothai/unsloth/pull/2702
  • @kilavvy made their first contribution in https://github.com/unslothai/unsloth/pull/2784

Full Changelog: https://github.com/unslothai/unsloth/compare/May-2025...June-2025

Beta — feedback welcome: [email protected]