Release history
ludwig releases
Low-code framework for building custom LLMs, neural networks, and other AI models
Dependency bump + AutoML change
Fixed RuntimeError when auto‑tuning batch size or learning rate with Ray by initializing the distributed strategy.
Full changelog
Bug Fixes & Test Improvements
Bug Fixes
- fix: call `init_dist_strategy("local")` in `tune_batch_size_fn` and `tune_learning_rate_fn`. Fixes `RuntimeError: Distributed strategy not initialized` (#4149) when auto-tuning batch size or learning rate via Ray with non-`MeanMetric` output features (e.g. number output → `MSEMetric`).
- fix: remove redundant dtype check in `text_feature`. Removes a strict integer-only dtype guard that broke tests passing float32 tensors (the function already casts to int32 internally).
- fix: add visualize `__main__.py` for `python -m ludwig.visualize`. Restores `python -m ludwig.visualize` after the visualize module was split into a package.
- fix: update `test_serve_v2` to use `numpy_to_python`. Updates the test import after `_numpy_safe` was renamed to `numpy_to_python` in `data_utils`.
Refactoring
- refactor: major `api.py` cleanup. Guard clauses, extraction, type annotations, and docstring fixes across `api.py`, `serve.py`, `serve_v2.py`, `visualize/`, and several utils.
Tests
- test: regression tests for transformers import safety and Ray tune dist strategy. Prevents recurrence of #4142 (broken `PreTrainedModel` import) and #4149 (missing `init_dist_strategy` in Ray tune fns).
- fix: rewrite `torch_utils` tests to not fail in no-CUDA CI environments. CUDA-specific GPU isolation tests now skip gracefully on CPU-only runners; the idempotency test was rewritten to verify behavior directly.
Fixed Python 3.12 compatibility when torchao and PyTorch are version‑mismatched.
Full changelog
Bug Fixes
- `from ludwig.api import LudwigModel` fails on Python 3.12: When `torchao` and PyTorch are version-mismatched (`torchao` calls `torch.utils._pytree.register_constant`, added in PyTorch 2.5+), `transformers`'s lazy loader raises `ModuleNotFoundError` for any class defined in `modeling_utils.py`, including `PreTrainedModel`. `llm_utils.py` and `text_feature.py` both imported these classes at module level; they are now deferred to `TYPE_CHECKING`. Follows the same fix applied to `hf_utils.py` in v0.15.1. (#4142)
- Replace `assert` with explicit exceptions; fix mutable default arg: Internal `assert` statements replaced with `ValueError` / `RuntimeError` so they aren't silently stripped with `python -O`. Fixed a mutable default argument that could cause cross-call state leakage. (#4152)
- Unified ruff toolchain: Replaced `black` + `isort` + `flake8` with a single `ruff` invocation for linting and formatting. No behavior change for users.
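The two Python pitfalls named in #4152 can be reproduced in a few lines. The sketch below is illustrative only: the function names are hypothetical and are not taken from the Ludwig codebase.

```python
def append_item_buggy(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` shares the same list object.
    bucket.append(item)
    return bucket


def append_item_fixed(item, bucket=None):
    # Use None as the sentinel and build a fresh list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


def check_batch_size(batch_size):
    # An explicit exception still runs under `python -O`, whereas
    # `assert batch_size > 0` would be compiled out entirely.
    if batch_size <= 0:
        raise ValueError(f"batch_size must be positive, got {batch_size}")
    return batch_size
```

Calling `append_item_buggy` twice without a second argument returns a list that still holds the first call's item, which is the kind of cross-call state leakage the fix describes.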
- PatchTST & N-BEATS encoders for multivariate/univariate forecasting via TimeseriesOutputFeature
- MASE and sMAPE evaluation metrics for forecasting tasks
- Advanced PEFT adapters: TinyLoRA, C3A, OFT, HRA, WaveFT, LN-Tuning, VBLoRA
Full changelog
New Features
Timeseries Forecasting
- PatchTST & N-BEATS encoders: State-of-the-art patch-based and basis-expansion timeseries encoders. Both support multivariate and univariate forecasting with `TimeseriesOutputFeature`. (#4147)
- MASE & sMAPE metrics: Mean Absolute Scaled Error and symmetric Mean Absolute Percentage Error added for forecasting evaluation. (#4147)
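For readers unfamiliar with the two metrics, here is a minimal pure-Python sketch of their textbook definitions. It assumes a one-step naive forecast as the MASE scaling baseline and reports sMAPE as a fraction rather than a percentage; Ludwig's own implementation may differ in both choices.

```python
def mase(y_true, y_pred, y_train):
    """Forecast MAE divided by the in-sample MAE of a one-step naive forecast."""
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    naive_mae = sum(abs(b - a) for a, b in zip(y_train, y_train[1:])) / (len(y_train) - 1)
    return mae / naive_mae


def smape(y_true, y_pred):
    """Symmetric MAPE as a fraction in [0, 2]; 0/0 pairs contribute 0."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = abs(t) + abs(p)
        total += 0.0 if denom == 0 else 2.0 * abs(t - p) / denom
    return total / len(y_true)
```

A MASE below 1 means the forecast beats the naive "repeat the last value" baseline on the training series.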
Advanced PEFT Adapters
- New adapter types — TinyLoRA, C3A, OFT (Orthogonal Fine-Tuning), HRA (Householder Reflection Adaptation), WaveFT, LN-Tuning, VBLoRA. (#4146)
- New LoRA initializers — PiSSA (Principal Singular values and Singular vectors Adaptation), EVA (Explained Variance Adaptation), CorDA/LoftQ. (#4146)
Phase 6: Future Capabilities
- LLM config generation: `ludwig generate_config "describe your task"` uses an LLM to write the YAML config for you. (#4092)
- HyperNetwork combiner: Conditioning-based feature fusion where one feature generates weights for others. (#4092)
- Nash-MTL & Pareto-MTL — Game-theoretic and preference-based multi-task loss balancing strategies. (#4092)
New Examples
- VLM fine-tuning: LLaVA, Qwen2-VL, InternVL via `is_multimodal: true`. (#4140)
- Mamba-2 / Jamba encoders: State-space model encoders for sequence tasks. (#4140)
- Ray Serve & KServe deployment — Distributed and Kubernetes-native serving shims. (#4140)
- Multi-task & HyperNetwork examples (#4112)
Bug Fixes
- Python 3.12 import fix: Deferred `PreTrainedModel` import to `TYPE_CHECKING` to fix `ImportError` on Python 3.12 when HuggingFace transformers is not installed in all code paths.
- Dask image bytes UnicodeDecodeError: `dask.config.set({"dataframe.convert-string": False})` is now applied at import time, preventing `UnicodeDecodeError` when image bytes pass through Dask string columns. (#4151)
- Dask shuffle partd race condition: Replaced file-based Dask shuffle (which hit a partd lock race under concurrent workers) with tasks-based shuffle. (#4150)
- Encoder `input_shape` contract: Fixed a contract violation where certain encoders did not correctly report or handle `input_shape`, causing shape mismatches during model construction. (#4148)
- Ray backend GPU underutilization: `RayDatasetBatcher` now runs `to_tensors` locally in the producer thread instead of spawning remote Ray tasks per block. Datasets are materialized before training to avoid Parquet re-reads on every epoch. (#4144)
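The `TYPE_CHECKING` deferral mentioned in the first fix follows a standard Python pattern, sketched below. The helper `count_parameters` is hypothetical and exists only to give the annotation something to attach to; it is not part of Ludwig.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers, never at runtime, so a broken
    # or missing `transformers` install cannot fail this module's import.
    from transformers import PreTrainedModel


def count_parameters(model: "PreTrainedModel") -> int:
    # The quoted annotation is never resolved at runtime.
    return sum(p.numel() for p in model.parameters())
```

The module imports cleanly even when `transformers` cannot be loaded, while type checkers still see the real class.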
Fixed Ray training slowdown and GPU underutilization bugs plus added Python 3.14 compatibility.
Full changelog
Bug fixes
- Ray training 3.7x slowdown eliminated: `LudwigProgressBar` was calling `rt.report()` on every training batch when running inside Ray workers (~1.9 s/call through the Ray GCS). With hundreds of batches this completely dominated wall-clock time. Per-batch progress reporting is now suppressed; training metrics continue to be reported correctly at eval/checkpoint time. Ray overhead is now ~1.7x vs local (fixed TorchTrainer setup cost), down from 3.7x. (#4144)
- GPU underutilization in Ray backend fixed: `RayDatasetBatcher` was running `to_tensors` via `map_batches` (spawning a Ray remote task per dataset block with scheduling overhead). It now runs locally in the producer thread. Also: datasets are now materialized before training to avoid re-reading Parquet from disk on every epoch. (#4144)
- Python 3.14 compatibility: `LudwigBaseConfig` subclasses crashed with `PydanticUserError: Field requires a type annotation` on Python 3.14 because annotations are now stored lazily via `__annotate_func__`. Fixed in `_LudwigModelMeta.__new__`. (#4144)
- ModernBERT tokenizer: Models containing "bert" in their name (e.g. `answerdotai/ModernBERT-base`) were incorrectly routed to `BertTokenizer` (WordPiece), causing `Missing [UNK] token` errors. ModernBERT now correctly uses `HFTokenizer` (AutoTokenizer / BPE). (#4144)
- Dask `meta=` parameter: Multiple feature types (`binary`, `category`, `sequence`, `timeseries`, `text`, `vector`) called `.map()` without a `meta=` argument, causing `ValueError: Metadata inference failed in map` when using the Dask engine (`backend: {type: ray, processor: {type: dask}}`). All bare `.map()` calls are now fixed. (#4144)
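The ModernBERT routing bug is an instance of over-eager substring matching. The sketch below shows how such a rule can misfire and one possible guard. Both functions are hypothetical illustrations; the actual routing logic in Ludwig is not shown in these notes.

```python
def pick_tokenizer_naive(model_name: str) -> str:
    # Substring match: any name containing "bert" is treated as classic BERT.
    return "BertTokenizer" if "bert" in model_name.lower() else "HFTokenizer"


def pick_tokenizer_guarded(model_name: str) -> str:
    # Handle ModernBERT explicitly before falling back to the substring rule.
    name = model_name.lower()
    if "modernbert" in name:
        return "HFTokenizer"
    return "BertTokenizer" if "bert" in name else "HFTokenizer"
```

With the naive rule, `answerdotai/ModernBERT-base` lands on the WordPiece tokenizer even though the model ships a BPE vocabulary.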
- Minimum Python version bumped to 3.12; PyTorch now requires ≥2.7, Transformers ≥5.x, Ray ≥2.54, Pydantic upgraded to v2
- Deprecation warning: tuple unpacking of `model.train()` return value is deprecated
- Removed `Horovod`, `Neuropod`, and `LightGBM` backends
- Renamed hyperopt parameter prefix from `training.` to `trainer.` (e.g., `training.learning_rate` → `trainer.learning_rate`)
- Removed `ftrl` optimizer; use `adagrad` or `sgd` with momentum
- GRPO Alignment: reward-model‑free RLHF via `trainer.type: grpo`
- torchao Quantization + QAT with int4, int8, float8 modes and fake‑quant observers (`qat: true`)
- Multi-Adapter PEFT supporting multiple named LoRA adapters and weighted merge strategies
Full changelog
Ludwig 0.15.0
New Features
GRPO Alignment
Reward-model-free RLHF using Group Relative Policy Optimization. Set trainer.type: grpo to train LLMs directly from a reward signal without a separate reward model.
torchao Quantization + QAT
PyTorch-native quantization backend alongside bitsandbytes. Supports int4_weight_only, int8_weight_only, int8_dynamic, and float8 modes. Set qat: true to insert fake-quantization observers before training, recovering 1–2 perplexity points versus post-training quantization.
Multi-Adapter PEFT
Train and serve multiple named LoRA adapters on the same base model using adapters: (plural). Supports all PEFT weighted merge strategies: linear, SVD, TIES, DARE-linear, DARE-TIES, and magnitude pruning.
Native Optuna Hyperopt Executor
executor.type: optuna runs hyperparameter optimization without requiring Ray Tune. Supports Auto, GP, TPE, CMA-ES, and random samplers; median, Hyperband, SHA, and NOP pruners; and SQLite/PostgreSQL storage for resumable runs.
Timeseries Forecasting
First-class TimeseriesOutputFeature with a projector decoder that directly predicts all horizon steps in one forward pass. model.forecast(dataset, horizon=N) generates iterative multi-step predictions using O(window_size + horizon) incremental preprocessing.
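Iterative multi-step forecasting, as opposed to predicting all horizon steps at once, feeds each prediction back into the input window. The sketch below shows the general idea with a toy predictor. `iterative_forecast` and `mean_of_window` are hypothetical names, and this is not the implementation behind `model.forecast`.

```python
from collections import deque


def iterative_forecast(history, horizon, window_size, predict_next):
    """Roll a fixed-size window forward, feeding each prediction back in."""
    window = deque(history[-window_size:], maxlen=window_size)
    forecasts = []
    for _ in range(horizon):
        next_value = predict_next(list(window))
        forecasts.append(next_value)
        window.append(next_value)  # the oldest value drops out automatically
    return forecasts


def mean_of_window(window):
    # Toy stand-in for a trained one-step model.
    return sum(window) / len(window)
```

Only the last `window_size` observations are ever held, and exactly `horizon` new values are produced, so the earlier history never needs to be reprocessed.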
Muon and ScheduleFreeAdamW Optimizers
Two new optimizers: muon for large-scale pretraining and schedule_free_adamw for fine-tuning without a learning rate schedule.
Image Segmentation Decoders
unet, segformer, and fpn decoders for semantic segmentation tasks on image output features.
Dependency Upgrades
| Package | Previous | 0.15.0 |
|---------|----------|--------|
| Python | 3.11 | 3.12 |
| PyTorch | 2.5 | 2.7+ |
| Transformers | 4.x | 5.x |
| Ray | 2.x | 2.54 |
| Pydantic | 1.x | 2.x |
Breaking Changes
- `Horovod`, `Neuropod`, and `LightGBM` backends removed
- `training.` hyperopt parameter prefix renamed to `trainer.` (e.g. `training.learning_rate` → `trainer.learning_rate`)
- `ftrl` optimizer removed; use `adagrad` or `sgd` with momentum
- `model.train()` returns a `TrainingResults` dataclass; tuple unpacking still works but is deprecated
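Existing hyperopt search spaces that still use the old prefix can be migrated mechanically. The helper below is a hypothetical sketch, not a Ludwig utility; it assumes the search space is a flat dict keyed by dotted parameter paths.

```python
def migrate_hyperopt_parameters(parameters: dict) -> dict:
    """Return a copy with the `training.` key prefix renamed to `trainer.`."""
    old, new = "training.", "trainer."
    return {
        (new + key[len(old):] if key.startswith(old) else key): value
        for key, value in parameters.items()
    }
```

Keys that do not start with `training.` pass through unchanged, so the helper is safe to run on an already-migrated config.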
Full Changelog
https://github.com/ludwig-ai/ludwig/compare/v0.14.1...v0.15.0
Fixed row-ordering failures in Ray integration tests and updated README to reflect actual export formats.
Full changelog
Patch release following v0.14.0.
Test suite
Fix the three (later four) row-ordering failures in `tests/integration_tests/test_ray.py::check_preprocessed_df_equal` that surfaced on `main` right after v0.14.0. Ray's compute path materializes the preprocessed dataframe through `Ray Dataset -> dataset.to_dask()`, which does not preserve the source row index — so the ray-produced and local-produced dataframes could list the same rows in different orders. The row-wise equality check then flagged visually-identical category / vector columns as unequal.
`check_preprocessed_df_equal` now sorts both sides by a per-row hash computed over every deterministic-content column, using `ndarray.tobytes()` for vector / sequence / timeseries / etc. cells so that tests with only a single scalar column (notably `test_ray_vector`) still produce a unique ordering. Binary / image / audio columns are excluded from the sort key because NaN-fill strategies can legitimately differ across backends; those columns were already compared with order-independent shape-only checks.
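The sort-by-content-hash idea can be sketched without pandas or NumPy. The code below is an assumption-laden illustration: rows are plain dicts, vector cells are lists of floats, and `array.array(...).tobytes()` stands in for `ndarray.tobytes()`. The function names are hypothetical and do not mirror the test helper's signature.

```python
import hashlib
from array import array


def row_key(row: dict) -> str:
    """Hash a row's content so rows can be ordered independently of their source index."""
    digest = hashlib.sha256()
    for column in sorted(row):
        value = row[column]
        digest.update(column.encode("utf-8"))
        if isinstance(value, (list, tuple)):
            # Vector-like cell: hash the raw bytes of its values.
            digest.update(array("d", value).tobytes())
        else:
            digest.update(repr(value).encode("utf-8"))
        digest.update(b"|")
    return digest.hexdigest()


def rows_equal_ignoring_order(left, right) -> bool:
    return sorted(left, key=row_key) == sorted(right, key=row_key)
```

Because the key is derived from cell contents, a frame with a single vector column still gets a stable ordering.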
Fixes:
- `test_ray_tabular[dask]`
- `test_ray_tabular_save_inputs[parquet]`
- `test_ray_vector[parquet]`
- `test_ray_vector[csv]`
Documentation
README: drop the stale `ludwig export_torchscript` and Triton references (neither exists in the 0.14 codebase) and document the real 0.14 export surface — SafeTensors (default), `torch.export`, and ONNX via the dynamo-based exporter — with the actual `ludwig export_model` CLI flags (`--model_path`, `--output_path`, `--format`).
PyPI upload is handled automatically by `.github/workflows/upload-pypi.yml` on release publish.
- Minimum Python 3.12, PyTorch 2.7+, Pydantic 2, Transformers 5, NumPy 2 required.
- Ray upgraded to version 2.54 using modern ray.data.Dataset pipeline; Dask updated to 2026.1.2; MLflow to 3.10.
- Removed deprecated `transformer_xl`, `ctrl`, and `flaubert` text encoders.
- Removed the `ftrl` optimizer from Trainer.
- Removed Horovod, Neuropod backends, and GBM/LightGBM backend.
- Added modernbert Text encoder with Flash Attention 2, RoPE, 8192‑token context.
- New transformer_generator Sequence decoder with teacher_forcing_decay, beam search options.
- Introduced focal_loss, dice_loss, lovasz_softmax_loss, nt_xent_loss, poly_loss and other new loss functions.
Full changelog
Major feature release. See the full commit log and the updated
documentation for details.
Encoders
- Text: `modernbert` (Flash Attention 2, RoPE, 8192-token context). Removed deprecated `transformer_xl`, `ctrl`, and `flaubert` encoders.
- Image: `clip`, `dinov2`, `siglip`, `convnextv2`.
- Audio: `wav2vec2`, `whisper`, `hubert`.
- Sequence: `mamba` (linear-time state space model); RoPE support for the stacked Transformer encoder.
- Category: `target` (mean-target encoding with cross-fitting) and `hash` (feature hashing) for high-cardinality features.
- Number: `bins` (learned discretization).
Decoders
- Sequence: new `transformer_generator` decoder, `teacher_forcing_decay` scheduled sampling, and `beam_width` / `beam_length_penalty` beam search shared across generator variants.
- Category: new `mlp_classifier` decoder; `calibration: temperature_scaling` (Guo et al., ICML 2017); `mc_dropout_samples` for Monte Carlo dropout uncertainty (Gal & Ghahramani, ICML 2016).
- Image segmentation: configurable U-Net depth (`num_stages`), new `segformer` and `fpn` decoders.
- LLM: `category_extractor` now supports `regex` and `json_schema` match strategies plus `constrain_to_vocabulary` constrained decoding.
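Temperature scaling, the calibration method listed for the category decoder, divides the logits by a scalar before the softmax. The sketch below shows the effect in plain Python; how Ludwig fits the temperature is not covered here.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature; T > 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

The predicted class never changes, because dividing by a positive constant preserves the ordering of the logits; only the confidence does.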
Losses
- `focal_loss`, `dice_loss`, `lovasz_softmax_loss`, `nt_xent_loss`, `poly_loss`.
- `entmax_1.5_loss` registered for category / text / sequence features.
- Open-set recognition: `entropic_open_set` and `objectosphere` (Dhamija et al., NeurIPS 2018).
- Anomaly detection: `deep_svdd`, `deep_sad`, `drocc`.
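As a reminder of what the first loss in the list computes, here is the per-example focal loss in its basic form, without the optional class-weighting term. The function signature is illustrative and does not reflect Ludwig's config parameters.

```python
import math


def focal_loss(p_true: float, gamma: float = 2.0) -> float:
    """Focal loss for one example, given the predicted probability of the true class."""
    # (1 - p)^gamma shrinks the loss of well-classified examples;
    # gamma = 0 recovers ordinary cross-entropy.
    return -((1.0 - p_true) ** gamma) * math.log(p_true)
```

With `gamma=2`, an example predicted at 0.9 contributes about a hundredth of its cross-entropy loss, which shifts training emphasis toward hard examples.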
Trainer
- New optimizers: `radam`, `adafactor`, `schedule_free_adamw`, `muon`, `soap`.
- New LR schedulers: `one_cycle`, `inverse_sqrt`, `polynomial`, `wsd` (warmup-stable-decay).
- Preference-based LLM training: `dpo`, `kto`, `orpo`, `grpo`.
- Loss balancing: `uncertainty`, `famo`, `gradnorm`, `log_transform`.
- Quality presets (`medium_quality` / `high_quality` / `best_quality`), model soup, modality dropout.
- Removed: the `ftrl` optimizer.
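A warmup-stable-decay schedule has three phases: ramp up, hold, ramp down. The sketch below assumes linear warmup and linear decay. All parameter names are hypothetical, and the decay shape Ludwig's `wsd` scheduler uses is not stated in these notes.

```python
def wsd_lr(step, base_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Learning rate at `step` under a linear warmup / constant / linear decay schedule."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        return base_lr
    decay_step = step - warmup_steps - stable_steps
    if decay_step >= decay_steps:
        return min_lr
    return base_lr - (base_lr - min_lr) * decay_step / decay_steps
```

Because the rate is flat during the stable phase, a run can be extended by lengthening that phase without redefining the whole schedule.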
Serving & export
- Auto-generated Pydantic request/response schemas from model config.
- Prometheus `/metrics` endpoint, structured logging, and request timeouts (HTTP 504).
- OpenAI-compatible vLLM server (PagedAttention, continuous batching) via `ludwig.serve_vllm.run_vllm_server`.
- Model export: SafeTensors (default), `torch.export` (`.pt2`), and the dynamo-based ONNX exporter (`torch.onnx.export(dynamo=True)`).
Infrastructure
- Python 3.12, PyTorch 2.7+, Pydantic 2, Transformers 5, NumPy 2.
- Ray 2.54 on the modern `ray.data.Dataset` pipeline, Dask 2026.1.2, MLflow 3.10.
- uv-based test runner and modernized GitHub Actions Docker workflows.
- Fix for distributed metric aggregation across Ray eval workers.
- Removed: Horovod, Neuropod, and the GBM / LightGBM backend.
PyPI upload is handled automatically by .github/workflows/upload-pypi.yml on release publish.
- Removed DDP, FSDP, and DeepSpeed distributed strategy classes; use `strategy: accelerate`.
- KTO (Kahneman-Tversky Optimization) for human-feedback alignment without paired preference data
- ORPO (Odds Ratio Preference Optimization) for reference‑free preference learning
- GRPO (Group Relative Policy Optimization) DeepSeek‑style RL with group‑relative rewards
Full changelog
Ludwig 0.13.0
Alignment Training: KTO, ORPO, GRPO
- KTO (Kahneman-Tversky Optimization): human-feedback alignment without paired preference data
- ORPO (Odds Ratio Preference Optimization): reference-free preference learning
- GRPO (Group Relative Policy Optimization): DeepSeek-style RL with group-relative rewards
- Full DPO trainer with sequence packing support
Expanded PEFT & Quantization
- LoRA+, DoRA, LoftQ, VeRA, FourierFT adapters
- PEFT now available for ECD encoders (not just LLMs)
- torchao integration: int4/int8/float8 quantization
- trust_remote_code support fixed for custom-code models (#4094, #4095)
Encoder Modernization
- RoPE positional embeddings, ConvNeXtV2, TfIdf n-gram encoders
- Accelerate strategy replaces DDP/FSDP/DeepSpeed — simpler distributed training with full Accelerate ecosystem support
- DDP, FSDP, and DeepSpeed distributed strategy classes removed (use `strategy: accelerate`)
Model Export & Serving
- vLLM serving backend for optimized LLM inference
- MLflow 3.x integration
- Auto-generated model card and training report after every training run
- Ray Job Submission example for remote cluster training
Dependency Cleanup
- FT-Transformer is now the default combiner for 3+ features (stronger baseline)
- Ray Docker images updated to 2.54.0
- Removed internal DictWrapper class from LLM model
- Streamlined dependency groups in pyproject.toml
- Removed all marshmallow backward-compatibility layers (`@ludwig_dataclass`, `_SchemaAdapter`, `.Schema().load/dump()`)
- Renamed `BaseMarshmallowConfig` → `LudwigBaseConfig`
- Renamed `DictMarshmallowField` → `NestedConfigField`
- Four new combiners: FTTransformerCombiner, CrossAttentionCombiner, PerceiverCombiner, GatedFusionCombiner
- Two numerical feature encoders: PLEEncoder and PeriodicEncoder
- Multi‑task loss balancing strategies (`log_transform`, `uncertainty`, `famo`, `gradnorm`) and Model Soup checkpoint averaging
Full changelog
Ludwig 0.12.0
Modernized Build System
- Migrated from `setup.py` + 10 requirements files to a single `pyproject.toml` with hatchling
- Dynamic versioning from `ludwig/globals.py`
- SafeTensors for secure, zero-copy model weight serialization (ECD models + training checkpoints)
- Added `torchcodec` dependency (required by torchaudio 2.x)
Clean Config Layer
- Removed all marshmallow backward-compatibility layers (`@ludwig_dataclass`, `_SchemaAdapter`, `.Schema().load/dump()`)
- Renamed `BaseMarshmallowConfig` -> `LudwigBaseConfig`, `DictMarshmallowField` -> `NestedConfigField`
- Strict validation by default: unknown config fields now warn and are stripped
4 New Combiners
- FTTransformerCombiner (`type: ft_transformer`): [CLS] token + Transformer self-attention (Gorishniy et al., NeurIPS 2021)
- CrossAttentionCombiner (`type: cross_attention`): Pairwise cross-attention between all feature pairs
- PerceiverCombiner (`type: perceiver`): Learnable latent bottleneck tokens (Jaegle et al., ICML 2022)
- GatedFusionCombiner (`type: gated_fusion`): Flamingo-inspired gated cross-modal fusion
Numerical Feature Tokenization
- PLEEncoder (`type: ple`): Piecewise Linear Encoding with quantile bin edges (Gorishniy et al., NeurIPS 2022)
- PeriodicEncoder (`type: periodic`): Learned sinusoidal features
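Piecewise linear encoding turns a scalar into one value per bin: 1 for bins entirely below the value, 0 for bins entirely above it, and a fraction for the bin that contains it. The sketch below is a simplified version that clamps values outside the outermost edges (the published formulation leaves the first and last bins unclamped); the function name is hypothetical and this is not `PLEEncoder`'s code.

```python
def piecewise_linear_encode(x, bin_edges):
    """Encode scalar x into len(bin_edges) - 1 values, one per bin."""
    encoding = []
    for low, high in zip(bin_edges, bin_edges[1:]):
        if x >= high:
            encoding.append(1.0)  # bin lies entirely below x
        elif x <= low:
            encoding.append(0.0)  # bin lies entirely above x
        else:
            encoding.append((x - low) / (high - low))  # x falls inside this bin
    return encoding
```

In practice the edges would come from training-set quantiles, so each bin covers roughly the same number of examples.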
Multi-Task Loss Balancing (trainer.loss_balancing)
- `log_transform`: log(1+loss) compression (DB-MTL)
- `uncertainty`: Homoscedastic uncertainty weighting (Kendall et al., CVPR 2018)
- `famo`: Fast Adaptive Multitask Optimization (Liu et al., NeurIPS 2023)
- `gradnorm`: Gradient normalization (Chen et al., ICML 2018)
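The first two strategies are simple enough to write down directly. The sketch below uses plain floats in place of tensors, and for uncertainty weighting it uses a common simplified form with one learned log-variance `s` per task. Both function names are hypothetical, and the exact formulas Ludwig applies may differ.

```python
import math


def log_transform_total(losses):
    # log(1 + loss) compresses large losses so no single task dominates the sum.
    return sum(math.log1p(loss) for loss in losses)


def uncertainty_weighted_total(losses, log_variances):
    # exp(-s) down-weights noisy tasks; the "+ s" term penalizes inflating s
    # just to shrink the weighted loss.
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_variances))
```

In a real trainer the log-variances would be learnable parameters optimized jointly with the model weights.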
Model Soup (trainer.model_soup)
Checkpoint weight averaging for better generalization at zero inference cost (Wortsman et al., ICML 2022)
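Uniform checkpoint averaging, the simplest form of a model soup, can be sketched with plain dicts of float lists standing in for state dicts. The function name is hypothetical, and whether `trainer.model_soup` uses uniform or greedy averaging is not stated here.

```python
def average_checkpoints(state_dicts):
    """Uniformly average parameters that share a name across checkpoints."""
    count = len(state_dicts)
    return {
        name: [sum(values) / count for values in zip(*(sd[name] for sd in state_dicts))]
        for name in state_dicts[0]
    }
```

The result is a single set of weights, which is why the averaged model costs nothing extra at inference time.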
Modality Dropout (trainer.modality_dropout)
Learnable missing-modality embeddings for robustness to missing inputs at inference
Quality Presets (preset: medium_quality|high_quality|best_quality)
AutoGluon-inspired one-line configuration for different quality/speed tradeoffs
Other
- Added `california_housing` dataset to `ludwig.datasets`
- Simplified torchaudio calls (removed legacy backend version checks)
- Fixed SafeTensors shared-memory tensor handling for tied weights
Benchmark Results
| Model | Adult Census (AUC) | California Housing (RMSE) |
|-------|-------------------|--------------------------|
| ft_transformer | 0.919 | 0.461 |
| transformer | 0.918 | 0.469 |
| cross_attention | 0.916 | 0.477 |
| perceiver | 0.916 | 0.477 |
| concat (baseline) | 0.911 | 0.491 |
FT-Transformer matches paper-reported results within 0.2%.
- Minimum dask version bumped to 2026.1.2 in distributed requirements
Full changelog
Fixes
- Fix DDP checkpoint race condition: use `os.makedirs(exist_ok=True)` to prevent `FileExistsError` when multiple workers create the training checkpoints directory simultaneously
- Fix dask metadata mismatch in `batch_predict`: keep `from_ray_dataset()` inside `tensor_extension_casting(False)` context so partition dtypes match metadata during calibration
- Pin minimum dask version to 2026.1.2 in distributed requirements
- Disable tensor extension casting in `batch_predict` to fix dask metadata mismatch
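The checkpoint-directory race can be reproduced with threads standing in for DDP workers. The sketch below uses only the standard library; the helper names are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def ensure_checkpoint_dir(path: str) -> str:
    # exist_ok=True makes creation idempotent: a worker that loses the race
    # sees the directory already present and carries on without an error.
    os.makedirs(path, exist_ok=True)
    return path


def create_concurrently(path: str, workers: int = 8) -> bool:
    # Several workers try to create the same directory at once.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(ensure_checkpoint_dir, [path] * workers))
    return all(os.path.isdir(p) for p in results)
```

A check-then-create sequence (`if not os.path.exists(path): os.mkdir(path)`) is what typically fails here, because another worker can create the directory between the check and the call.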
- Upgrade to PyTorch 2.7.1 with bitsandbytes and CUDA compatibility fixes
Full changelog
What's Changed
Features
- Upgrade to PyTorch 2.7.1 with bitsandbytes and CUDA compatibility fixes
Bug Fixes
- Fix dask metadata mismatch in batch_predict by disabling tensor extension casting
- Fix dropout removal in sequence encoder gradient test
- Fix StackedCNN gradient test to pass with sparse updates
- Pin torchvision and torchaudio versions in CI to match torch 2.6.0
Maintenance
- Pre-commit suggestions (#4075)
- Optimized test suite reduces data sizes, model sizes, and epochs across integration tests.
- Reduced CI runtime by ~60-70%.
Full changelog
What's Changed
Bug Fixes
- Fix slow test failures: Checkpoint API, Dask-expr batch_transform, MinIO S3 storage compatibility
- Use fsspec s3fs for Ray Tune S3 storage (fixes PyArrow/MinIO chunked transfer encoding incompatibility)
- Fix automl best_model loading from Ray Tune checkpoints (use `from_checkpoint=True`)
- Fix test data sizes where stratified splits require minimum samples per class
Performance
- Optimize test suite: reduce data sizes, model sizes, and epochs across 14+ integration test files
- Reduce CI runtime by ~60-70%
Documentation
- Fix README image URLs to use correct branch
- Rewrite Korean README for v0.11
- Add JSON Schema export for SchemaStore integration
Full Changelog: https://github.com/ludwig-ai/ludwig/compare/v0.11.1...v0.11.2
- Removed `marshmallow` and `marshmallow-dataclass` dependencies; schema system now uses Pydantic 2
- Schema validation behavior changes due to migration from marshmallow to Pydantic 2
- Added timm encoder supporting MetaFormer variants (CAFormer, ConvFormer, PoolFormer)
- Added `trust_remote_code` support for custom HuggingFace models
Full changelog
What's Changed
Features
- Pydantic 2 migration: Migrated the entire schema system from marshmallow to pydantic 2 (#4070)
- Remove marshmallow dependency: Fully removed marshmallow and marshmallow-dataclass as dependencies (#4071)
- timm encoder: Added timm encoder with MetaFormer variants — CAFormer, ConvFormer, PoolFormer (#4063)
- trust_remote_code: Added support for custom HuggingFace models with `trust_remote_code` (#4065)
Fixes
- LoRA save/load: Reordered `merge_and_unload` before save and handle merged weights on load (#4067)
- Serve endpoint: Clip probabilities before log to prevent `-inf` in serve endpoint (#4064)
- JSON-safe progress: Replace `float('inf')` with `sys.float_info.max` for JSON-safe training progress (#4062)
- Callback base class: Added `**kwargs` to Callback base class for forward compatibility (#4066)
- DevContainer: Rewrote devcontainer config for modern setup (#4068)
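The serve-endpoint and JSON-safe fixes both deal with infinities leaking into output. A minimal sketch of each idea follows. The function names and the `eps` floor are assumptions, and handling negative infinity in `json_safe` is an extension beyond what the note describes.

```python
import math
import sys


def safe_log(probability: float, eps: float = 1e-12) -> float:
    # Clip into [eps, 1] first so a probability of exactly 0 cannot produce -inf.
    return math.log(min(max(probability, eps), 1.0))


def json_safe(value: float) -> float:
    # Strict JSON has no Infinity literal, so substitute the largest finite
    # float with the same sign.
    if math.isinf(value):
        return math.copysign(sys.float_info.max, value)
    return value
```

Python's `json.dumps` emits the non-standard token `Infinity` by default, and raises `ValueError` when called with `allow_nan=False`, which is why the value is replaced before serialization.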
Chores
- Modernized stale configs, Docker images, CI, and docs (#4069)
- Optimized slow tests to reduce CI runtime (#4072)
- Minimum Python version increased to 3.10+; support for Python 3.8 and 3.9 removed
- Removed Horovod backend (replaced by Ray‑native distributed training)
- Removed Neuropod backend (project archived)
- PyTorch 2.6 with `F.scaled_dot_product_attention` for custom attention
- Ray 2.54 integration using lazy `ray.data.Dataset` execution (replaces legacy `DatasetPipeline`)
- Transformers 5.x, torchaudio 2.x, NumPy 2.x, Dask 2026.1.2 and MLflow 3.10 compatibility
Full changelog
Ludwig v0.11.0
Major modernization release bringing Ludwig up to date with the modern Python/PyTorch/Ray ecosystem.
Highlights
Platform & Dependencies
- Python 3.10+ — dropped support for Python 3.8 and 3.9
- PyTorch 2.6 with `F.scaled_dot_product_attention` for custom attention
- Ray 2.54 with modern `ray.data.Dataset` (replaced legacy `DatasetPipeline`)
- transformers 5.x, torchaudio 2.x, NumPy 2.x, Dask 2026.1.2
- MLflow 3.10 compatibility
Removed Backends
- Horovod — removed in favor of Ray-native distributed training
- Neuropod — removed (project archived)
- GBM (LightGBM) — removed to simplify the codebase
Architecture Changes
- Ray `DatasetPipeline` → lazy `ray.data.Dataset` execution
- Custom attention uses `F.scaled_dot_product_attention` (fixes CUBLAS errors on CUDA)
- PyTorch profiler API updated to nanosecond precision (`start_ns` / `duration_ns`)
- Dask-expr compatibility (`dd.concat()`, PyArrow string handling)
- Ray Train 2.54 breaking changes handled (checkpoint-based metric reporting)
CI & Quality
- 3,266 tests passing across unit, integration, and distributed test suites
- Pre-commit hooks updated to latest versions (black 26, flake8 7.3, isort 8, mdformat 1.0)
- Comprehensive flake8 cleanup (removed all `# noqa` suppressions for resolved issues)
Bug Fixes
- Fixed Ray backend pickling issues with `defaultdict` lambda factories
- Fixed `BatchInferModel` GPU/CPU device handling
- Fixed `NoneTrainer` metric sync on head node
- Fixed `LLM.to_device()` device detection from actual parameters
- Fixed audio preprocessing crash with torchaudio 2.x
- Fixed hyperopt `tune_callbacks` passthrough
Installation
pip install ludwig==0.11.0
Or with all optional dependencies:
pip install ludwig[full]==0.11.0
Full Changelog
See the full diff for all changes.