burn

Model Serving & MLOps

Burn is a unified tensor library and deep‑learning framework that lets you train models in Python‑like dynamic code and run exactly the same code for production inference, all backed by fast Rust compilation.


Rust Latest v0.21.0 · 2mo ago

Features

Single API unifies training (dynamic graphs) and production inference without export loss
JIT‑compiles tensor operation streams with automatic kernel fusion for high performance
Rust‑based incremental compilation enables <5 s recompile cycles, mimicking Python’s rapid feedback loop

Recent releases


v0.21.0 Breaking risk 2mo

⚠ Upgrade required

Update any code that directly accessed `~/.cache` for Burn datasets to call the new platform‑aware cache directory API.
Migrate existing binary model records (`BinFileRecorder`, `BinBytesRecorder`) to the new shape representation before loading with v0.21.0.
Backend implementors: update tensor creation and conversion ops (e.g., `bool_empty`, `bool_into_int`) to accept an explicit output dtype argument.

Breaking changes

Dataset cache directory switched from hardcoded `~/.cache` to platform‑specific locations (`$XDG_CACHE_HOME`, `~/Library/Caches`, `{FOLDERPATH_LOCAL_APPDATA}`); code must use the new API.
`TensorData::shape` now stores a `Shape` (whose fields are now private) instead of a `Vec<usize>`; existing binary records using `BinFileRecorder` or `BinBytesRecorder` are not forward‑compatible and require conversion before upgrading.
Removed support for `powf` on integer tensors; operations must first cast to float (e.g., `tensor_int.float().powf(tensor_float)`).

Notable features

Added lightweight eager CPU backend **burn‑flex** for WebAssembly and embedded targets, replacing `burn-ndarray`.
Implemented early off‑policy reinforcement learning support in `burn‑rl` and related training utilities.
Introduced new kernel work: GEMV performance improvements, top‑k operations (`argtopk`), FFT implementations (rFFT/irFFT) with fusion support.

Full changelog

Summary

Burn 0.21.0 brings 4 months of improvements that make the framework significantly faster and more reliable across the board. The gains span distributed workflows for training large models all the way down to small-model inference, where the reduced framework overhead becomes especially noticeable.

We rethought our distributed computing stack around differentiable collective operations. Kernel selection is now more reliable thanks to better autotuning and a new validation layer, and a project-level burn.toml file lets you tweak those internals (and many others) without recompiling. A reworked device handle reduces framework overhead, and a new burn-dispatch crate simplifies backend selection while paving the way for faster compile times. The release also ships burn-flex, a lightweight eager CPU backend for WebAssembly and embedded targets that replaces burn-ndarray. Finally, we added early off-policy reinforcement learning support and a fresh round of kernel work on GEMV, top-k, and FFT.

For more details, check out the release post on our website.

Changelog

Breaking

We've introduced several breaking changes with this release. The affected areas are detailed in the sections below.

`burn-dataset` cache directory

To respect platform conventions, downloaded artifacts are no longer stored under a hardcoded ~/.cache root. The cache directory now depends on the platform:

| Platform | Path |
|----------|------|
| Linux | $XDG_CACHE_HOME or ~/.cache |
| macOS | ~/Library/Caches |
| Windows | {FOLDERPATH_LOCAL_APPDATA} |

For Linux users without $XDG_CACHE_HOME configured, this change has no effect. The cache directory is still ~/.cache.
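To make the table concrete, here is a minimal std-only sketch of the lookup order it describes. The function `cache_root` and its parameters are hypothetical, not Burn's API (Burn resolves the directory internally; the changelog entry refers to `cache_dir()`). Environment lookups are passed in as arguments so the logic stays pure.

```rust
use std::path::PathBuf;

/// Hypothetical helper mirroring the platform table; not part of Burn.
/// `os` uses the same strings as `std::env::consts::OS`.
fn cache_root(
    os: &str,
    home: &str,
    xdg_cache_home: Option<&str>,
    local_app_data: Option<&str>,
) -> Option<PathBuf> {
    match os {
        // Linux: honor a non-empty $XDG_CACHE_HOME, otherwise fall back to ~/.cache.
        "linux" => Some(match xdg_cache_home {
            Some(dir) if !dir.is_empty() => PathBuf::from(dir),
            _ => PathBuf::from(home).join(".cache"),
        }),
        // macOS: ~/Library/Caches.
        "macos" => Some(PathBuf::from(home).join("Library").join("Caches")),
        // Windows: the Local AppData known folder, if it could be resolved.
        "windows" => local_app_data.map(PathBuf::from),
        // Other platforms are out of scope for this sketch.
        _ => None,
    }
}
```

In a real program you would feed it `std::env::consts::OS` and the relevant environment variables; the sketch only shows why Linux users without `$XDG_CACHE_HOME` see no change.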

Interface Changes

TensorData::shape now stores a Shape instead of a Vec<usize>. Existing binary records using BinFileRecorder or BinBytesRecorder are not forward-compatible and must be converted before upgrading.

static STATE_ENCODED: &[u8] = include_bytes!("model.bin");

let model: Model<B> = Model::new(&Default::default());

// Old format can still be loaded before upgrade, but must be re-saved in a forward-compatible format.
let record = BinBytesRecorder::<FullPrecisionSettings, &'static [u8]>::default()
    .load(STATE_ENCODED, &Default::default())
    .expect("Failed to decode state");
let model = model.load_record(record);

model.save_file("model.mpk", &NamedMpkFileRecorder::<FullPrecisionSettings>::new()).unwrap();

The module derive macro has been improved, and the Ignored<T> wrapper is now deprecated. For fields that should not be considered modules, use #[module(skip)] instead.

pub struct Conv1d<B: Backend> {
-    pub padding: Ignored<PaddingConfig1d>,
+    #[module(skip)]
+    pub padding: PaddingConfig1d,
}

We added support for explicit asymmetric padding. Explicit padding now takes one value per side, so existing code must repeat the same value for each pair to keep the previous symmetric behavior. Note that PaddingConfig3d does not support asymmetric padding yet.

// Symmetric (left, right)
- PaddingConfig1d::Explicit(1)
+ PaddingConfig1d::Explicit(1, 1)
// Symmetric (top, left, bottom, right)
- PaddingConfig2d::Explicit(1, 1)
+ PaddingConfig2d::Explicit(1, 1, 1, 1)
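One reason asymmetric padding matters: with an even kernel, no symmetric padding keeps the output length equal to the input length. The sketch below uses the standard convolution/pooling output-length formula with separate left and right padding; `conv_out_len` is a hypothetical helper, not a Burn function.

```rust
/// Output length along one axis for a conv/pool window with independent
/// left/right padding (hypothetical helper using the standard formula).
fn conv_out_len(
    input: usize,
    kernel: usize,
    stride: usize,
    dilation: usize,
    pad_left: usize,
    pad_right: usize,
) -> usize {
    assert!(kernel > 0 && stride > 0 && dilation > 0, "invalid window");
    let padded = input + pad_left + pad_right;
    // A dilated kernel spans dilation * (kernel - 1) + 1 input positions.
    let effective_kernel = dilation * (kernel - 1) + 1;
    assert!(padded >= effective_kernel, "window does not fit");
    (padded - effective_kernel) / stride + 1
}
```

With `input = 10` and `kernel = 2`, padding `(1, 1)` yields 11 and `(0, 0)` yields 9; only the asymmetric `(0, 1)` preserves a length of 10.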

The Gelu activation module can now be configured with tanh approximation. This only affects code that instantiated Gelu directly.

- let activation = Gelu;
+ let activation = Gelu::new(); // or Gelu::default()
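For reference, the tanh approximation is the standard closed form below. This is a plain-Rust sketch of the formula, not Burn's implementation, and the notes do not say which form `Gelu::new()` uses by default or how the approximation is selected.

```rust
use std::f64::consts::PI;

/// GELU, tanh approximation:
/// 0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3)))
fn gelu_tanh(x: f64) -> f64 {
    let inner = (2.0 / PI).sqrt() * (x + 0.044715 * x.powi(3));
    0.5 * x * (1.0 + inner.tanh())
}
```

Because the inner term is odd in `x`, the approximation keeps GELU's identity `gelu(x) - gelu(-x) = x`, and it tends to `x` for large positive inputs.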

The position-wise feed-forward module now has a configurable activation function. To keep it backwards compatible with previously saved records, the field is marked as #[module(skip)].

#[derive(Module, Debug)]
pub struct PositionWiseFeedForward<B: Backend> {
    // ...
-   /// GELU activation function.
-   pub gelu: Gelu,
+   /// Activation function.
+   #[module(skip)]
+   pub activation: Activation<B>,
}

The Shape fields are now private and some methods have been renamed. ShapeError has been renamed to MetadataError.

- let b = tensor.shape().dims[0];
+ let b = tensor.shape()[0];

- if let Err(ShapeError::RankMismatch{...}) = lhs.broadcast(&rhs) {
+ if let Err(MetadataError::RankMismatch{...}) = lhs.broadcast(&rhs) {

- let shape = shape.swap(1, 2).unwrap();
+ let shape = shape.swapped(1, 2).unwrap();

- let shape = shape.permute(&[0, 2, 1, 3]).unwrap();
+ let shape = shape.permuted(&[0, 2, 1, 3]).unwrap();
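The pattern behind these renames is a shape type with private storage, read through indexing and transformed by methods that return a new value. The sketch below is a hypothetical stand-in (`MiniShape`, `MiniMetadataError`), not Burn's `Shape`; reading the past-participle names as "returns a new shape" is our assumption.

```rust
use std::ops::Index;

/// Hypothetical shape type with private storage; not Burn's `Shape`.
#[derive(Debug, Clone, PartialEq)]
struct MiniShape {
    dims: Vec<usize>, // private: callers use `Index` and methods
}

#[derive(Debug, PartialEq)]
enum MiniMetadataError {
    OutOfBounds { axis: usize, rank: usize },
    InvalidPermutation,
}

impl MiniShape {
    fn new(dims: Vec<usize>) -> Self {
        Self { dims }
    }

    fn rank(&self) -> usize {
        self.dims.len()
    }

    /// Returns a copy with axes `a` and `b` exchanged.
    fn swapped(&self, a: usize, b: usize) -> Result<Self, MiniMetadataError> {
        let rank = self.rank();
        for axis in [a, b] {
            if axis >= rank {
                return Err(MiniMetadataError::OutOfBounds { axis, rank });
            }
        }
        let mut dims = self.dims.clone();
        dims.swap(a, b);
        Ok(Self { dims })
    }

    /// Returns a copy reordered by `axes`, which must be a permutation of 0..rank.
    fn permuted(&self, axes: &[usize]) -> Result<Self, MiniMetadataError> {
        let rank = self.rank();
        if axes.len() != rank {
            return Err(MiniMetadataError::InvalidPermutation);
        }
        let mut seen = vec![false; rank];
        for &axis in axes {
            if axis >= rank || seen[axis] {
                return Err(MiniMetadataError::InvalidPermutation);
            }
            seen[axis] = true;
        }
        Ok(Self { dims: axes.iter().map(|&axis| self.dims[axis]).collect() })
    }
}

impl Index<usize> for MiniShape {
    type Output = usize;
    fn index(&self, axis: usize) -> &usize {
        &self.dims[axis]
    }
}
```

For a `[2, 3, 4, 5]` shape, `s[0]` is `2`, and both `swapped(1, 2)` and `permuted(&[0, 2, 1, 3])` give `[2, 4, 3, 5]`.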

The boolean data type now carries its storage type (`BoolStore`), so exhaustive matches must handle each store.

match bool_tensor.dtype() {
-   DType::Bool => todo!(),
+   DType::Bool(BoolStore::Native) => todo!(),
+   DType::Bool(BoolStore::U8) => todo!(),
+   DType::Bool(BoolStore::U32) => todo!(),
    _ => unreachable!(),
}
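When code only needs to know whether a dtype is boolean, a wildcard pattern such as `DType::Bool(_)` avoids listing every store. The enums below are hypothetical mirrors for illustration, not Burn's definitions. The byte sizes for `U8`/`U32` are inferred from the variant names, and treating `Native` as backend-defined is an assumption.

```rust
/// Hypothetical mirrors of the new dtype layout; the real enums live in Burn.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BoolStore {
    Native,
    U8,
    U32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum DType {
    F32,
    Bool(BoolStore),
}

/// Any boolean dtype, regardless of storage.
fn is_bool(dtype: DType) -> bool {
    matches!(dtype, DType::Bool(_))
}

/// Bytes per stored boolean when the store pins it down (assumed sizes).
fn bool_storage_bytes(dtype: DType) -> Option<usize> {
    match dtype {
        DType::Bool(BoolStore::U8) => Some(1),
        DType::Bool(BoolStore::U32) => Some(4),
        // Assumed backend-defined, so no fixed size here.
        DType::Bool(BoolStore::Native) => None,
        // Non-bool dtypes are outside this helper's scope.
        _ => None,
    }
}
```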

powf is no longer supported for Int tensors, as it previously relied on incorrect implicit truncation. These operations are now only available for Float tensors.

- let tensor_i = tensor_int.powf(tensor_float);
+ let tensor_f = tensor_int.float().powf(tensor_float);

- let tensor_i = tensor_int.powf_scalar(scalar_float);
+ let tensor_f = tensor_int.float().powf_scalar(scalar_float);
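The "incorrect implicit truncation" is easy to see on scalars. This plain-Rust illustration is not Burn code. It contrasts truncating a float power back to an integer with the cast-first float path that the migration uses.

```rust
/// Old-style behavior on scalars: float power, then truncate back to an integer.
fn int_powf_truncated(base: i64, exponent: f64) -> i64 {
    (base as f64).powf(exponent) as i64 // `as` truncates toward zero
}

/// The migration path: cast to float first and keep a float result.
fn float_powf(base: i64, exponent: f64) -> f64 {
    (base as f64).powf(exponent)
}
```

For example, `2^0.5` truncates to `1` and `10^-1` truncates to `0`, while the float path keeps `1.414…` and `0.1`.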

Backend tensor creation and conversion ops now take an explicit output dtype. This removes backend-specific dtype inference and ensures consistent behavior across backends. (Backend implementors only.)

impl BoolTensorOps<Self> for MyBackend {
-    fn bool_empty(shape: Shape, device: &Device<Self>) -> BoolTensor<Self> {
+    fn bool_empty(shape: Shape, device: &Device<Self>, dtype: BoolDType) -> BoolTensor<Self> {
        // use `dtype` instead of inferring internally
    }
-    fn bool_into_int(tensor: BoolTensor<Self>) -> IntTensor<Self> {
+    fn bool_into_int(tensor: BoolTensor<Self>, out_dtype: IntDType) -> IntTensor<Self> {
        // use `out_dtype` instead of inferring internally
    }
}
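The idea behind the signature change is that the caller chooses the output dtype and the backend no longer infers one on its own. Here it is on a toy backend. `IntDType`, `IntData`, and this free-function `bool_into_int` are hypothetical stand-ins, not Burn's types.

```rust
/// Hypothetical integer dtypes and buffers for a toy backend (not Burn types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum IntDType {
    I32,
    I64,
}

#[derive(Debug, PartialEq)]
enum IntData {
    I32(Vec<i32>),
    I64(Vec<i64>),
}

/// The caller decides the output dtype, so every backend produces the same
/// element type for the same request.
fn bool_into_int(data: &[bool], out_dtype: IntDType) -> IntData {
    match out_dtype {
        IntDType::I32 => IntData::I32(data.iter().map(|&b| b as i32).collect()),
        IntDType::I64 => IntData::I64(data.iter().map(|&b| b as i64).collect()),
    }
}
```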

Associated types were moved from Backend to BackendTypes. Prefer the type aliases (Device<B>, FloatTensor<B>, etc.) to avoid type resolution issues.

impl BoolTensorOps<Self> for MyBackend {
-    fn bool_empty(shape: Shape, device: &<Self as Backend>::Device, dtype: BoolDType) -> <Self as Backend>::BoolTensorPrimitive {
+    fn bool_empty(shape: Shape, device: &Device<Self>, dtype: BoolDType) -> BoolTensor<Self> {
    }
}
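The reason the aliases are the safer spelling: they resolve through whichever trait owns the associated types, so signatures keep compiling when the types move. This is a reduced, hypothetical model of the split. The trait and alias names mirror the notes, but this is not Burn's code, and the dtype parameter is omitted for brevity.

```rust
/// Types live on `BackendTypes`; behavior lives on `Backend` (toy model).
trait BackendTypes: Sized {
    type Device: Clone + std::fmt::Debug;
    type BoolTensorPrimitive;
}

trait Backend: BackendTypes {
    fn bool_empty(shape: Vec<usize>, device: &Device<Self>) -> BoolTensor<Self>;
}

// Aliases point at `BackendTypes`, so signatures never mention
// `<Self as Backend>::Device`, which no longer exists after the move.
type Device<B> = <B as BackendTypes>::Device;
type BoolTensor<B> = <B as BackendTypes>::BoolTensorPrimitive;

#[derive(Clone, Debug)]
struct Cpu;

struct ToyBackend;

impl BackendTypes for ToyBackend {
    type Device = Cpu;
    type BoolTensorPrimitive = Vec<bool>;
}

impl Backend for ToyBackend {
    fn bool_empty(shape: Vec<usize>, _device: &Device<Self>) -> BoolTensor<Self> {
        // Toy allocation: one `false` per element.
        vec![false; shape.iter().product()]
    }
}
```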

Module & Tensor

Feat/device policy (#4373) @laggui
Implement basic RNN module (#4460) @aditya0by0
Add deg2rad and rad2deg (#4462) @softmaximalist
Implement median tensor operation (#4454) @softmaximalist
Add Selu activation function (#4439) @antimora
Add CELU activation function (#4441) @antimora
Add Elu activation function (#4438) @antimora
Add BiGru (bidirectional GRU) module (#4442) @antimora
Add ThresholdedRelu activation function (#4440) @antimora
Add Softsign activation function (#4437) @antimora
[Breaking] Add configurable activation and layer_norm_eps to transformer layers (#4410) @antimora
[Breaking] Add asymmetric padding support for conv and pool operations (#4263) @antimora
Implement HardShrink, SoftShrink and Shrink Activations (#4556) @aditya0by0
feat: add align_corners support to InterpolateOptions (#4518) @antimora
feat: support padding on arbitrary dimensions (#4507) @antimora
feat: enhance attention() with scale, attn_bias, softcap, and is_causal (#4476) @antimora
feat: Introduce Lanczos3 interpolation method (#4601) @ovr
Add HannWindow operator to burn-tensor (#4631) @walkinggo
[Breaking] Remove int powf and make powi numeric op (#4646) @laggui
[Breaking] Add bool store dtype + remove bool elem from fusion (#4649) @laggui
[Breaking] Use device settings to provide output dtype (#4653) @laggui
feat: add categorical sampling for tensors (#4655) @majiayu000
Add HammingWindow operator to burn-tensor (#4698) @RunjiaChen
Fix: make module cloning efficient for CPU devices (#4703) @antimora
feat: support cross-kind tensor casting via .cast() (#4713) @antimora
Add FloatInfo for dtype-aware precision info (#4721) @antimora
Fix unsqueeze_dims panic (#4755) @softmaximalist
Fix unsqueeze_dims panic on duplicate sorted axes (#4764) @antimora
feat(burn-nn): add native LocalResponseNorm module (#4765) @jcwal1516
Add det (determinant) tensor operation (#4813) @softmaximalist
Add Blackman window function to signal module (#4842) @softmaximalist
Add STFT/ISTFT and thread n through FFT backend trait (#4835) @antimora
Add linear op to ModuleOps for fused matmul+bias (#4747) @antimora
Add native implementations for scatter_nd / gather_nd; provide autodiff for assign & add (#4709) @cu9hue
Fix conv x-backward padding_out bug (#4806) @antimora
Extract float math ops in a new trait (#4891) @skewballfox
linalg::lu: Improve numerical handling and small perf cleanup (#4902) @softmaximalist
Adding complex to complex FFT implementation (#4903) @RunjiaChen
add autodiff for scatter_nd min/max/mul (#4909) @cu9hue
fix: conv_transpose x-backward output size (#4916) @SAY-5
Change pwff activation to #[module(skip)] for backward compat (stateless) (#4929)

Datasets & Training

Implement SSIM vision metric (#4396) @softmaximalist
add KLDivLoss and batch_mean in reduction (#4399) @donjuanplatinum
Implement the PSNR vision metric (#4379) @softmaximalist
Implement Mean(L(P) Norm Error)Loss (#4341) @softmaximalist
Feature flag + Tests for RL in burn-rl and burn-train (#4470) @Charles23R
Burn rl (#4447) @Charles23R
add AMSgrad support for Adam/AdamW (#4388) @donjuanplatinum
add LBFGS optimizer (#4471) @donjuanplatinum
Add SequenceOutput struct for sequence prediction outputs (#4474) @softmaximalist
fix: OptimSharded strategy validation device mismatch (#4527) @Dreaming-Codes
Implement CTC loss (#4529) @softmaximalist
Add Smooth L1 loss (#4547) @softmaximalist
Implements: LPIPS metrics for Image quality (#4403) @koreaygj
feat: Implements DISTS metric (#4574) @koreaygj
Add multi-scale SSIM for image quality assessment (#4555) @softmaximalist
Add Gram Matrix Loss for vision tasks (#4595) @softmaximalist
Add evaluator summary (#4578) @laggui
Fix cosine scheduler record in composed scheduler (#4617) @laggui
Implement RNNT loss (#4623) @cong-or
feat: add FID vision metric (#4644) @cong-or
Add Adan optimizer implementation with tests (#4651) @sepcnt
[Breaking] Split TrainingStrategy to decouple the DistributedBackend requirement (#4710) @laggui
Fix CrossEntropyLoss with probabilities (#4829) @laggui

Backends

More explicit global dtype support (#4400) @laggui
opt(burn-cubecl): Optimized tensors by default (#4402) @wingertge
Add device dtype usage (#4404) @laggui
Attention: add autotune gate (#4554) @louisfd
Attention autotune (#4552) @louisfd
Attention: remove default impl and implement for all backends (#4544) @louisfd
Add native sign unary ops for CubeCL float and int (#4513) @yash27-lab
[Feat] Global backend Dispatch (#4508) @laggui
allow flash attention with causal (#4509) @louisfd
Perf: Improve fusion score (#4511) @nathanielsimard
Dispatch autodiff checkpointing strategy support (#4629) @laggui
Selector/attention (#4648) @louisfd
update cubek and fix vecmat autotune (#4682) @louisfd
update cubek and cubecl (#4699) @louisfd
update cubek & fix gemv autotune (#4726) @louisfd
Feat/add rfft (#4707) @Sublime12
Feat/add irfft (#4719) @Sublime12
Feat/implement fusion for rfft (#4735) @Sublime12
Feat/implement fusion for irfft (#4736) @Sublime12
Add burn-flex CPU backend (#4761) @antimora
burn-flex: enable f16 tests and fix mean overflow, grid_sample and quantization (#4769) @antimora
Add softmax and layer_norm backend trait hooks (#4797) @antimora
burn-flex: implement softmax and layer_norm backend op (#4805) @antimora
Matmul selection (#4773) @nathanielsimard
Add native dispatch overrides and native tch ops for softmax, layer_norm (#4834) @antimora
[Breaking] Split Associated Types from Backend into BackendTypes (#4868) @skewballfox
Add ctc_loss backend trait hook + tch and cubecl impls (#4819) @antimora
Update CubeK: tile matmul refactor (#4901) @louisfd
Add argtopk for Cubecl backend (#4900) @Sublime12
Add fusion integration for argtopk (#4904) @Sublime12
Add cubecl integration to topk (#4906) @Sublime12
Fusion tests (#4872) @nathanielsimard
Enable & fix cubecl tests w/ fusion (#4917) @laggui

Bug Fixes

Fix reduce line size parallel and mean accumulator precision (#4467) @laggui
fix: default to single device strat when only 1 device (#4463) @Charles23R
fix: use all dilation entries in max_pool2d_with_indices_backward (#4466) @fcasal
Fix cubek matmul stage size (#4435) @laggui
fix: Fix interpolate with NHWC input (#4363) @wingertge
fix: Actually implement conv backwards ops for burn-fusion/burn-router (#4360) @wingertge
Fix memory growth: use GraphLocator::remove_entry for orphan cleanup (#4342) @jnamika
fix: Bool from_data_dtype panics on GPU backends (#4551) @antimora
fix: resolve macOS build and test failures (#4545) @antimora
Fix too many kernels (#4505) @nathanielsimard
Fix quantization non-contiguous input (#4498) @laggui
fix overflow in int_abs_elem for i64 min value (#4486) @Olexandr88
Fix: create multiple elemwise fused block (#4497) @nathanielsimard
Fix fusion cumulative op inputs (#4621) @laggui
Fix dispatch autodiff feature propagation (#4592) @laggui
Fix conv2d_weight_backward w/ strided channels and unit spatial dims (#4591) @laggui
Fix(lpips): load ImageNet backbone weights for pretrained models (#4557) @koreaygj
Fix tch int_zeros dtype in sync (#4664) @laggui
Fix fusion kernel vector_size mismatch on f16 output writes (#4675) @AdrianEddy
Fix fusion consistency checks and binding estimation (#4695) @nathanielsimard
Fix attention_fallback NaN for fully-masked rows (#4697) @antimora
fix output in attention tuner (#4702) @louisfd
fix: use integer arithmetic for nearest-neighbor coordinate scaling (#4687) @wkrettek
Fix cubecl cuda all-reduce + remove useless check in distributed server (#4720) @Charles23R
Fix fusion scalar broadcasting in write_output_aligned (#4741) @laggui
Fix quantization tests and flaky tolerance (#4743) @laggui
Fix select_assign OOB (#4760) @nathanielsimard
Fix burn-flex bool binary ops to broadcast operands (#4775) @antimora
Fix burn-flex attention rejecting broadcasted mask/bias (#4777) @antimora
fix(ndarray): grouped conv SIMD clamp + regressions (#4727) @dnvt
Fix autotune context, remove unsafe code (#4781) @ArthurBrussee
Fix cubecl cross product on non-last dimension (#4850) @dschulmeist
Fix burn-flex to_contiguous fast path for prefix views (#4856) @antimora
Fix burn-flex sum_dim reading contiguous storage on transposed input (#4861) @antimora
Fix burn-flex argmax NaN ordering; tighten expand; precise erf (#4859) @antimora
Fix fusion reduce broadcasted when multi block local might be a view (#4867) @laggui
Fix select_assign OOB units (#4870) @laggui
Update cubecl + cubek: fix matmul, reduce WASM and vector size check on strided tensors (#4874) @laggui
Fix fusion read_quantized native type (#4923) @laggui

Documentation & Examples

Update Burn Book: metrics and trig functions (#4413) @softmaximalist
docs: add DataframeDataset example using Polars (#4298) @SameerVers3
doc(notebook) : add more basic operations and some examples (#4542) @Tyooughtul
Update documentation link for burn-store (#4619) @softmaximalist
Update building-blocks chapter (#4625) @softmaximalist
Update ONNX import docs for LoadStrategy and from_bytes (#4607) @antimora
Use burn-flex in docs and examples (#4841) @antimora

Fixes

Add field docs to generated methods (#4408) @swfsql
Fix typo in dataset.md in Burn Book (#4380) @softmaximalist
Fix book guide training changes (#4340) @laggui
Fix image-classification-web links (#4536) @laggui
fix: replace ValidStep with InferenceStep in training.md (#4620) @TsaoLun

Enhancements

Add module.train() to move a module back to the autodiff backend (#3975) @laggui
Perf/fusion/reduce broadcasted (#4338) @nathanielsimard
feat: Enable 64-bit indexing for kernels (#4502) @wingertge
Refactor/device handle (#4593) @nathanielsimard
All reduce backward (#4650 #4873) @Charles23R
Perf/burn fusion overhead (#4645) @nathanielsimard
Device service usage (#4839) @nathanielsimard

Refactoring

Add Scalar runtime literal (#4337) @laggui
Move ONNX crates to burn-onnx repository (#4393) @antimora
chore: Update cubecl to runtime config refactor (#4489) @wingertge
chore: deprecate burn-candle backend (#4416) @antimora
Move ONNX import to burn-onnx crate (#4361) @laggui
[Breaking] perf: Make backing storage of Shape more flexible (#4516) @wingertge
refactor: Move from CubeOption to Option (#4543) @wingertge
[Breaking] refactor: Metadata type/strides refactor (#4534) @wingertge
Use shape in TensorData (#4603) @laggui
refactor: Vector size generic (#4624) @wingertge
refactor: View launch (#4639) @wingertge
Refactor backend tests to set device settings at initialization + use Dispatch (#4666) @laggui
Prep for Group Multi Optimizers (#4818) @crutcher
Cleanup OptimizerAdaptor / GradAdaptor API. (#4822) @crutcher
Remove unused M param from SimpleOptimizerMapper. (#4823) @crutcher
Move tensor tests from burn-flex to burn-backend-tests (#4812) @antimora
Fusion all reduce + refactor collective (#4803) @Charles23R
Migrate benchmarks from burn-flex to burn-backend-tests (#4853) @antimora
Migrate default test backend from NdArray to Flex (#4854) @antimora
Update cubecl: refactor toml config, fix autotune priority and fix persistent memory pool reset (#4858) @nathanielsimard
Add burn-std::config runtime configuration with fusion logging and search optimization (#4864) @nathanielsimard
Update/cubecl to client (#4866) @Charles23R
Centralize internal burn-* deps in [workspace.dependencies] (#4876) @antimora
Remove optim::optim (#4924) @crutcher

Miscellaneous

Update zip + time (#4468) @laggui
Update cubecl wgpu v28 (#4244) @laggui
[Breaking] Use cache_dir() instead of hardcoded ~/.cache path (#4372) @antimora
Make ElementComparison optional for dtypes (#4255) @skewballfox
Performance tweaks to the lp_norm code. (#4318) @crutcher
ensure that tensor is owned on iter_dim call (#4309) @tzemanovic
Use NodeType to point to unimplemented node (#4334) @laggui
Bump burn version 0.21 (#4333) @laggui
feat(burn-store): add ModuleAdapter chaining (#4407) @huahuadeliaoliao
Replace Vec-based TransitionBuffer with tensor-backed storage (#4504) @arferreira
Optional Ordering for NdArrayElement (#4559) @skewballfox
Move burn-nn module name checks in burn-store adapter to the test section (#4580) @softmaximalist
Expose BurnpackError (#4585) @AdrianEddy
Add HalfPrecisionAdapter for F32/F16 mixed-precision storage (#4594) @antimora
Improve module derive + add #[module(skip)] attribute (#4618) @laggui
Fix SSIM float types to f32 (#4602) @softmaximalist
Fix function arg name inconsistencies (#4626) @softmaximalist
Make Param Sync for parallel model inference (#4701) @antimora
Fix flaky initializer_normal_init test (#4766) @leohenon
Add Record<(R0,)> 1-Tuple (#4825) @crutcher
Display FlexDevice as Cpu (#4857) @antimora
Fix rustls-webpki audit (#4863) @laggui
Fix PytorchReader bugs to load legacy files correctly (#4897) @softmaximalist
Add Clone + 'static bounds to LrScheduler::Record and derive Clone for scheduler records (#4905) @crutcher
Add ParamId::try_deserialize() (#4881) @crutcher
Use gather_nd in RNN-T gather_loss (#4895) @antimora
Re-enable fusion f16 conv + bn regression tests (#4920) @laggui
rnnt.rs: Optimize extract_log_probs and init_alpha (#4922) @softmaximalist
Fix some test tolerances (#4926) @laggui

Full Changelog: https://github.com/tracel-ai/burn/compare/v0.20.0...v0.21.0


v0.20.1 Bug fix 6mo

Fixed book guide training changes and removed dequantize native debug statements.

Full changelog

Bug Fixes & Improvements

Fix book guide training changes (#4340) @laggui
Fix dequantize native debug statement (https://github.com/tracel-ai/cubek/pull/69) @laggui
Do not point to pinned exact versions to allow pulling patch releases @laggui


v0.20.0 Breaking risk 6mo

Breaking changes

Replaced `LearnerBuilder` with `SupervisedTraining::new(...).summary()` flow in the training API.
Added required `IndexingUpdateOp` argument to `scatter` and `select_assign` operations.
Removed const generic `D2` from slice / SliceArg APIs (now use dimension‑agnostic forms).

Notable features

Complete overhaul of the ONNX import system with support for new control flow operators (`If`, `Loop`, `Scan`) and memory‑mapped loading.
Integration of [`CubeCL`](https://github.com/tracel-ai/cubecl/) for unified CPU/GPU kernel performance across diverse hardware.

Full changelog

Summary

This release marks a major turning point for the ecosystem with the introduction of CubeK. Our goal was to solve a classic challenge in deep learning: achieving peak performance on diverse hardware without maintaining fragmented codebases.

By unifying CPU and GPU kernels through CubeCL, we've managed to squeeze maximum efficiency out of everything from NVIDIA Blackwell GPUs to standard consumer CPUs.

Beyond performance, this release makes the library more robust, flexible, and significantly easier to debug.

This release also features a complete overhaul of the ONNX import system, providing broader support for a wide range of ONNX models. In addition, various bug fixes and new tensor operations enhance stability and usability.

For more details, check out the release post on our website.

Changelog

Breaking

We've introduced several breaking API changes with this release. The affected interfaces are detailed in the sections below.

Training

We refactored burn-train to better support different abstractions and custom training strategies. As part of this, the LearnerBuilder has been replaced by the LearningParadigm flow:

- let learner = LearnerBuilder::new(ARTIFACT_DIR)
+ let training = SupervisedTraining::new(ARTIFACT_DIR, dataloader_train, dataloader_valid)
        .metrics((AccuracyMetric::new(), LossMetric::new()))
        .num_epochs(config.num_epochs)
-       .learning_strategy(burn::train::LearningStrategy::SingleDevice(device))
-       .build(model, config.optimizer.init(), lr_scheduler.init().unwrap());
+       .summary();
 
- let result = learner.fit(dataloader_train, dataloader_valid);
+ let result = training.launch(Learner::new(
+      model,
+      config.optimizer.init(),
+      lr_scheduler.init().unwrap(),
+ ));

Interface Changes

The scatter and select_assign operations now require an IndexingUpdateOp to specify the update behavior.

- let output = tensor.scatter(0, indices, values);
+ let output = tensor.scatter(0, indices, values, IndexingUpdateOp::Add);
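Making the update behavior explicit matters when indices repeat: with `Add`, every value landing on the same position accumulates. This is a 1-D, plain-Rust sketch of scatter-add along dimension 0. `UpdateOp` is a hypothetical stand-in that models only the `Add` variant shown in the notes.

```rust
/// Hypothetical update mode; only `Add` appears in these notes.
#[derive(Debug, Clone, Copy)]
enum UpdateOp {
    Add,
}

/// 1-D scatter along dim 0: for each i, combine values[i] into
/// output[indices[i]] using `op`. Repeated indices accumulate under `Add`.
fn scatter_1d(tensor: &[f32], indices: &[usize], values: &[f32], op: UpdateOp) -> Vec<f32> {
    assert_eq!(indices.len(), values.len(), "one value per index");
    let mut output = tensor.to_vec();
    for (&idx, &value) in indices.iter().zip(values) {
        match op {
            UpdateOp::Add => output[idx] += value,
        }
    }
    output
}
```

Scattering `[1, 2, 3]` into zeros at indices `[0, 2, 0]` gives `[4, 0, 2]`, because position 0 receives both `1` and `3`.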

API calls for slice, slice_assign, and slice_fill no longer require const generics for dimensions, which cleans up the syntax quite a bit:

- let prev_slice = tensor.slice::<[Range<usize>; D]>(slices.try_into().unwrap());
+ let prev_slice = tensor.slice(slices.as_slice());
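Dropping the const generic works because the number of ranges can be a runtime value. The helper below is hypothetical and only computes the resulting shape. It assumes, without confirmation from these notes, that dimensions not covered by a range keep their full size.

```rust
use std::ops::Range;

/// Shape after slicing with one range per leading dimension; the rank of
/// `ranges` is only known at runtime (hypothetical helper, not Burn's API).
fn sliced_shape(shape: &[usize], ranges: &[Range<usize>]) -> Option<Vec<usize>> {
    if ranges.len() > shape.len() {
        return None;
    }
    let mut out = shape.to_vec();
    for (dim, range) in ranges.iter().enumerate() {
        if range.start > range.end || range.end > shape[dim] {
            return None; // reversed or out-of-bounds range
        }
        out[dim] = range.end - range.start;
    }
    // Assumption: trailing dimensions without a range keep their full size.
    Some(out)
}
```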

The grid_sample_2d operation now takes a GridSampleOptions argument.
To preserve the previous behavior, make sure to specify the matching options:

- let output = tensor.grid_sample_2d(grid, InterpolateMode::Bilinear);
+ let options = GridSampleOptions::new(InterpolateMode::Bilinear)
+     .with_padding_mode(GridSamplePaddingMode::Border)
+     .with_align_corners(true);
+ let output = tensor.grid_sample_2d(grid, options);
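The two options in the migration snippet control how normalized grid coordinates map to pixels. This sketch assumes Burn follows PyTorch's `grid_sample` conventions, which these notes do not confirm. `unnormalize` and `border_clamp` are hypothetical helpers.

```rust
/// Maps a normalized coordinate in [-1, 1] to a pixel coordinate along an
/// axis of length `size` (PyTorch-style convention, assumed here).
fn unnormalize(g: f64, size: usize, align_corners: bool) -> f64 {
    let size = size as f64;
    if align_corners {
        // -1 and 1 land on the centers of the first and last pixels.
        (g + 1.0) / 2.0 * (size - 1.0)
    } else {
        // -1 and 1 land on the outer edges of the first and last pixels.
        ((g + 1.0) * size - 1.0) / 2.0
    }
}

/// Border padding: out-of-range coordinates clamp to the nearest edge pixel.
fn border_clamp(x: f64, size: usize) -> f64 {
    x.clamp(0.0, size as f64 - 1.0)
}
```

For an axis of 5 pixels, `align_corners = true` maps `[-1, 1]` to `[0, 4]`, while `false` maps it to `[-0.5, 4.5]`. Border padding then pulls those edge values back to `0` and `4`.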

The QuantStore variants used in QuantScheme have been updated to support a packing dimension.

  pub enum QuantStore {
      /// Native quantization doesn't require packing and unpacking.
      Native,
+     /// Store packed quantized values in a natively supported packing format (e.g. e2m1x2).
+     PackedNative(usize),
      /// Store packed quantized values in a 4-byte unsigned integer.
-     U32,
+     PackedU32(usize),
 }
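To illustrate what "packed" storage means, the sketch below packs signed 4-bit values eight per `u32` word and unpacks them again. We read the `usize` in `PackedU32(usize)` as the tensor axis along which values are packed; that is an assumption, and this sketch ignores the axis and packs a flat slice. The nibble layout (lowest nibble first) is hypothetical, not Burn's format.

```rust
/// Packs signed 4-bit values (-8..=7) into u32 words, eight per word,
/// lowest nibble first (hypothetical layout, not Burn's format).
fn pack_i4(values: &[i8]) -> Vec<u32> {
    values
        .chunks(8)
        .map(|chunk| {
            chunk.iter().enumerate().fold(0u32, |word, (i, &v)| {
                assert!((-8..=7).contains(&v), "value out of 4-bit range");
                // Keep the low 4 bits of the two's-complement value.
                word | (((v as u8) & 0x0F) as u32) << (4 * i)
            })
        })
        .collect()
}

/// Inverse of `pack_i4`; `len` is the number of original values.
fn unpack_i4(words: &[u32], len: usize) -> Vec<i8> {
    (0..len)
        .map(|n| {
            let nibble = ((words[n / 8] >> (4 * (n % 8))) & 0x0F) as u8;
            // Sign-extend the 4-bit nibble back to i8.
            ((nibble << 4) as i8) >> 4
        })
        .collect()
}
```

Nine values need two words. The last word is only partly used, which is why `unpack_i4` takes the original length.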

Finally, Shape no longer implements IntoIterator. If you need to iterate by-value over dimensions, access the dims field directly.

- for s in shape {
+ for s in shape.dims {

Module & Tensor

Generalize linalg::outer semantics; add linalg::outer_dim (#3923) @crutcher
Use square() where appropriate. (#3900) @crutcher
Add linalg matvec (#3967) @huy209vn
Add GaussianNoise layer (#4022) @kul-sudo
Make TransformerEncoderLayer fields public (#4053) @Mnwa
Workaround MPS embedding allocation error in LibTorch (#4073) @antimora
Fix Slice operation to handle empty ranges (#4083) @antimora
Handle empty tensors in cat and slice_assign ops (#4095) @antimora
[Breaking] Add IndexingUpdateOp to scatter and select_assign (#4070) @laggui
Add CrossAttention module to burn-nn (#4101) @huy209vn
Add reflect and edge padding modes to tensor.pad (#4105 #) @antimora
Fix GLU and quiet softmax activations (#4121) @laggui
Add ceil_mode support to pooling operations (MaxPool, AvgPool) (#4112) @antimora
[Breaking] Remove D2 const generic from slice / SliceArg (#4127) @crutcher
Add backend supports_dtype (#4155) @laggui
Fix repeat 0 times (#4216) @laggui
feat: add hardswish activation (#4209) @mertalev
Add more trig ops (#4282) @laggui
Add empty/zeros/ones/full TensorCreationOptions (#4285) @laggui
feat: nms op (#4246) @mertalev

Datasets & Training

Refactor metric loggers (#3895 #4017) @Charles23R
Add support for custom learning strategy (#3921) @Charles23R
Feat/optim/distributed (#4018) @nathanielsimard
Refactor MetricEntry (#4031) @Charles23R
Feature muon (#3925) @NewBornRustacean
Add warmup epochs to MetricEarlyStoppingStrategy (#4041) @crutcher
Log running values (#4199) @Charles23R
Fix checkpoint and summary log level (#4201) @J-F-Liu
[Breaking] Burn train api refactor (#4223 #4283) @Charles23R
Fix checkpointer interrupt (#4268) @Charles23R

Backends

Add candle device seeding (#3959) @laggui
feat: Enable tuning for MMA matmul (#3961) @wingertge
feat: TMA autotuning (#3986) @wingertge
feat: Enable tuning specialized matmul (#4026) @wingertge
Add CubeCL Flash Attention module (#4089 #4192) @louisfd
Zero-copy tensor loading for NdArray backend (#4178) @antimora
feat: Implicit GEMM weight gradients for convolution (#4182) @wingertge
Perf/reduce cpu + Fix OOB (#4197 #4204) @nathanielsimard
feat: Accelerated convolution data gradient (#4220) @wingertge
Remove linux-only constraint for cpu (#4233) @louisfd
Perf/into contiguous (#4257) @nathanielsimard
fix: grid sample using excessive memory (#4236 #4242) @mertalev
Add fast-path for batched vector–matrix matmul (#4300) @louisfd

Bug Fixes

Fix async barrier & TMA checks (#4007) @nathanielsimard
Fix fusion reduce local already registered as output (#4014) @laggui
Fix remainder int (#4015) @laggui
Fix cuda mem error (#4020) @nathanielsimard
Cleanup autodiff unused roots (#4039) @laggui
Fix autotuner (#4049) @nathanielsimard
Fix scatter values backward (#4064) @khoek
More correctness fixes in autodiff ops (#4069) @khoek
Fix transaction read (#4074) @laggui
Fix tch bf16 kind (#4088 #4142 #4203) @laggui
Fix cubecl cuda compilation error/typo (#4092) @BjornTheProgrammer
Fix output dtype for argmin / argmax (#4195) @tzemanovic
Return slice for each dimension in shape (#4152) @laggui

Documentation & Examples

Update raspberry pi pico example (#4034 #4132) @BjornTheProgrammer
Contributor Book: Update the "ONNX to Burn" Page (#4229) @softmaximalist
docs: add examples for bool tensor operations (#4248) @qburke
Update the "Adding New Operation" guide in the contributor book (#4284) @softmaximalist
Refactor dop_timer for multiple trials (for warmup). (#4288) @crutcher
Added documentation examples for more boolean tensor operations in burn-tensor (#4289) @qburke

Fixes

Fix book (#3942) @laggui
remove repetitive words in comment (#4029) @black5box
Include katex header as symlink (#4118) @laggui
Fix quantization docs (make it clear that only PTQ is currently supported) (#4316) @laggui

ONNX Support

ONNX IR and import refactor to better support complex graphs (#3872 #4019 #4033 #4094) @antimora
Add ONNX control flow operators: If, Loop, and Scan (#3936) @antimora
Silero VAD ONNX model verification (#3999) @antimora
Add support for yolo12x model variant (#4048) @antimora
Remove burn-import abstraction layer and use onnx-ir types directly (#4033) @antimora
Fix ConstantOfShape output size determination (#4085) @antimora
Specify output rank in squeeze_dims for type inference (#4086) @antimora
Fix Expand operation to use ONNX max-semantics (#4082) @antimora
[Breaking] Add ONNX GridSample op support and tests (#4084) @antimora
Add RF-DETR model check for burn-import (#4087) @antimora
Add LSTM operator support with configurable activations (#4106) @antimora
Add memory-mapped ONNX loading with tensor data ref (#4097) @antimora
Fix outer-scope variable references in ONNX subgraphs (If/Loop/Scan) (#4119) @antimora
Add Reshape scalar optimization and Gather scalar input support (#4146) @antimora
Update GELU ONNX test to use native op and fix expected values (#4161) @antimora
Add ONNX CumSum operator support (#4162) @antimora
Remove global ONNX opset version restriction, recommend opset 16 (#4168) @antimora
Handle 1D slope when importing prelu from onnx (#4205) @mertalev
Fix handling scalar scan outputs in ONNX loop nodes (#4210) @antimora
Add ONNX external data support for models >2GB (#4158) @antimora
fix: handle negative indices in onnx gather op (#4207) @mertalev
Split backend tensor ops tests (#4232) @laggui
Do not use alloc import in burn-import codegen (#4286) @laggui
Fix ONNX where broadcasted dims (#4315) @laggui

Enhancements

Feat/pinned memory staging (#4016) @nathanielsimard
burn-store enhancements for troubleshooting and new enum skip flag (#4051) @antimora
Feat/runtime error (#4079 #4110) @nathanielsimard
Perf/improve reduce autotuning + plane non uniform control flow check (#4208) @nathanielsimard
Packed quantized matmul with QuantStore changes (#4310 #4323) @wingertge

Refactoring

chore: Update to batch caching PR for cubecl (#3948) @wingertge
Refactor IR to define outputs as a function of the operation (#3877) @laggui
Chore/update dtypes (#3998) @nathanielsimard
Cleanup quantization strategy (CPU ref, ndarray only) (#4023) @laggui
Refactor/dtype cubecl (#4032) @nathanielsimard
Refactor of burn fusion and burn cubecl fusion (#4044) @nathanielsimard
chore: Update to cubecl scalar refactor (#4062) @wingertge
refactor: cubecl Runtime trait (#4065) @wingertge
Refactor/autotuner (#4068) @nathanielsimard
Move types from burn-tensor to burn-std and burn-backend (#4050) @laggui
Feat/error handling cubecl (#4076) @nathanielsimard
Refactor RemoteDevice and RemoteSender. (#4113 #4108) @crutcher
Refactor LocalCollectiveClient and LocalCollectiveServer (#4125 #4126) @crutcher
Move backend traits and types to burn-backend (#4111) @laggui
Migrate ONNX import to burnpack format (removing Record type) (#4122) @antimora
Refactor more basic ops (#4156) @laggui
Refactor configurable backend tests (no more testgen macros) (#4129) @laggui
Backends no longer depend on burn-tensor, but strictly burn-backend (#4169) @laggui
Refactor/cube dim (#4217) @nathanielsimard
Update ops subfolder file names (#4271) @softmaximalist
refactor: Migrate to usize indexing (#4273) @wingertge
Unify ReshapeArgs / Shape.reshape(args) (#4221 #4317) @crutcher @laggui
chore: Update to refactor cubecl types and traits (#4297) @wingertge
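The notes do not spell out which argument conventions `ReshapeArgs` accepts. For reference, ONNX `Reshape`, which burn-import has to map, treats `0` as "copy the input dimension" (with the default `allowzero = 0`) and a single `-1` as "infer from the element count". The sketch below resolves arguments under those ONNX rules. `resolve_reshape` is a hypothetical helper, not burn's `Shape::reshape`.

```rust
/// Resolve reshape arguments against an input shape using ONNX Reshape rules:
/// `0` copies the input dimension at the same position, and a single `-1`
/// is inferred so that the total element count is preserved.
fn resolve_reshape(input: &[usize], args: &[i64]) -> Result<Vec<usize>, String> {
    let total: usize = input.iter().product();
    let mut out = Vec::with_capacity(args.len());
    let mut infer_at = None;

    for (i, &arg) in args.iter().enumerate() {
        match arg {
            -1 if infer_at.is_none() => {
                infer_at = Some(i);
                out.push(1); // placeholder, replaced once the others are known
            }
            -1 => return Err("only one dimension may be -1".into()),
            0 => out.push(*input.get(i).ok_or("0 refers to a missing input dim")?),
            d if d > 0 => out.push(d as usize),
            d => return Err(format!("invalid dimension {d}")),
        }
    }

    // Product of the explicit dimensions (the placeholder contributes 1).
    let known: usize = out.iter().product();
    if let Some(i) = infer_at {
        if known == 0 || total % known != 0 {
            return Err("cannot infer the -1 dimension".into());
        }
        out[i] = total / known;
    } else if known != total {
        return Err(format!("element count mismatch: {known} vs {total}"));
    }
    Ok(out)
}
```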

Miscellaneous

Add Shape::ravel_index for row-major raveling of indices. (#3879) @crutcher
ci: let CI server dispatch the test-gpu workflow (#3938) @syl20bnr
ci: check tag version against Cargo.toml version before publishing (#3939) @syl20bnr
Implement error for DataError (#3960) @laggui
Pin burn crates version (#4035) @Marc-AnthonyG
Implement FromStr for Slice with parsing and error handling (#3983) @crutcher
Enable no-std SafeTensors support and update hashbrown (#4071) @antimora
Move network utilities to burn-std (#4104) @laggui
Add 256-byte tensor alignment to burnpack format for mmap zero-copy support (#4100) @antimora
Fix/autotune checks (#4114) @nathanielsimard
Add direct tensor snapshot retrieval API to ModuleStore (#4131) @antimora
Implement Slice iterator and utility methods. (#4042) @crutcher
Shape FromStr/ToString (#4143) @crutcher
Add contiguous index mapping for non-contiguous layer indices (#4150) @antimora
Zero-copy loading for embedded burnpack weights (#4154) @antimora
Add flatten_dims method to Shape and refactor tensor flattening API (#4189) @crutcher
Make xtask validate run no-std checks first. (#4198) @crutcher
Add tracing::instrument and refactor collective operations. (#4157 #4234) @crutcher
Fix dtype preservation when loading tensors in burn-store (#4194) @antimora
Fix burn-store quantized tensor storage data length calculation (#4180) @antimora
Replace canonicalize_dim with expect_dim (#4196) @crutcher
Refactor: Consolidate shape and slice error handling into ExpressionError (#4218) @crutcher
Implement TODO tests and validation for Sum operation in onnx-ir (#4251) @softmaximalist
Fix burn-store collector tuple modules (#4270) @laggui
Fix rand os_rng (#4295) @laggui
chore: update xtask to 4.9.0 (#4311) @syl20bnr
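`Shape::ravel_index` (#3879) maps a multi-dimensional index to a flat offset in row-major order. The arithmetic is sketched below for a contiguous layout. `ravel_index` and `unravel_index` are free functions written for illustration, and their signatures are not burn's.

```rust
/// Convert a multi-dimensional index into a flat offset for a contiguous,
/// row-major (C-order) layout, where the last dimension varies fastest.
/// Returns `None` if the ranks differ or any coordinate is out of bounds.
fn ravel_index(index: &[usize], shape: &[usize]) -> Option<usize> {
    if index.len() != shape.len() {
        return None;
    }
    index.iter().zip(shape).try_fold(0usize, |flat, (&i, &dim)| {
        // Horner form of sum(index[k] * stride[k]) with stride[k] = prod(shape[k+1..]).
        (i < dim).then(|| flat * dim + i)
    })
}

/// Inverse operation: recover the multi-dimensional index from a flat offset.
/// All dimensions must be non-zero.
fn unravel_index(mut flat: usize, shape: &[usize]) -> Vec<usize> {
    let mut index = vec![0; shape.len()];
    // Walk from the fastest-varying (last) dimension to the first.
    for (slot, &dim) in index.iter_mut().zip(shape).rev() {
        *slot = flat % dim;
        flat /= dim;
    }
    index
}
```

For shape `[2, 3, 4]` the strides are `[12, 4, 1]`, so index `[1, 2, 3]` maps to `12 + 8 + 3 = 23`.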

v0.19.1 Bug fix 8mo

Fixed a pickle reader regression that prevented integer dictionary keys from being unpickled correctly.

Full changelog

Bug Fixes & Improvements

Autodiff: fixed RAM memory leak with correct graph cleanup (#3957 #3982) @laggui
Better memory reuse: improved sliced memory pool implementation (#3941) @nathanielsimard
Cuda: update cudarc, auto-detect CUDA version and fix some 12.8 features (CubeCL #1008) @wingertge
Quantized Linear: fixed fusion configuration to fuse more precisions (#3941) @nathanielsimard
PyTorch import: fixed pickle reader regression with integer dictionary keys (#3978) @laggui
Docs: switched to doc_cfg to fix docs.rs builds (#3979) @laggui
Tensor API fixes:
- *_like preserves dtype (#3953) @crutcher
- RotaryEncoding sum dimension for 3D input (#3954) @laggui
- squeeze check for output rank > 0 (#3946) @laggui
- Linear for input/output rank 1 (#3966) @lucasmdjl

v0.19.0 Breaking risk 9mo

Breaking changes

`LearnerBuilder::devices(vec![device.clone()])` is replaced by `.learning_strategy(LearningStrategy::SingleDevice(device.clone()))`.
`learner.fit(...)` now returns a `TrainingResult` instead of the trained model; access the model via `result.model` and the metrics renderer via `result.renderer`.
The `Config` trait now requires a `Debug` implementation.

Notable features

Multi-stream execution and optimized device transfers enable true multi‑GPU parallelism.
New CPU backend based on MLIR/LLVM providing JIT compilation, autotuning and fusion on CPUs.
Comprehensive quantization support with fused dequantization and new quantized operations.

Full changelog

Summary

This release brings major improvements to enable efficient distributed training, quantization, and CPU support in Burn.

To achieve true multi-GPU parallelism, we had to rethink several core systems: we implemented multi-stream execution to keep all GPUs busy, optimized device transfers to avoid unnecessary synchronization, and redesigned our locking strategies to eliminate bottlenecks in autotuning, fusion, and autodiff. We also introduced burn-collective for gradient synchronization and refactored our training loop to support different distributed training strategies.

Additionally, we added comprehensive quantization support, allowing models to use significantly less memory while maintaining performance through fused dequantization and optimized quantized operations.

Finally, we introduced a new CPU backend powered by MLIR and LLVM, bringing the same JIT compilation, autotuning, and fusion capabilities from our GPU backends to CPU execution.

As with previous releases, this version includes various bug fixes, further optimizations and enhanced documentation. ONNX model support has also been expanded with additional operators and bug fixes.

For more details, check out the release post on our website.

Changelog

Breaking

We've introduced a couple of breaking API changes with this release. The affected interfaces are detailed in the sections below.

Learning Strategy

We refactored the Learner to better support distributed training strategies. Instead of registering a list of devices, you now specify a training strategy.

```diff
  let learner = LearnerBuilder::new(artifact_dir)
      .metric_train_numeric(AccuracyMetric::new())
      .metric_valid_numeric(AccuracyMetric::new())
      .metric_train_numeric(LossMetric::new())
      .metric_valid_numeric(LossMetric::new())
      .with_file_checkpointer(CompactRecorder::new())
-     .devices(vec![device.clone()])
+     .learning_strategy(LearningStrategy::SingleDevice(device.clone()))
      .num_epochs(config.num_epochs)
      .summary()
      .build(
          config.model.init::<B>(&device),
          config.optimizer.init(),
          config.learning_rate,
      );
```

Learner Training Result

The Learner previously lacked an evaluation loop. We changed its return type to a `TrainingResult` that bundles the training state, including the trained model and the metrics renderer.

```diff
- let model_trained = learner.fit(dataloader_train, dataloader_valid);
+ let result = learner.fit(dataloader_train, dataloader_valid);

- model_trained
+ result
+    .model
     .save_file(format!("{artifact_dir}/model"), &CompactRecorder::new())
     .expect("Trained model should be saved successfully");
```

This enables the renderer to be reused by the new evaluator so that training and evaluation metrics appear together in the TUI dashboard:

```rust
// The renderer is moved into the evaluator, so it does not need to be `mut`.
let renderer = result.renderer;
let evaluator = EvaluatorBuilder::new(artifact_dir)
    .renderer(renderer)
    .metrics((AccuracyMetric::new(), LossMetric::new()))
    .build(result.model.clone());

evaluator.eval(name, dataloader_test);
```

Interface Changes

`Config`

The Config trait now requires Debug:

```diff
- #[derive(Config)]
+ #[derive(Config, Debug)]
  pub struct TrainingConfig {
      // ...
  }
```

`BatchNorm`

BatchNorm no longer requires the spatial dimension generic:

```diff
  #[derive(Module, Debug)]
  pub struct ConvBlock<B: Backend> {
      conv: nn::conv::Conv2d<B>,
-     norm: BatchNorm<B, 2>,
+     norm: BatchNorm<B>,
      pool: Option<MaxPool2d>,
      activation: nn::Relu,
  }
```

`Backend::seed`

Seeding is now device-specific:

```diff
- B::seed(seed);
+ B::seed(&device, seed);
```

`Tensor`

For consistency with methods like `unsqueeze()` / `unsqueeze_dim(dim)`, `squeeze(dim)` was renamed to `squeeze_dim(dim)`:

```diff
- tensor.squeeze(dim)
+ tensor.squeeze_dim(dim)
```

We also added a `tensor.squeeze()` method that removes all singleton dimensions.
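At the shape level, the difference between `squeeze_dim(dim)` and `squeeze()` can be sketched with two hypothetical helpers on plain `Vec<usize>` shapes. This only illustrates which dimensions are dropped. Returning `None` for a non-singleton dimension is a choice made here, and how burn reports that case is not covered by these notes.

```rust
/// Remove the dimension at `dim`, which must have size 1 (mirrors `squeeze_dim(dim)`).
fn squeeze_dim(shape: &[usize], dim: usize) -> Option<Vec<usize>> {
    if shape.get(dim) != Some(&1) {
        return None;
    }
    let mut out = shape.to_vec();
    out.remove(dim);
    Some(out)
}

/// Remove every dimension of size 1 (mirrors the new `squeeze()`).
fn squeeze_all(shape: &[usize]) -> Vec<usize> {
    shape.iter().copied().filter(|&d| d != 1).collect()
}
```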

Finally, we removed the clunky `tensor ^ T` transpose syntax:

```diff
- use burn::tensor::T;
- tensor ^ T
+ tensor.t()
```

`tensor.t()` is simply an alias for `tensor.transpose()`.
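For a 2-D tensor, `t()` is the ordinary matrix transpose. The sketch below shows what that means for a contiguous row-major buffer. `transpose_2d` is a hypothetical helper that copies data; it says nothing about how burn's backends implement the operation.

```rust
/// Transpose a row-major `rows x cols` matrix stored in a flat buffer,
/// producing a row-major `cols x rows` matrix.
fn transpose_2d<T: Copy>(data: &[T], rows: usize, cols: usize) -> Vec<T> {
    assert_eq!(data.len(), rows * cols, "buffer does not match shape");
    let mut out = Vec::with_capacity(data.len());
    for c in 0..cols {
        for r in 0..rows {
            // Element (r, c) of the input becomes element (c, r) of the output.
            out.push(data[r * cols + c]);
        }
    }
    out
}
```

For example, `[[1, 2, 3], [4, 5, 6]]` becomes `[[1, 4], [2, 5], [3, 6]]`.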

Module & Tensor

Fix unsqueeze rank check (#3429) @laggui
Feat/quant block (#3442) @laggui
Kill tensor^T magic transpose marker in favor of tensor.t(). (#3452) @crutcher
ADD GLU activation function (#3444) @bn-c
Add quantization params precision (#3453) @laggui
Improve select_assign check (#3483) @laggui
Add grid_sample function (#3495 #3523 #3522) @Cielbird
save_tensor_as_image utility (#3520) @Cielbird
Add affine_grid_2d (#3526) @Cielbird
ADD missing Debug derive for embedding (#3547) @bn-c
Dot Product Op (#3537) @kikefdezl
Lift .full()/.full_like() into base Tensor - support Tensor<B, D, Bool>::full()/full_like(). (#3562) @crutcher
Make Distribution::Default the Default::default(). (#3582) @crutcher
Implement int matmul (#3575) @wingertge
Feat/quant formats (#3613) @laggui
Switch Tensor::swap_dims/permute to AsIndex dim support. (#3619) @crutcher
Tensor::flatten() => AsIndex dims support. (#3620) @crutcher
Remove D param from BatchNorm<B, D>. (#3625) @crutcher
nn.activation; Activation (#3603 #3693) @crutcher
Add q4 q2 quantization (#3617) @laggui
Introduce NormLayer abstraction for unified normalization layers. (#3630) @crutcher
Add dtype to trait creation ops (#3670) @laggui
Make Config require Debug (#3689) @crutcher
Add NormalizationConfig::with_num_features() and related (#3688) @crutcher
Module quantization w/ tests (#3637) @nathanielsimard
Add NumPy-like take operation with multi-dimensional index support (#3681) @antimora
Added trace and diag with batch support for linalg crate (#3703) @niklund
Add step support to tensor slice operations (#3748) @antimora
Tensor::unfold(dim, size, step) (#3751 #3782 #3783) @crutcher
Slice assign with steps (#3776) @antimora
Add bool_xor operation for boolean tensors (#3785) @crutcher
[Breaking] Make squeeze/squeeze_dim consistent with other APIs (#3790) @laggui
Add cross product (#3743) @SinanGncgl
Enable stepped slicing for slice_fill and complete slice API cleanup (#3784) @antimora
Tensor::rank() (#3797) @crutcher
AsIndex dim handling for Numeric ops (#3795) @crutcher
Add outer and outer_batch ops in linalg (#3786) @huy209vn
Tensor::_dims() (#3811) @crutcher
Add tensor.cumsum(dim) first implementation (#3806) @antimora
slice_fill() should pick a compatible dtype (#3826) @crutcher
Default LU decomposition implementation (#3816) @DimitriTimoz
Add tensor.square and fast-path int-power exponents. (#3847) @crutcher
Add cumulative operations: cumprod, cummin, and cummax (#3819) @antimora
Add Tensor::sum_dims_squeeze(dims) (#3817) @crutcher
Allow linear to use quantized matmul (#3913) @wingertge
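`Tensor::unfold(dim, size, step)` (#3751) extracts sliding windows along a dimension. Assuming it follows the same rules as PyTorch's `Tensor.unfold`, a dimension of length `n` yields `(n - size) / step + 1` windows (integer division), and trailing elements that do not fill a window are dropped. The 1-D sketch below uses a hypothetical `unfold_1d` helper.

```rust
/// Extract sliding windows of length `size`, advancing by `step`, from a 1-D
/// slice. Trailing elements that do not fill a whole window are dropped.
fn unfold_1d<T: Copy>(data: &[T], size: usize, step: usize) -> Vec<Vec<T>> {
    assert!(size > 0 && step > 0, "size and step must be positive");
    if data.len() < size {
        return Vec::new();
    }
    // Number of windows: floor((len - size) / step) + 1.
    let count = (data.len() - size) / step + 1;
    (0..count)
        .map(|w| data[w * step..w * step + size].to_vec())
        .collect()
}
```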

Datasets & Training

Pre-Shuffle Multithread DataLoaders on Shuffle (#3390) @crutcher
PixelDepth + Copy (#3419) @crutcher
Add Dice-Sorenson Coefficient Metric (#3407) @MathijsdeBoer
Add SelectionDataset, refactor ShuffledDataset, and add transform tests. (#3406) @crutcher
Evenly distribute complete chunks/batches across partial dataset splits (#3476) @laggui
Distributed Data Parallel (#3456) @Cielbird
Use tensor ops for clip_by_norm (#3485) @laggui
SamplerDataset distribution fix; constructors and builder. (#3490) @crutcher
Unify transform usage of RngOptions. (#3577) @crutcher
Fix bugs with ddp learning (#3581) @Cielbird
Add support for CIFAR-10 and CIFAR-100 datasets (#3579) @buttfa
Add with_interrupter for LearnerBuilder (#3611) @amfaber
Improved Burn Train (#3614 #3935) @nathanielsimard @laggui
Add 'TextFolderDataset' struct and AgNewsDataset (#3698) @buttfa
Add PerplexityMetric for language model evaluation (#3707) @TheDarkchip
Adding CER/WER metrics (#3418) @yazanmashal03
Fix/autodiff/multi threads (#3793) @nathanielsimard
Add cautious_weight_decay to AdamW optimizer. (#3869) @crutcher
Fix evaluator dataloader device (#3893) @laggui
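The Dice-Sørensen coefficient (#3407) measures the overlap between a predicted and a target mask as `2 |A ∩ B| / (|A| + |B|)`. Below is a minimal sketch on boolean masks. `dice_coefficient` is a hypothetical helper. Returning `1.0` when both masks are empty is a convention chosen here, not taken from burn's metric.

```rust
/// Dice-Sørensen coefficient for two binary masks of equal length:
/// 2 * |A ∩ B| / (|A| + |B|), in [0, 1].
fn dice_coefficient(pred: &[bool], target: &[bool]) -> f64 {
    assert_eq!(pred.len(), target.len(), "masks must have the same length");
    // Positions that are positive in both masks.
    let intersection = pred
        .iter()
        .zip(target)
        .filter(|&(&p, &t)| p && t)
        .count();
    let total = pred.iter().filter(|&&p| p).count() + target.iter().filter(|&&t| t).count();
    if total == 0 {
        // Both masks are empty: treated as a perfect match here (assumed convention).
        return 1.0;
    }
    2.0 * intersection as f64 / total as f64
}
```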

Backends

Migrate to new cubecl multi tensor handle changes (#3136) @wingertge
More memory control with scoped static memory management (#3410) @nathanielsimard
Feat/fusion quant (#3454) @nathanielsimard
Expose client utilities (#3559) @allenqm
New CPU backend based on MLIR (#3411) @marcantoinem
feat: ndarray dynamic tensor types and int tensor cast (#3647) @wingertge
Implement optimized bool_select for primary backends (#3710) @TheDarkchip
Add backend level is_nan / is_inf implementations (#3809) @laggui
Feat/persistent memory (#3842) @nathanielsimard
feat: add backend implementations for Trunc op (#3860) @mooori

Bug Fixes

Fix ndarray interpolate coord precision at boundaries (#3481) @laggui
Fix ndarray conv2d groups channels (#3415) @laggui
Fix candle mask broadcasting (#3489) @laggui
Update cubecl: fix wgpu vec to scalar cast (#3496) @Cielbird
Fix/conv2d groups backward (#3521) @laggui
Fix/conv3d backward groups (#3533) @laggui
[Fix] Add some missing handling for flex32 (#3551) @wingertge
Fix backward scatter dim (#3555) @laggui
fix: Use correct datatype when filling boolean tensors (#3593) @wingertge
fix: Ensure output layout is the same for non-inplace SIMD ops in ndarray (#3604) @wingertge
Fix scalar binop not contiguous (#3636) @laggui
Fix dtype dispatch in cubecl module ops (#3658) @laggui
Fix wgpu bool and/or (#3664) @laggui
Fix tch bool ones and rand int (#3684) @laggui
fix: Select assign + bool cast (#3730) @wingertge
Fix register_float_tensor to use the correct dtype (#3774) @A2va
Fix: autotune errors with fusion (added fallback) (#3778) @nathanielsimard
Fix mask_where broadcasted line size (#3823) @laggui
Fix adaptive avg pool2d backward line size (#3840) @laggui
Fix line size regression bug (#3850) @nathanielsimard
Correctly set cubecl::random::seed(seed) (#3878) @laggui
Fix indexing for permuted tensors with cumulative ops (#3891) @wingertge
Fix quantized reshape and into_contiguous (#3903) @wingertge
Fix fusion matmul inputs (#3905) @laggui
Fix powf vectorization on WGPU (#3916) @nathanielsimard

Documentation & Examples

[Docs] Add python prerequisite disclaimer for HuggingfaceDatasetLoader (#3484) @laggui
Mnist example augmented data (#3534) @Cielbird
Improve DataLoaderBuilder docs. (#3482) @crutcher
Readme + Burn Book performance section (#3686) @nathanielsimard
Update README for improved ONNX import documentation (#3738) @antimora
Some updates to the book (#3906) @louisfd

Fixes

fix: link in examples (#3475) @domenicocinque
Fix webassembly description + fusion usage + missing device (#3474) @laggui
Fix dataset split docs (#3508) @laggui
docs: fix example (#3498) @domenicocinque
Fix tensor docs examples (#3525) @laggui
Fix MNIST example model (#3549) @Cielbird
Fix/conv2d docs display (#3586) @huy209vn
Fix KaTeX docs (#3787) @laggui
Fix typo in getting-started (#3868) @Charles23R

ONNX Support

Add ONNX IsNaN and IsInf ops (#3393) @Friedrich-S
Add support onnx bernoulli (#3394) @tye-singwa
fix onnx reshape op elem_type inference (#3395) @tye-singwa
Adding bitwise ONNX ops (#3120) @AshAnand34
Add ONNX Attention op (#3423) @Friedrich-S
Add support and tests for ONNX Abs operator (#3536) @antimora
Infer conv spatial dims from weight rank (#3538) @laggui
Debug log new name during ONNX renames (#3539) @torsteingrindvik
Proto conversion: Allow f16 tensors by casting via bytemuck from raw data (#3541) @torsteingrindvik
Fix onnx auto_pad and ceil_mode attrs handling (#3542) @laggui
Support int min/max types in clip_config (#3544) @antimora
Make onnx-ir parse error more informative. Handle more data type variants in TryFrom -> Argument (#3545) @torsteingrindvik
Add Identity node support and fix initializer handling (#3543) @antimora
Use try_cast_vec with fallback in proto conversion (#3546) @laggui
onnx-ir: Infer conv2d kernel shape from weight tensor (#3554) @torsteingrindvik
Add comprehensive Shape type support for ONNX operations (#3381) @antimora
Extend onnx reduce op support (#3497) @tye-singwa
Enhance ConstantOfShape to support static shape input (#3550) @torsteingrindvik
Don't panic on allowzero since reshape supports it (#3573) @torsteingrindvik
ONNX enhancements to support CLIP ViT-B-32 (#3560) @antimora
Use prettyplease to format burn-import output rust files (#3578) @n1ght-hunter
Fix ONNX import rank inference for nodes downstream of Shape-type constant conversion (#3564) @antimora
Support dynamic shape and tensor sizes in ONNX resize (#3563) @antimora
Refactor backend selection for onnx-tests (#3584) @antimora
Add broadcasting support for add, sub, mul, and div ops (#3589) @antimora
Fix ONNX Slice operation axes parameter handling (#3594) @antimora
ONNX model checking: Yolo11x (#3599) @antimora
CLIP ViT-B/32 text model ONNX verification & backend fixes (#3623) @antimora
clip-vit-b-32-vision model verifications and fixes (#3673) @antimora
Implemented MatMulInteger ONNX in burn-import and Uint8/int8 element types (#3672) @huy209vn
Fix ONNX import: Integer constants serialization and MatMulInteger broadcasting (#3696) @antimora
Add EyeLike ONNX operation support (#3731) @TheDarkchip
Support ONNX Squeeze with axes input and no axes (#3736) @antimora
Enhance ONNX PRelu config initialization with alpha and num_parameters (#3746) @antimora
Add support for negative indices in Gather shape ops (#3749) @antimora
Update ONNX dependency to stable version (#3772) @antimora
Add NonZero ONNX operation support (#3745) @TheDarkchip
Add static shape propagation and broadcasting support for ONNX IR operations (#3763) @antimora
trunc, fmod and Mod ONNX ops (#3767) @antimora
Add uint16 to onnx-ir (#3791) @TheGhostHuCodes
Add YOLO model family check with ONNX import and test (#3750) @antimora
ONNX albert model check and bug fix (#3810) @antimora
Add ModernBERT-base model check (#3814) @antimora
Add all-MiniLM-L6-v2 ONNX model check (#3813) @antimora
ONNX: support broadcasting for bool_and (#3829) @mooori
Lift constants for ReduceMax and ReduceMean nodes (#3827) @TheGhostHuCodes
Burn import refactor to node-based registry architecture (#3825) @antimora
ONNX: support broadcasting for bool_or, bool_xor (#3839) @mooori
Update ONNX model support version to Opset 16+ (#3870) @jc-cr
Handle empty tensor constants in ONNX import (#3904) @antimora
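Several entries in this list (#3589, #3763) concern broadcasting. ONNX arithmetic operators use NumPy-style multidirectional broadcasting. Shapes are aligned from the trailing dimension, and each aligned pair must either be equal or contain a 1. The output shape can be computed as sketched below. `broadcast_shape` is a hypothetical helper, not burn-import or onnx-ir code.

```rust
/// Compute the output shape of a NumPy/ONNX-style multidirectional broadcast.
/// Returns `None` if the shapes are incompatible.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let rank = a.len().max(b.len());
    let mut out = vec![0; rank];
    for i in 0..rank {
        // Walk from the trailing dimension; missing leading dims act as size 1.
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        out[rank - 1 - i] = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return None,
        };
    }
    Some(out)
}
```

For example, shapes `[8, 1, 6, 1]` and `[7, 1, 5]` broadcast to `[8, 7, 6, 5]`, while `[2, 3]` and `[3, 2]` are incompatible.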

Enhancements

Add more operations support in fusion (#3552) @nathanielsimard
Perf/linear layout (#3587) @nathanielsimard
Perf/data transfer (#3695) @nathanielsimard
Perf: GPU to CPU Copy (#3708) @nathanielsimard
feat: Matmul quant (#3874 #3910) @wingertge
Fix/matmul/fusion (#3899) @nathanielsimard

Refactoring

Refactor burn-train (#3451) @Cielbird
[chore] Migrate to memory management API refactor (#3477) @wingertge
Update cubecl: matmul refactor (#3493) @louisfd
Refactor/quant (#3500) @nathanielsimard
chore: Update cubecl with new changes to Item and layouts (#3626) @wingertge
Refactor/seed (#3641) @nathanielsimard
Reorganize activation layer sources into nn.activation module (#3627) @crutcher
Remove backend QuantizedEncoding type and unused candle/tch impl (#3645) @laggui
chore: Update cubecl with stacked view changes (#3687) @wingertge
chore: Update cubecl for split traits (#3700) @wingertge
Use bytes from cubecl (#3701) @nathanielsimard
Update cubecl runtime features (#3711) @wingertge
Use ScalarIr to represent scalars generically (#3706) @laggui
chore: Update cubecl to tile refactor PR (#3728) @wingertge
Refactor/broadcast layout (#3733) @wingertge
Add cubecl re-export, root Tensor, doc updates and Noam scheduler fix (#3742) @laggui
Move nn components to burn-nn (#3740) @laggui
Update cubecl (#3752) @wingertge
Chore update cubecl (#3764) @nathanielsimard
Chore: update cubecl + fix no-std (#3771) @laggui
Move optimizer components to burn-optim (#3773) @laggui
Feat/multi streams (#3775) @nathanielsimard
chore: Update cubecl for quant refactor and other changes (#3828) @wingertge
chore: Update for launch refactor (#3841) @wingertge
Refactor Shape manipulations (#3845) @laggui
refactor: Refactor matmul to use views for its inputs (#3846) @wingertge
Refactor/cubecl client (#3873) @nathanielsimard

Miscellaneous

chore: update dependencies (#3389) @reneleonhardt
Use member name as filter for wgpu tests (#3405) @laggui
Fix fusion no default feat (#3408) @laggui
Bump MSRV from 1.85 to latest stable 1.87 (#3424) @Friedrich-S
Add benchmarks.toml (#3430 #3457) @syl20bnr
Test benchmark execution on an Nvidia A100 (#3435 #3446) @syl20bnr
Burn-collective base (#3288) @Cielbird
ci: split tests on GitHub runners and on GPU runners (#3382) @syl20bnr
ci: bench on multiple machines (#3455) @syl20bnr
ci: fix wgpu-info (#3466) @syl20bnr
HuggingfaceDatasetLoader automatically check for pip (#3479) @Puranjay-del-Mishra
Refactor/collective (#3450) @nathanielsimard
cfg-mask ddp constructor (#3488) @crutcher
Update MSRV to 1.88 (#3492) @laggui
Fix various warnings reported by run-checks (#3512) @crutcher
Burn-vision transforms (#3527) @Cielbird
Add feature flag to bytemuck due to usage of API extern_crate_alloc (#3556) @torsteingrindvik
Fix shape type annotation in test (#3576) @laggui
Refactor burn-collective (#3572) @Cielbird
fix: Fix bug with scalar tail in morphology op (#3588) @wingertge
apply clippy fixes to burn-ndarray (#3618) @torsteingrindvik
Derive clone for Record Items (#3601) @amfaber
Add From implementations for ActivationConfig and cleanup tests (#3631) @crutcher
Fix new stable clippy lints (#3643) @janhohenheim
Fix stable clippy lints (#3644) @janhohenheim
Fix obvious problems (#3646) @nathanielsimard
Limit cubecl cpu target (#3656) @laggui
Bump cubecl to use wgpu 26 (#3657) @janhohenheim
add some missing default-features = false (#3675) @dcrewi
Fix no-std support for burn-no-std-tests and warning clean up (#3671) @antimora
Strengthen Doc Lints (#3691) @crutcher
From impls for Activation (#3692) @crutcher
Remove DimSwappedActivation (#3693) @crutcher
Shape: into_iter(), into_ranges(), to_vec(), slice() (#3694) @crutcher
Add burn-store crate for model storage with safetensors support (#3666) @antimora
Add #[allow(clippy::too_many_arguments)] to config constructor (#3737) @crutcher
Remove empty indices tests (#3747) @laggui
Fix various clippy lints (#3766) @wingertge
chore: remove redundant words (#3770) @juejinyuxitu
Fix segfaults from fusion panics with simple workaround (#3777) @wingertge
Remove vulkan/mesa no-std CI setup (#3781) @laggui
ci: add dispatch trigger publish workflow and bump xtask to 2.1.10 (#3788) @syl20bnr
Slice: Copy, full(), default() (#3796) @crutcher
Fix tests with hardcoded types (#3805) @wingertge
Add PytorchStore for optimized model loading and in-house pickle reader (#3741) @antimora
Fix ndarray compilation when cubecl-common enables rayon but ndarray doesn't (#3848) @wingertge
PyTorch reader: Add F16, BF16, and unsigned integer support (#3849) @antimora
Fix minor typo in POEM.md (#3851) @jc-cr
BurnpackStore (#3792) @antimora
Expose de/serialize numericentry (#3890) @Charles23R
Bump tch to 0.22.0 (#3892) @laggui
update cubecl (#3896) @louisfd
Disable no-std safetensorsstore (#3902) @antimora

Releases

Cadence 0.1 / wk

Last release 80d

Tracked 7

Security

Security score 6.5/10

OpenSSF —

Open CVEs 0

Active maintainer

Community

GitHub stars 15,581

Forks 967

Contributors 90d 25

Open issues 296

Open PRs 30

Stars/wk velocity 0.0

HN peak 6

About

Stars

15,581

Forks

967

Languages

Rust Python C++

Community & Support

Discord

Similar tools

tensorzero

Burn

Netron
