Release history

datachain releases

No immediate action
0.57.0 New feature

Zarr support

No immediate action
0.56.1 New feature

UUID field in datasets

No immediate action
0.56.0 New feature

Public bucket auto‑detection

No immediate action
0.55.2 Breaking risk

Rename datachain_worker → compute

No immediate action
0.55.1 Bug fix

Deterministic read_storage hash

0.55.0 New feature
Notable features
  • Added a similarity search wrapper method to DataChain
Full changelog

What's Changed

  • Added similarity search wrapper DataChain method by @ilongin in https://github.com/datachain-ai/datachain/pull/1756
  • fix: scope created dataset version returns by @shcheklein in https://github.com/datachain-ai/datachain/pull/1762

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.54.5...0.55.0

0.54.5 Bug fix

Fixed out‑of‑memory errors when saving large files by streaming File.save().
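
For readers unfamiliar with the technique, streaming a save can be sketched with the standard library alone. This is not datachain's implementation of `File.save()`: the function name `stream_copy` and the 1 MiB chunk size are assumptions chosen for illustration. The point it shows is that memory use is bounded by the chunk size rather than by the file size.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MiB; an arbitrary illustrative value


def stream_copy(src, dst, chunk_size=CHUNK_SIZE):
    """Copy src to dst in fixed-size chunks and return the number of bytes written."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# In-memory streams stand in for a remote file and a local target.
source = io.BytesIO(b"x" * (3 * CHUNK_SIZE + 5))
target = io.BytesIO()
written = stream_copy(source, target)
```

Only one chunk is held in memory at a time, which is why this pattern avoids the out-of-memory failures that reading a whole large file into a single buffer can cause.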

Full changelog

What's Changed

  • Update logout command to revoke token server-side by @amritghimire in https://github.com/datachain-ai/datachain/pull/1733
  • robot 1st pass by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1749
  • improve docs for filter, add tests by @shcheklein in https://github.com/datachain-ai/datachain/pull/1753
  • perf(file): stream File.save() to avoid OOM on large files by @shcheklein in https://github.com/datachain-ai/datachain/pull/1754

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.54.4...0.54.5

0.54.4 Breaking risk
Breaking changes
  • Removed the `get_last_checkpoint` function again, undoing the revert in 0.54.3
Full changelog

What's Changed

  • Revert "Revert "Removing get_last_checkpoint"" by @ilongin in https://github.com/datachain-ai/datachain/pull/1748

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.54.3...0.54.4

0.54.3 Breaking risk

Reverted removal of the get_last_checkpoint function.

Full changelog

What's Changed

  • Revert "Removing get_last_checkpoint" by @ilongin in https://github.com/datachain-ai/datachain/pull/1747

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.54.2...0.54.3

0.54.2 Breaking risk
Breaking changes
  • Removed `get_last_checkpoint` function
Notable features
  • Added datachain bucket status command to the CLI
Full changelog

What's Changed

  • feat(cli): add datachain bucket status command by @amritghimire in https://github.com/datachain-ai/datachain/pull/1717
  • Removing get_last_checkpoint by @ilongin in https://github.com/datachain-ai/datachain/pull/1744

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.54.1...0.54.2

0.54.1 Maintenance

Minor fixes and improvements.

Full changelog

What's Changed

  • docs from 2nd brain by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1730
  • data memory readme by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1736
  • Readme data memory followups by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1739
  • emit subselect if it is needed by @shcheklein in https://github.com/datachain-ai/datachain/pull/1732

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.54.0...0.54.1

0.54.0 Maintenance

Minor fixes and improvements.

Full changelog

What's Changed

  • Skip creating identical bucket listing dataset if nothing is changed by @ilongin in https://github.com/datachain-ai/datachain/pull/1722

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.53.0...0.54.0

0.53.0 Bug fix

Fixed UDFs that produce no output.

Full changelog

What's Changed

  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1727
  • Fix for zero output udf by @ilongin in https://github.com/datachain-ai/datachain/pull/1728
  • Removing transient dependencies from checkpoint hash calculation by @ilongin in https://github.com/datachain-ai/datachain/pull/1653

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.52.0...0.53.0

0.52.0 Bug fix
Notable features
  • Added cast support
Full changelog

What's Changed

  • fix(gcs): avoid slow credential resolution by @0x2b3bfa0 in https://github.com/datachain-ai/datachain/pull/1709
  • add cast, fix order by with expressions by @shcheklein in https://github.com/datachain-ai/datachain/pull/1702

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.51.1...0.52.0

0.51.1 Breaking risk
Breaking changes
  • Removed legacy compatibility code.
Full changelog

What's Changed

  • Skills by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1650
  • Skill readme by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1695
  • Making enlist_source using ephemeral chain and add check for job to pre-exist if in studio by @ilongin in https://github.com/datachain-ai/datachain/pull/1718
  • Remove legacy compatibility code by @shcheklein in https://github.com/datachain-ai/datachain/pull/1723

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.51.0...0.51.1

0.51.0 Breaking risk
Breaking changes
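  • Removed `DataChain.from_storage()`, `from_dataset()`, `from_json()`, `from_values()`, `from_pandas()`, `from_hf()`, `from_csv()`, `from_parquet()`, `from_records()`, `datasets()`, and `listings()`; use the module-level `read_*` functions and the module-level `datasets()` and `listings()` instead.
  • Removed instance methods `DataChain.batch_map()` and `DataChain.collect()`; use `DataChain.agg()` and `to_iter()` respectively.
  • Removed UDF class `BatchMapper`; replace with `Aggregator`.
  • Removed `File.get_uri()` and the `resolve(file)` function; use `file.get_fs_path()` and `file.resolve()` respectively.
  • Removed `DVC_STUDIO_*` environment variables; use `DATACHAIN_STUDIO_*`.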
Full changelog

What's Changed

  • build(deps-dev): bump mypy from 1.19.1 to 1.20.0 by @dependabot[bot] in https://github.com/datachain-ai/datachain/pull/1714
  • perf(metastore): eliminate O(N) merge_versions calls in list_datasets by @amritghimire in https://github.com/datachain-ai/datachain/pull/1721
  • feat!: remove deprecated APIs by @amritghimire in https://github.com/datachain-ai/datachain/pull/1720

Breaking Changes

The deprecated functions and classes are removed.

DataChain class methods removed (use module-level functions)

| Removed | Replacement |
|---------|-------------|
| DataChain.from_storage() | read_storage() |
| DataChain.from_dataset() | read_dataset() |
| DataChain.from_json() | read_json() |
| DataChain.from_values() | read_values() |
| DataChain.from_pandas() | read_pandas() |
| DataChain.from_hf() | read_hf() |
| DataChain.from_csv() | read_csv() |
| DataChain.from_parquet() | read_parquet() |
| DataChain.from_records() | read_records() |
| DataChain.datasets() | module-level datasets() |
| DataChain.listings() | module-level listings() |

DataChain instance methods removed

| Removed | Replacement |
|---------|-------------|
| DataChain.batch_map() | DataChain.agg() |
| DataChain.collect() | DataChain.to_iter() |

UDF classes removed

| Removed | Replacement |
|---------|-------------|
| BatchMapper | Aggregator |

File methods removed

| Removed | Replacement |
|---------|-------------|
| File.get_uri() | file.get_fs_path() |
| resolve(file) function | file.resolve() |
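
When upgrading past 0.51.0, it can help to scan a codebase for the removed names listed in the tables above. The following is a minimal sketch using only the standard library; the helper name `find_removed_apis` and the regular expressions are illustrative assumptions, not part of datachain. The matching is a plain text heuristic, so patterns such as `.collect(` may also flag unrelated objects, and the bare `resolve(file)` function is left out because its name is too generic to match reliably.

```python
import re

# Reader suffixes from the "DataChain class methods removed" table.
READERS = ["storage", "dataset", "json", "values", "pandas",
           "hf", "csv", "parquet", "records"]

# (pattern, suggested replacement) pairs taken from the 0.51.0 removal tables.
RULES = [(rf"\bDataChain\.from_{name}\(", f"read_{name}()") for name in READERS]
RULES += [
    (r"\bDataChain\.datasets\(", "module-level datasets()"),
    (r"\bDataChain\.listings\(", "module-level listings()"),
    (r"\.batch_map\(", "DataChain.agg()"),
    (r"\.collect\(", "DataChain.to_iter()"),
    (r"\bBatchMapper\b", "Aggregator"),
    (r"\.get_uri\(", "file.get_fs_path()"),
]


def find_removed_apis(source):
    """Return (line_number, matched_text, suggestion) for each removed API found."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, suggestion in RULES:
            match = re.search(pattern, line)
            if match:
                findings.append((lineno, match.group(0), suggestion))
    return findings


# A made-up snippet written against the pre-0.51.0 API.
sample = """\
chain = DataChain.from_storage("s3://bucket/")
rows = list(chain.collect("file"))
"""
for lineno, found, suggestion in find_removed_apis(sample):
    print(f"line {lineno}: {found} -> {suggestion}")
```

Each finding is only a pointer to a line worth reviewing; the replacement still has to be applied by hand, since argument lists may differ between the old and new calls.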

Environment variables removed

| Removed | Replacement |
|---------|-------------|
| DVC_STUDIO_* | DATACHAIN_STUDIO_* |
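
The environment variable rename can be checked with a small helper like the one below. This is a sketch under the assumption that only the prefix changes and the suffix stays the same, which is what the `*` in the table suggests; the helper name `renamed_studio_env` and the variable suffixes in the example are hypothetical.

```python
import os

OLD_PREFIX = "DVC_STUDIO_"
NEW_PREFIX = "DATACHAIN_STUDIO_"


def renamed_studio_env(environ):
    """Map each DVC_STUDIO_* variable to its DATACHAIN_STUDIO_* name.

    Variables that already have a value under the new prefix are skipped,
    so an explicit new-style setting is never overridden.
    """
    renamed = {}
    for name, value in environ.items():
        if name.startswith(OLD_PREFIX):
            new_name = NEW_PREFIX + name[len(OLD_PREFIX):]
            if new_name not in environ:
                renamed[new_name] = value
    return renamed


# Example with a plain dict; the suffixes TOKEN and URL are illustrative only.
example = {
    "DVC_STUDIO_TOKEN": "abc",
    "DATACHAIN_STUDIO_URL": "https://example.invalid",
}
print(renamed_studio_env(example))

# Inspect the real environment without modifying it.
pending = renamed_studio_env(os.environ)
```

Returning the new mapping instead of mutating `os.environ` keeps the helper side-effect free, so the result can be reviewed before any shell profile or CI configuration is updated.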

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.50.2...0.51.0

0.50.2 New feature
Notable features
  • Preserves the sys__id field during partial table copy operations
Full changelog

What's Changed

  • Preserve sys__id on copy partial table by @ilongin in https://github.com/datachain-ai/datachain/pull/1682

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.50.1...0.50.2

0.50.1 New feature
Notable features
  • Added `remove_dataset_versions` method to catalog for dataset version management
Full changelog

What's Changed

  • UDF checkpoints for aggregator by @ilongin in https://github.com/datachain-ai/datachain/pull/1593
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1711
  • refactor(catalog): add remove_dataset_versions by @amritghimire in https://github.com/datachain-ai/datachain/pull/1704

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.50.0...0.50.1

0.50.0 Bug fix

Fixed type inference in column expressions and added group_by support for unlabeled inline function expressions in partition_by.

Full changelog

What's Changed

  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1694
  • Removing exact hash assertion to hash tests by @ilongin in https://github.com/datachain-ai/datachain/pull/1697
  • Using listing dataset uuid for hashing instead of listing URI by @ilongin in https://github.com/datachain-ai/datachain/pull/1656
  • fix: Type inference in column expression by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1693
  • fix(group_by): support inline func expressions in partition_by without label by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1677
  • fix(catalog): use for loop in remove_dataset to handle skipped versions by @amritghimire in https://github.com/datachain-ai/datachain/pull/1703

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.49.1...0.50.0

0.49.1 Maintenance

Minor fixes and improvements.

Full changelog

What's Changed

  • Warehouse wake up cleanup by @dreadatour in https://github.com/datachain-ai/datachain/pull/1586

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.49.0...0.49.1

0.49.0 Bug fix

get-dataset metastore now excludes preview and version entries by default.

Full changelog

What's Changed

  • get-dataset metastore: exclude preview and versions by default by @shcheklein in https://github.com/datachain-ai/datachain/pull/1661

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.48.4...0.49.0

0.48.4 Breaking risk
Breaking changes
  • Removed the torch pin requirement following torchcodec 0.11 release
Full changelog

What's Changed

  • remove torch pin since torchcodec 0.11 got released by @shcheklein in https://github.com/datachain-ai/datachain/pull/1685
  • Fix yolo tests by @ilongin in https://github.com/datachain-ai/datachain/pull/1686
  • Refactor hash tests to not use exact hash values by @ilongin in https://github.com/datachain-ai/datachain/pull/1684
  • Reverse order remove dataset version by @ilongin in https://github.com/datachain-ai/datachain/pull/1675

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.48.3...0.48.4

0.48.3 Bug fix

Fixed handling of None attributes in the DatasetInfo validator.

Full changelog

What's Changed

  • fix(dataset_info): handle None attrs in DatasetInfo validator by @amritghimire in https://github.com/datachain-ai/datachain/pull/1680

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.48.2...0.48.3

0.48.2 New feature
Notable features
  • Support version in dataset name for read_dataset
  • Expose query_script via Python API and CLI
Full changelog

What's Changed

  • Expose query_script via Python API and CLI by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1672
  • fix(cli): use new output format in datachain show command by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1668
  • feat: support version in dataset name for read_dataset by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1670
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1673
  • Cleanup temp datasets by @ilongin in https://github.com/datachain-ai/datachain/pull/1631

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.48.1...0.48.2

0.48.1 Bug fix

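Fixed the CLI reconnect counter so it resets after a healthy WebSocket session, improved Ctrl-C handling in to_storage, and fixed delta to apply listing steps when materializing the starting step.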

Full changelog

What's Changed

  • fix(cli): reset reconnect counter after healthy WebSocket session by @amritghimire in https://github.com/datachain-ai/datachain/pull/1652
  • re-do checkpoints docs by @ilongin in https://github.com/datachain-ai/datachain/pull/1659
  • fix(to_storage): better Ctrl-C handling by @shcheklein in https://github.com/datachain-ai/datachain/pull/1658
  • fix(delta): apply listing steps to materialize starting step by @shcheklein in https://github.com/datachain-ai/datachain/pull/1657
  • cleanup and fix examples by @shcheklein in https://github.com/datachain-ai/datachain/pull/1662
  • Improving checkpoints docs by @ilongin in https://github.com/datachain-ai/datachain/pull/1663

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.48.0...0.48.1

0.48.0 Bug fix
Notable features
  • Dataset version UUID used for hash calculation in QueryStep
Full changelog

What's Changed

  • Using dataset version uuid for hash calculation in QueryStep by @ilongin in https://github.com/datachain-ai/datachain/pull/1645
  • fix(save): make isolated and atomic by @shcheklein in https://github.com/datachain-ai/datachain/pull/1603
  • fix(delta): transform query properly to make unsafe work by @shcheklein in https://github.com/datachain-ai/datachain/pull/1644
  • Fix File path handling across storage backends by @shcheklein in https://github.com/datachain-ai/datachain/pull/1604

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.47.2...0.48.0

0.47.2 New feature
Notable features
  • Garbage collection (gc) now includes both STALE and REMOVING dataset versions
Full changelog

What's Changed

  • docs: fix DataFrame capitalization by @haosenwang1018 in https://github.com/datachain-ai/datachain/pull/1620
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1627
  • Fixing output when using progress bar in REPL by @ilongin in https://github.com/datachain-ai/datachain/pull/1628
  • Fixing and refactoring parse_dataset_uri() by @ilongin in https://github.com/datachain-ai/datachain/pull/1352
  • Fix stale reference of deleted dataset table by @ilongin in https://github.com/datachain-ai/datachain/pull/1468
  • Fixing checkpoints docs example by @ilongin in https://github.com/datachain-ai/datachain/pull/1637
  • feat(gc): include STALE and REMOVING dataset versions in GC cleanup by @amritghimire in https://github.com/datachain-ai/datachain/pull/1621

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.47.1...0.47.2

0.47.1 Breaking risk
Breaking changes
  • Removed 'conn' and 'cursor' parameters from all metastore method signatures.
Full changelog

What's Changed

  • Remove obsolete 'conn' and 'cursor' params from metastore methods by @dreadatour in https://github.com/datachain-ai/datachain/pull/1623

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.47.0...0.47.1

0.47.0 Maintenance

Minor fixes and improvements.

Full changelog

What's Changed

  • UDF Checkpoints cleanup by @ilongin in https://github.com/datachain-ai/datachain/pull/1590

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.46.5...0.47.0

0.46.5 Breaking risk
Breaking changes
  • Removed 'uri' parameter from Metastore constructor
Full changelog

What's Changed

  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1610
  • docs: fix PyTorch and TensorFlow capitalization by @haosenwang1018 in https://github.com/datachain-ai/datachain/pull/1617
  • docs: fix PyTorch spelling in examples by @haosenwang1018 in https://github.com/datachain-ai/datachain/pull/1618
  • docs: clean up wording in serialization section by @haosenwang1018 in https://github.com/datachain-ai/datachain/pull/1615
  • Fix for use case when generator skips input rows in checkpoints by @ilongin in https://github.com/datachain-ai/datachain/pull/1609
  • Remove outdated 'uri' param from Metastore constructor by @dreadatour in https://github.com/datachain-ai/datachain/pull/1622

New Contributors

  • @haosenwang1018 made their first contribution in https://github.com/datachain-ai/datachain/pull/1617

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.46.4...0.46.5

0.46.4 Bug fix

Checkpoints and jobs are no longer created when checkpoints are disabled.

Full changelog

What's Changed

  • Fix to not create checkpoints and job if checkpoints are disabled by @ilongin in https://github.com/datachain-ai/datachain/pull/1601

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.46.3...0.46.4

0.46.3 Maintenance

Minor fixes and improvements.

Full changelog

What's Changed

Technical release to bump dependencies.

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.46.2...0.46.3

0.46.2 New feature
Security fixes
  • dep: GHSA-cfh3-3jmp-rvhc — Pillow updated to 12.1.1
Notable features
  • New exit code returned when a query is aborted
Full changelog

What's Changed

  • Fixing checkpoints when run_group_id is not defined for job by @ilongin in https://github.com/datachain-ai/datachain/pull/1597
  • Refactor create_job method by @ilongin in https://github.com/datachain-ai/datachain/pull/1602
  • Update Pillow to 12.1.1 due to GHSA-cfh3-3jmp-rvhc vulnerability by @dreadatour in https://github.com/datachain-ai/datachain/pull/1584
  • Add new exit code for aborted queries by @dreadatour in https://github.com/datachain-ai/datachain/pull/1587
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1605

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.46.1...0.46.2

0.46.1 Bug fix

Fixed out‑of‑bounds indexing in get_element to treat negative and out‑of‑range indices uniformly.

Full changelog

What's Changed

  • fix file docs, add audiofile by @shcheklein in https://github.com/datachain-ai/datachain/pull/1589
  • fix(get_element): unify oob and negative indexes by @shcheklein in https://github.com/datachain-ai/datachain/pull/1588
  • Make pull and read_dataset from Studio atomic by @shcheklein in https://github.com/datachain-ai/datachain/pull/1573

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.46.0...0.46.1

0.46.0 Bug fix

Fixed error messages when pickling/unpickling UDFs.

Full changelog

What's Changed

  • fix(udf): fix errors pickling / unpickling to show proper messages by @shcheklein in https://github.com/datachain-ai/datachain/pull/1579
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1580
  • UDF Checkpoints by @ilongin in https://github.com/datachain-ai/datachain/pull/1422

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.45...0.46.0

0.45 New feature
Notable features
  • Added --no-follow option to job run command
Full changelog

What's Changed

  • job run: add --no-follow and fix behavior when websocket closes early by @amritghimire in https://github.com/datachain-ai/datachain/pull/1577

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.44.9...0.45

0.44.9 New feature
Notable features
  • Added InsertBuffer with flush_interval configuration and usage in sqlite.py
Full changelog

What's Changed

  • Added InsertBuffer with flush_interval and using it in sqlite.py by @ilongin in https://github.com/datachain-ai/datachain/pull/1568

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.44.8...0.44.9

0.44.8 New feature
Notable features
  • Jobs can now be rerun with checkpoints in Studio using the `datachain job run` CLI command
Full changelog

What's Changed

  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1572
  • Ability to rerun a job with checkpoints in Studio using datachain job run CLI command by @ilongin in https://github.com/datachain-ai/datachain/pull/1554

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.44.7...0.44.8

0.44.7 Bug fix

Fixed subtract operation to match documentation.

Full changelog

What's Changed

  • improve --env handling in CLI job run by @shcheklein in https://github.com/datachain-ai/datachain/pull/1567
  • fix(subtract): make it work according docs by @shcheklein in https://github.com/datachain-ai/datachain/pull/1569

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.44.6...0.44.7

0.44.6 Breaking risk
Breaking changes
  • Dropped the restriction on torch now that a new torchcodec has been released
Full changelog

What's Changed

  • drop restriction on torch, since new torchcodec was released by @shcheklein in https://github.com/datachain-ai/datachain/pull/1571

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.44.5...0.44.6

0.44.5 Bug fix

Improved error messages for type failures in user-defined functions.

Full changelog

What's Changed

  • Added logs when updating dataset version stats and preview by @ilongin in https://github.com/datachain-ai/datachain/pull/1545
  • cleanup read_records API and docs by @shcheklein in https://github.com/datachain-ai/datachain/pull/1556
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1562
  • improve error message on type failures in UDFs by @shcheklein in https://github.com/datachain-ai/datachain/pull/1555
  • Show logs from archives in job when logs are processed by @amritghimire in https://github.com/datachain-ai/datachain/pull/1559
  • Fixing pandas test by @ilongin in https://github.com/datachain-ai/datachain/pull/1565

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.44.4...0.44.5

0.44.4 Maintenance

Minor fixes and improvements.

Full changelog

What's Changed

  • Refactor batch sizes by @dreadatour in https://github.com/datachain-ai/datachain/pull/1552

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.44.3...0.44.4

0.44.3 Breaking risk
Breaking changes
  • Removed config/flag key `consistent_read`
Full changelog

What's Changed

  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1551
  • Remove unused code by @dreadatour in https://github.com/datachain-ai/datachain/pull/1550
  • Removing consistent_read by @ilongin in https://github.com/datachain-ai/datachain/pull/1547

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.44.2...0.44.3

0.44.2 Bug fix
Notable features
  • Added simple local db migration
Full changelog

What's Changed

  • Added simple local db migration by @ilongin in https://github.com/datachain-ai/datachain/pull/1537
  • Fix for local db migrations by @ilongin in https://github.com/datachain-ai/datachain/pull/1549

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.44.1...0.44.2

0.44.1 New feature
Notable features
  • Added a flag for consistent reads (`consistent_read`; removed again in 0.44.3)
Full changelog

What's Changed

  • dvc small fixes by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1538
  • Fix CI warnings by @ilongin in https://github.com/datachain-ai/datachain/pull/1541
  • cleanup warning, bump libs by @shcheklein in https://github.com/datachain-ai/datachain/pull/1544
  • Added flag for consistent read by @ilongin in https://github.com/datachain-ai/datachain/pull/1539

Full Changelog: https://github.com/datachain-ai/datachain/compare/0.44.0...0.44.1
