Release history
datachain releases
- Added a similarity search wrapper method to `DataChain`
Full changelog
What's Changed
- Added similarity search wrapper DataChain method by @ilongin in https://github.com/datachain-ai/datachain/pull/1756
- fix: scope created dataset version returns by @shcheklein in https://github.com/datachain-ai/datachain/pull/1762
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.54.5...0.55.0
Fixed out‑of‑memory errors when saving large files by streaming File.save().
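The fix streams data instead of loading the whole file into memory. The sketch below shows that general pattern with a hypothetical helper, `copy_streaming`, which works on plain file-like objects. It is not datachain's `File.save()` code, and the 1 MiB chunk size is an illustrative assumption. Peak memory stays near `chunk_size` no matter how large the source is.

```python
from typing import BinaryIO

DEFAULT_CHUNK_SIZE = 1024 * 1024  # 1 MiB; illustrative, not datachain's value


def copy_streaming(src: BinaryIO, dst: BinaryIO, chunk_size: int = DEFAULT_CHUNK_SIZE) -> int:
    """Copy src to dst one chunk at a time; return the number of bytes written."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # b"" signals end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```

The standard library's `shutil.copyfileobj(src, dst, length)` does the same loop. It is written out here only to make the bounded-memory behaviour explicit.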
Full changelog
What's Changed
- Update logout command to revoke token server-side by @amritghimire in https://github.com/datachain-ai/datachain/pull/1733
- robot 1st pass by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1749
- improve docs for filter, add tests by @shcheklein in https://github.com/datachain-ai/datachain/pull/1753
- perf(file): stream File.save() to avoid OOM on large files by @shcheklein in https://github.com/datachain-ai/datachain/pull/1754
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.54.4...0.54.5
- Removed the `get_last_checkpoint` function again (reverts the restoration made in 0.54.3)
Full changelog
What's Changed
- Revert "Revert "Removing `get_last_checkpoint`"" by @ilongin in https://github.com/datachain-ai/datachain/pull/1748
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.54.3...0.54.4
Restored the `get_last_checkpoint` function by reverting its removal.
Full changelog
What's Changed
- Revert "Removing `get_last_checkpoint`" by @ilongin in https://github.com/datachain-ai/datachain/pull/1747
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.54.2...0.54.3
- Removed `get_last_checkpoint` function
- Added the `datachain bucket status` CLI command
Full changelog
What's Changed
- feat(cli): add datachain bucket status command by @amritghimire in https://github.com/datachain-ai/datachain/pull/1717
- Removing `get_last_checkpoint` by @ilongin in https://github.com/datachain-ai/datachain/pull/1744
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.54.1...0.54.2
Minor fixes and improvements.
Full changelog
What's Changed
- docs from 2nd brain by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1730
- data memory readme by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1736
- Readme data memory followups by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1739
- emit subselect if it is needed by @shcheklein in https://github.com/datachain-ai/datachain/pull/1732
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.54.0...0.54.1
Bucket listing no longer creates an identical listing dataset when nothing has changed.
Full changelog
What's Changed
- Skip creating identical bucket listing dataset if nothing is changed by @ilongin in https://github.com/datachain-ai/datachain/pull/1722
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.53.0...0.54.0
Fixed UDFs that produce no output.
Full changelog
What's Changed
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1727
- Fix for zero output udf by @ilongin in https://github.com/datachain-ai/datachain/pull/1728
- Removing transient dependencies from checkpoint hash calculation by @ilongin in https://github.com/datachain-ai/datachain/pull/1653
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.52.0...0.53.0
- Added `cast` support and fixed ordering by expressions
- Fixed slow credential resolution for GCS
Full changelog
What's Changed
- fix(gcs): avoid slow credential resolution by @0x2b3bfa0 in https://github.com/datachain-ai/datachain/pull/1709
- add cast, fix order by with expressions by @shcheklein in https://github.com/datachain-ai/datachain/pull/1702
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.51.1...0.52.0
- Removed legacy compatibility code.
Full changelog
What's Changed
- Skills by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1650
- Skill readme by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1695
- Making `enlist_source` using ephemeral chain and add check for job to pre-exist if in studio by @ilongin in https://github.com/datachain-ai/datachain/pull/1718
- Remove legacy compatibility code by @shcheklein in https://github.com/datachain-ai/datachain/pull/1723
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.51.0...0.51.1
- Removed `DataChain.from_storage()`, `from_dataset()`, `from_json()`, `from_values()`, `from_pandas()`, `from_hf()`, `from_csv()`, `from_parquet()`, `from_records()`, `datasets()`, and `listings()`; use module-level equivalents (`read_*` functions).
- Removed instance methods `DataChain.batch_map()` and `DataChain.collect()`; use `DataChain.agg()` and `to_iter()` respectively.
- Removed UDF class `BatchMapper`; replace with `Aggregator`.
Full changelog
What's Changed
- build(deps-dev): bump mypy from 1.19.1 to 1.20.0 by @dependabot[bot] in https://github.com/datachain-ai/datachain/pull/1714
- perf(metastore): eliminate O(N) merge_versions calls in list_datasets by @amritghimire in https://github.com/datachain-ai/datachain/pull/1721
- feat!: remove deprecated APIs by @amritghimire in https://github.com/datachain-ai/datachain/pull/1720
Breaking Changes
The following deprecated functions, methods, classes, and environment variables have been removed.
DataChain class methods removed (use module-level functions)
| Removed | Replacement |
|---------|-------------|
| DataChain.from_storage() | read_storage() |
| DataChain.from_dataset() | read_dataset() |
| DataChain.from_json() | read_json() |
| DataChain.from_values() | read_values() |
| DataChain.from_pandas() | read_pandas() |
| DataChain.from_hf() | read_hf() |
| DataChain.from_csv() | read_csv() |
| DataChain.from_parquet() | read_parquet() |
| DataChain.from_records() | read_records() |
| DataChain.datasets() | module-level datasets() |
| DataChain.listings() | module-level listings() |
DataChain instance methods removed
| Removed | Replacement |
|---------|-------------|
| DataChain.batch_map() | DataChain.agg() |
| DataChain.collect() | DataChain.to_iter() |
UDF classes removed
| Removed | Replacement |
|---------|-------------|
| BatchMapper | Aggregator |
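Most of the renames above are mechanical. The sketch below is a hypothetical regex codemod and is not shipped with datachain. It rewrites the removed class methods and `collect()` in source text. It only flags `batch_map()` and `BatchMapper`, because `agg()` and `Aggregator` are not assumed to be drop-in replacements. The `.collect(` rewrite assumes every such call in the file is on a `DataChain`.

```python
import re

# Removed DataChain class methods mapped to their module-level replacements (from the table above).
CLASS_METHOD_RENAMES = {
    "from_storage": "read_storage",
    "from_dataset": "read_dataset",
    "from_json": "read_json",
    "from_values": "read_values",
    "from_pandas": "read_pandas",
    "from_hf": "read_hf",
    "from_csv": "read_csv",
    "from_parquet": "read_parquet",
    "from_records": "read_records",
    "datasets": "datasets",
    "listings": "listings",
}

# Removed APIs whose replacements are not assumed to be drop-in; these are only flagged.
NEEDS_REVIEW = ("batch_map(", "BatchMapper")

_CLASS_CALL = re.compile(r"\bDataChain\.(" + "|".join(CLASS_METHOD_RENAMES) + r")\(")


def migrate_source(source: str) -> tuple[str, list[str]]:
    """Rewrite simple renames in `source`; return the new text and names to review by hand."""
    # "DataChain.from_storage(" becomes "read_storage("; "dc.DataChain.from_csv(" becomes "dc.read_csv(".
    text = _CLASS_CALL.sub(lambda m: CLASS_METHOD_RENAMES[m.group(1)] + "(", source)
    # Assumption: every ".collect(" call in the file is on a DataChain instance.
    text = text.replace(".collect(", ".to_iter(")
    flagged = [name for name in NEEDS_REVIEW if name in text]
    return text, flagged
```

After rewriting, the new names still have to be in scope. For example, call them as `dc.read_storage(...)` with `import datachain as dc`.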
File methods removed
| Removed | Replacement |
|---------|-------------|
| File.get_uri() | file.get_fs_path() |
| resolve(file) function | file.resolve() |
Environment variables removed
| Removed | Replacement |
|---------|-------------|
| DVC_STUDIO_* | DATACHAIN_STUDIO_* |
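The cleanest fix is to rename the variables where they are exported. As a stopgap, the hypothetical helper below copies each `DVC_STUDIO_*` value to the matching `DATACHAIN_STUDIO_*` key. It never overwrites a key that is already set. It assumes the suffix after the prefix carries over unchanged, as the `*` in the table suggests. Any concrete suffix in the test is made up for illustration.

```python
from collections.abc import MutableMapping

OLD_PREFIX = "DVC_STUDIO_"
NEW_PREFIX = "DATACHAIN_STUDIO_"


def migrate_studio_env(env: MutableMapping[str, str]) -> list[str]:
    """Copy DVC_STUDIO_* values to DATACHAIN_STUDIO_* keys without overwriting.

    Returns the sorted list of newly created keys.
    """
    created = []
    for key in list(env):  # snapshot the keys because entries are added while looping
        if key.startswith(OLD_PREFIX):
            new_key = NEW_PREFIX + key[len(OLD_PREFIX):]
            if new_key not in env:
                env[new_key] = env[key]
                created.append(new_key)
    return sorted(created)
```

Calling `migrate_studio_env(os.environ)` early in a process, before datachain reads its configuration, should make the old names visible under the new prefix.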
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.50.2...0.51.0
- Preserved the `sys__id` field during partial table copy operations
Full changelog
What's Changed
- Preserve sys__id on copy partial table by @ilongin in https://github.com/datachain-ai/datachain/pull/1682
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.50.1...0.50.2
- Added `remove_dataset_versions` method to catalog for dataset version management
Full changelog
What's Changed
- UDF checkpoints for aggregator by @ilongin in https://github.com/datachain-ai/datachain/pull/1593
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1711
- refactor(catalog): add remove_dataset_versions by @amritghimire in https://github.com/datachain-ai/datachain/pull/1704
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.50.0...0.50.1
Fixed type inference in column expressions and improved `group_by` handling of inline function expressions in `partition_by`.
Full changelog
What's Changed
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1694
- Removing exact hash assertion to hash tests by @ilongin in https://github.com/datachain-ai/datachain/pull/1697
- Using listing dataset `uuid` for hashing instead of listing URI by @ilongin in https://github.com/datachain-ai/datachain/pull/1656
- fix: Type inference in column expression by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1693
- fix(group_by): support inline func expressions in partition_by without label by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1677
- fix(catalog): use for loop in remove_dataset to handle skipped versions by @amritghimire in https://github.com/datachain-ai/datachain/pull/1703
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.49.1...0.50.0
Minor fixes and improvements.
Full changelog
What's Changed
- Warehouse wake up cleanup by @dreadatour in https://github.com/datachain-ai/datachain/pull/1586
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.49.0...0.49.1
get-dataset metastore now excludes preview and version entries by default.
Full changelog
What's Changed
- get-dataset metastore: exclude preview and versions by default by @shcheklein in https://github.com/datachain-ai/datachain/pull/1661
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.48.4...0.49.0
- Removed the torch pin following the torchcodec 0.11 release
Full changelog
What's Changed
- remove torch pin since torchcodec 0.11 got released by @shcheklein in https://github.com/datachain-ai/datachain/pull/1685
- Fix yolo tests by @ilongin in https://github.com/datachain-ai/datachain/pull/1686
- Refactor hash tests to not use exact hash values by @ilongin in https://github.com/datachain-ai/datachain/pull/1684
- Reverse order remove dataset version by @ilongin in https://github.com/datachain-ai/datachain/pull/1675
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.48.3...0.48.4
Fixed handling of None attributes in the DatasetInfo validator.
Full changelog
What's Changed
- fix(dataset_info): handle None attrs in DatasetInfo validator by @amritghimire in https://github.com/datachain-ai/datachain/pull/1680
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.48.2...0.48.3
- Added support for specifying a version in the dataset name passed to `read_dataset`
- Exposed `query_script` via the Python API and CLI
Full changelog
What's Changed
- Expose query_script via Python API and CLI by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1672
- fix(cli): use new output format in `datachain show` command by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1668
- feat: support version in dataset name for read_dataset by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1670
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1673
- Cleanup temp datasets by @ilongin in https://github.com/datachain-ai/datachain/pull/1631
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.48.1...0.48.2
Fixed file uploads larger than 100 MB being silently dropped.
Full changelog
What's Changed
- fix(cli): reset reconnect counter after healthy WebSocket session by @amritghimire in https://github.com/datachain-ai/datachain/pull/1652
- re-do checkpoints docs by @ilongin in https://github.com/datachain-ai/datachain/pull/1659
- fix(to_storage): better Ctrl-C handling by @shcheklein in https://github.com/datachain-ai/datachain/pull/1658
- fix(delta): apply listing steps to materialize starting step by @shcheklein in https://github.com/datachain-ai/datachain/pull/1657
- cleanup and fix examples by @shcheklein in https://github.com/datachain-ai/datachain/pull/1662
- Improving checkpoints docs by @ilongin in https://github.com/datachain-ai/datachain/pull/1663
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.48.0...0.48.1
- `QueryStep` now uses the dataset version UUID for hash calculation
Full changelog
What's Changed
- Using dataset version uuid for hash calculation in QueryStep by @ilongin in https://github.com/datachain-ai/datachain/pull/1645
- fix(save): make isolated and atomic by @shcheklein in https://github.com/datachain-ai/datachain/pull/1603
- fix(delta): transform query properly to make unsafe work by @shcheklein in https://github.com/datachain-ai/datachain/pull/1644
- Fix File path handling across storage backends by @shcheklein in https://github.com/datachain-ai/datachain/pull/1604
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.47.2...0.48.0
- Garbage collection (gc) now includes both STALE and REMOVING dataset versions
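The selection rule can be pictured as a status filter. In the sketch below, the `VersionStatus` enum, the `DatasetVersion` record and `gc_candidates` are all hypothetical and are not datachain's metastore types. Only the STALE and REMOVING statuses come from this release note; COMPLETE stands in for any status GC should leave alone.

```python
from dataclasses import dataclass
from enum import Enum


class VersionStatus(Enum):
    COMPLETE = "complete"  # illustrative only; GC must skip it
    STALE = "stale"
    REMOVING = "removing"


# Statuses that GC cleanup includes after this change.
GC_STATUSES = frozenset({VersionStatus.STALE, VersionStatus.REMOVING})


@dataclass(frozen=True)
class DatasetVersion:
    name: str
    version: str
    status: VersionStatus


def gc_candidates(versions: list[DatasetVersion]) -> list[DatasetVersion]:
    """Return the versions a GC pass would clean up, preserving input order."""
    return [v for v in versions if v.status in GC_STATUSES]
```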
Full changelog
What's Changed
- docs: fix DataFrame capitalization by @haosenwang1018 in https://github.com/datachain-ai/datachain/pull/1620
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1627
- Fixing output when using progress bar in REPL by @ilongin in https://github.com/datachain-ai/datachain/pull/1628
- Fixing and refactoring `parse_dataset_uri()` by @ilongin in https://github.com/datachain-ai/datachain/pull/1352
- Fix stale reference of deleted dataset table by @ilongin in https://github.com/datachain-ai/datachain/pull/1468
- Fixing checkpoints docs example by @ilongin in https://github.com/datachain-ai/datachain/pull/1637
- feat(gc): include STALE and REMOVING dataset versions in GC cleanup by @amritghimire in https://github.com/datachain-ai/datachain/pull/1621
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.47.1...0.47.2
- Removed 'conn' and 'cursor' parameters from all metastore method signatures.
Full changelog
What's Changed
- Remove obsolete 'conn' and 'cursor' params from metastore methods by @dreadatour in https://github.com/datachain-ai/datachain/pull/1623
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.47.0...0.47.1
Minor fixes and improvements.
Full changelog
What's Changed
- UDF Checkpoints cleanup by @ilongin in https://github.com/datachain-ai/datachain/pull/1590
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.46.5...0.47.0
- Removed 'uri' parameter from Metastore constructor
Full changelog
What's Changed
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1610
- docs: fix PyTorch and TensorFlow capitalization by @haosenwang1018 in https://github.com/datachain-ai/datachain/pull/1617
- docs: fix PyTorch spelling in examples by @haosenwang1018 in https://github.com/datachain-ai/datachain/pull/1618
- docs: clean up wording in serialization section by @haosenwang1018 in https://github.com/datachain-ai/datachain/pull/1615
- Fix for use case when generator skips input rows in checkpoints by @ilongin in https://github.com/datachain-ai/datachain/pull/1609
- Remove outdated 'uri' param from Metastore constructor by @dreadatour in https://github.com/datachain-ai/datachain/pull/1622
New Contributors
- @haosenwang1018 made their first contribution in https://github.com/datachain-ai/datachain/pull/1617
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.46.4...0.46.5
Checkpoints and jobs are no longer created when checkpoints are disabled.
Full changelog
What's Changed
- Fix to not create checkpoints and job if checkpoints are disabled by @ilongin in https://github.com/datachain-ai/datachain/pull/1601
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.46.3...0.46.4
Minor fixes and improvements.
Full changelog
What's Changed
Technical release to bump dependencies.
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.46.2...0.46.3
- Updated Pillow to 12.1.1 to address GHSA-cfh3-3jmp-rvhc
- New exit code returned when a query is aborted
Full changelog
What's Changed
- Fixing checkpoints when `run_group_id` is not defined for job by @ilongin in https://github.com/datachain-ai/datachain/pull/1597
- Refactor `create_job` method by @ilongin in https://github.com/datachain-ai/datachain/pull/1602
- Update Pillow to 12.1.1 due to GHSA-cfh3-3jmp-rvhc vulnerability by @dreadatour in https://github.com/datachain-ai/datachain/pull/1584
- Add new exit code for aborted queries by @dreadatour in https://github.com/datachain-ai/datachain/pull/1587
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1605
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.46.1...0.46.2
Fixed out‑of‑bounds indexing in get_element to treat negative and out‑of‑range indices uniformly.
Full changelog
What's Changed
- fix file docs, add audiofile by @shcheklein in https://github.com/datachain-ai/datachain/pull/1589
- fix(get_element): unify oob and negative indexes by @shcheklein in https://github.com/datachain-ai/datachain/pull/1588
- Make `pull` and `read_dataset` from Studio atomic by @shcheklein in https://github.com/datachain-ai/datachain/pull/1573
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.46.0...0.46.1
Fixed error messages when pickling/unpickling UDFs.
Full changelog
What's Changed
- fix(udf): fix errors pickling / unpickling to show proper messages by @shcheklein in https://github.com/datachain-ai/datachain/pull/1579
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1580
- UDF Checkpoints by @ilongin in https://github.com/datachain-ai/datachain/pull/1422
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.45...0.46.0
- Added a `--no-follow` option to the `datachain job run` command
Full changelog
What's Changed
- job run: add --no-follow and fix behavior when websocket closes early by @amritghimire in https://github.com/datachain-ai/datachain/pull/1577
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.44.9...0.45
- Added `InsertBuffer` with a `flush_interval` setting and used it in `sqlite.py`
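The idea of a time-bounded insert buffer can be sketched with the standard `sqlite3` module. `BufferedInserter` below is a hypothetical class and not datachain's `InsertBuffer`. Its constructor parameters are assumptions, and only the name `flush_interval` echoes this note. Rows collect in memory and are written with one `executemany` per batch, either when `max_rows` is reached or when `flush_interval` seconds have passed since the last flush.

```python
import sqlite3
import time
from collections.abc import Sequence


class BufferedInserter:
    """Hypothetical sketch: batch rows, flushing on size or elapsed time."""

    def __init__(self, conn: sqlite3.Connection, table: str, columns: Sequence[str],
                 max_rows: int = 1000, flush_interval: float = 1.0):
        self.conn = conn
        placeholders = ", ".join("?" for _ in columns)
        # table/columns are interpolated directly: pass trusted identifiers only.
        self.sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
        self.max_rows = max_rows
        self.flush_interval = flush_interval  # seconds allowed between flushes
        self.rows: list[tuple] = []
        self.last_flush = time.monotonic()

    def add(self, row: tuple) -> None:
        self.rows.append(row)
        too_many = len(self.rows) >= self.max_rows
        too_old = time.monotonic() - self.last_flush >= self.flush_interval
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        if self.rows:
            with self.conn:  # commits the batch on success, rolls back on error
                self.conn.executemany(self.sql, self.rows)
            self.rows.clear()
        self.last_flush = time.monotonic()

    def __enter__(self) -> "BufferedInserter":
        return self

    def __exit__(self, *exc) -> None:
        self.flush()  # write whatever is still buffered
```

In this sketch the interval is only checked when `add()` is called. A quiet buffer therefore holds its rows until the next `add()` or the final flush on exit, which avoids needing a background thread.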
Full changelog
What's Changed
- Added `InsertBuffer` with flush_interval and using it in sqlite.py by @ilongin in https://github.com/datachain-ai/datachain/pull/1568
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.44.8...0.44.9
- Added the `datachain job run` CLI command to rerun jobs in Studio using checkpoint snapshots
Full changelog
What's Changed
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1572
- Ability to rerun a job with checkpoints in Studio using `datachain job run` CLI command by @ilongin in https://github.com/datachain-ai/datachain/pull/1554
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.44.7...0.44.8
Fixed `subtract` so that it behaves as documented.
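The sketch below assumes `subtract` has anti-join semantics on the chosen columns: it keeps the rows of the left chain whose key values have no match in the right chain. Lists of dicts stand in for chains, and the `subtract` function here is a hypothetical illustration, not `DataChain.subtract`. Check the official docs for the exact parameters and defaults.

```python
from collections.abc import Iterable, Sequence


def subtract(left: Iterable[dict], right: Iterable[dict], on: Sequence[str]) -> list[dict]:
    """Anti-join: keep rows of `left` whose `on` values do not appear in `right`."""
    right_keys = {tuple(row[col] for col in on) for row in right}
    return [row for row in left if tuple(row[col] for col in on) not in right_keys]
```

Which columns go into `on` matters. Matching on `id` alone drops every left row that shares an id with the right side. Matching on all columns only drops exact duplicates.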
Full changelog
What's Changed
- improve --env handling in CLI job run by @shcheklein in https://github.com/datachain-ai/datachain/pull/1567
- fix(subtract): make it work according to docs by @shcheklein in https://github.com/datachain-ai/datachain/pull/1569
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.44.6...0.44.7
- Removed the torch version restriction now that a new torchcodec release is available
Full changelog
What's Changed
- drop restriction on torch, since new torchcodec was released by @shcheklein in https://github.com/datachain-ai/datachain/pull/1571
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.44.5...0.44.6
Improved error messages for type failures in user-defined functions.
Full changelog
What's Changed
- Added logs when updating dataset version stats and preview by @ilongin in https://github.com/datachain-ai/datachain/pull/1545
- cleanup read_records API and docs by @shcheklein in https://github.com/datachain-ai/datachain/pull/1556
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1562
- improve error message on type failures in UDFs by @shcheklein in https://github.com/datachain-ai/datachain/pull/1555
- Show logs from archives in job when logs are processed by @amritghimire in https://github.com/datachain-ai/datachain/pull/1559
- Fixing pandas test by @ilongin in https://github.com/datachain-ai/datachain/pull/1565
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.44.4...0.44.5
Minor fixes and improvements.
Full changelog
What's Changed
- Refactor batch sizes by @dreadatour in https://github.com/datachain-ai/datachain/pull/1552
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.44.3...0.44.4
- Removed config/flag key `consistent_read`
Full changelog
What's Changed
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in https://github.com/datachain-ai/datachain/pull/1551
- Remove unused code by @dreadatour in https://github.com/datachain-ai/datachain/pull/1550
- Removing `consistent_read` by @ilongin in https://github.com/datachain-ai/datachain/pull/1547
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.44.2...0.44.3
- Added simple local db migration
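This note does not say how the migration works. One common way to run simple migrations on a local SQLite database is to record the schema version in `PRAGMA user_version` and apply any pending steps in order. The migration list and table below are hypothetical and are not datachain's schema or mechanism.

```python
import sqlite3

# Hypothetical ordered migrations: entry i upgrades schema version i to i + 1.
MIGRATIONS = [
    "CREATE TABLE datasets (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE datasets ADD COLUMN description TEXT",
]


def apply_migrations(conn: sqlite3.Connection) -> int:
    """Apply pending migrations in order; return the resulting schema version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version in range(current, len(MIGRATIONS)):
        conn.execute(MIGRATIONS[version])
        # Record progress so a rerun skips steps that were already applied.
        conn.execute(f"PRAGMA user_version = {version + 1}")
        conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Running it twice is harmless, because the second call finds `user_version` already at the latest step.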
Full changelog
What's Changed
- Added simple local db migration by @ilongin in https://github.com/datachain-ai/datachain/pull/1537
- Fix for local db migrations by @ilongin in https://github.com/datachain-ai/datachain/pull/1549
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.44.1...0.44.2
- Added `--consistent-read` flag to enforce consistent reads
Full changelog
What's Changed
- dvc small fixes by @dmpetrov in https://github.com/datachain-ai/datachain/pull/1538
- Fix CI warnings by @ilongin in https://github.com/datachain-ai/datachain/pull/1541
- cleanup warning, bump libs by @shcheklein in https://github.com/datachain-ai/datachain/pull/1544
- Added flag for consistent read by @ilongin in https://github.com/datachain-ai/datachain/pull/1539
Full Changelog: https://github.com/datachain-ai/datachain/compare/0.44.0...0.44.1