datasets

v4.8.5 Bugfix

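This bugfix release addresses JSON decoding, batching of table-formatted datasets, and iterable map resume state. It is relevant to SREs tracking stability and regressions.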

Published 1mo ago · RAG & Retrieval
✓ No known CVEs patched

Topics

ai artificial-intelligence computer-vision dataset-hub datasets machine-learning huggingface llm natural-language-processing nlp numpy pandas pytorch speech tensorflow

Summary

AI summary

Json() values are now decoded before DataFrame.to_json(), to_list, and to_dict are called.

Full changelog

Main bug fixes

  • fix: decode Json() values before calling DataFrame.to_json() (#8116) by @Brianzhengca in https://github.com/huggingface/datasets/pull/8122
  • Fix: decode JSON type before to_list or to_dict is called by @ItsTania in https://github.com/huggingface/datasets/pull/8137
  • Fix batching for table-formatted datasets by @bluehyena in https://github.com/huggingface/datasets/pull/8126
  • Fix iterable map resume state by @Brianzhengca in https://github.com/huggingface/datasets/pull/8147
  • don't embed remote files in download_and_prepare to parquet by @lhoestq in https://github.com/huggingface/datasets/pull/8150
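
The first two fixes above both concern decoding JSON-typed values before a conversion runs. As a rough illustration of why that ordering matters, the sketch below uses only Python's standard `json` module as a stand-in for the serializer; it is not the library's implementation. The `rows` data and the `decode_json_column` helper are hypothetical names made up for this example, and the exact failure mode in `datasets` is described in the linked PRs.

```python
import json

# Hypothetical rows: "meta" holds JSON text rather than a decoded object.
rows = [
    {"id": 1, "meta": '{"lang": "en", "tags": ["a", "b"]}'},
    {"id": 2, "meta": '{"lang": "fr", "tags": []}'},
]


def decode_json_column(rows, column):
    """Return copies of the rows with `column` parsed from JSON text."""
    return [{**row, column: json.loads(row[column])} for row in rows]


# Serializing without decoding: "meta" round-trips as an escaped string.
raw_out = json.loads(json.dumps(rows))

# Decoding first: "meta" round-trips as a nested object.
decoded_out = json.loads(json.dumps(decode_json_column(rows, "meta")))

print(type(raw_out[0]["meta"]).__name__)
print(type(decoded_out[0]["meta"]).__name__)
```

In this sketch, consumers of the undecoded output would need a second parse step to reach the nested fields, which is the kind of surprise that decoding before conversion avoids.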

Other improvements and bug fixes

  • Parse agent traces by @lhoestq in https://github.com/huggingface/datasets/pull/8113
  • 🔒 Pin GitHub Actions to commit SHAs by @paulinebm in https://github.com/huggingface/datasets/pull/8114
  • chore: bump doc-builder SHA for PR upload workflow by @rtrompier in https://github.com/huggingface/datasets/pull/8134
  • Remove print statement in JSON processing by @lhoestq in https://github.com/huggingface/datasets/pull/8136
  • Don't include files list in DatasetInfo (and remove old stuff) by @lhoestq in https://github.com/huggingface/datasets/pull/8128
  • update ci uer by @lhoestq in https://github.com/huggingface/datasets/pull/8139
  • fix warning in ci by @lhoestq in https://github.com/huggingface/datasets/pull/8140
  • fix mask in embed_storage for remote files by @lhoestq in https://github.com/huggingface/datasets/pull/8151
  • fix original_files missing in ci json test by @lhoestq in https://github.com/huggingface/datasets/pull/8152
  • Fix null in embed storage by @lhoestq in https://github.com/huggingface/datasets/pull/8154
  • Fix base_path in integration tests by @lhoestq in https://github.com/huggingface/datasets/pull/8155

New Contributors

  • @paulinebm made their first contribution in https://github.com/huggingface/datasets/pull/8114
  • @Brianzhengca made their first contribution in https://github.com/huggingface/datasets/pull/8122
  • @bluehyena made their first contribution in https://github.com/huggingface/datasets/pull/8126
  • @rtrompier made their first contribution in https://github.com/huggingface/datasets/pull/8134
  • @ItsTania made their first contribution in https://github.com/huggingface/datasets/pull/8137

Full Changelog: https://github.com/huggingface/datasets/compare/4.8.4...4.8.5

About datasets

The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
