datasets-dump

A tool for dumping datasets from the Hugging Face datasets library


Keywords
datasets
License
MIT
Install
pip install datasets-dump==0.1.1

Documentation

Dump datasets with embedded media to an audio folder or an image folder.

This recovers the audio folder or image folder layout from Parquet files.

Usage

datasets-dump someone/dataset ./dist

Python API:

def dump(
    dataset: Union[str, Dataset, DatasetDict],
    dist: str | Path,
    audio_column: Optional[str] = None,
    image_column: Optional[str] = None,
    metadata_format: Literal["jsonl", "csv"] = "jsonl",
) -> None
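To make the target layout concrete, here is a minimal standard-library sketch of what an audio folder or image folder usually looks like on disk: one media file per row, plus a metadata file whose `file_name` column links each file to its remaining columns. This is not the package's implementation. The helper name `write_media_folder`, its `media_column` and `extension` parameters, and the zero-padded file naming are hypothetical, and it assumes each record already holds the media as raw bytes.

```python
import csv
import json
from pathlib import Path
from typing import Any, Iterable, Literal, Mapping, Union


def write_media_folder(
    records: Iterable[Mapping[str, Any]],
    dist: Union[str, Path],
    media_column: str,
    extension: str,
    metadata_format: Literal["jsonl", "csv"] = "jsonl",
) -> Path:
    """Hypothetical helper: write media bytes to files and the rest to metadata."""
    out = Path(dist)
    out.mkdir(parents=True, exist_ok=True)

    rows = []
    for index, record in enumerate(records):
        # Assumed naming scheme: zero-padded row index plus the given extension.
        file_name = f"{index:06d}.{extension}"
        (out / file_name).write_bytes(record[media_column])

        # Every non-media column becomes metadata, keyed by file_name.
        row = {"file_name": file_name}
        row.update({k: v for k, v in record.items() if k != media_column})
        rows.append(row)

    metadata_path = out / f"metadata.{metadata_format}"
    if metadata_format == "jsonl":
        with metadata_path.open("w", encoding="utf-8") as handle:
            for row in rows:
                handle.write(json.dumps(row, ensure_ascii=False) + "\n")
    elif metadata_format == "csv":
        fieldnames = list(rows[0]) if rows else ["file_name"]
        with metadata_path.open("w", encoding="utf-8", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
    else:
        raise ValueError(f"unsupported metadata_format: {metadata_format!r}")
    return metadata_path
```

Calling `write_media_folder(rows, "./dist", media_column="image", extension="png")` on rows of the form `{"image": b"...", "label": "cat"}` would produce `./dist/000000.png`, `./dist/000001.png`, and so on, next to `./dist/metadata.jsonl`. The `metadata_format` argument mirrors the option of the same name in the signature above.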