hugit

Keywords
cli, datasets, huggingface-datasets
License
MIT
Install
pip install hugit==0.1.1

Documentation

Hugit

Read the documentation at https://hugit.readthedocs.io/

Warning: this code is very much a work in progress and is primarily intended for one particular workflow. It may not work well (or at all) for your workflow.

hugit is a command line tool for loading ImageFolder style datasets into a 🤗 datasets Dataset and pushing them to the 🤗 hub.

The primary goal of hugit is to help quickly get a local dataset into a format that can be used for training computer vision models. hugit was developed to support the workflow for flyswot where we wanted a quicker iteration between creating new training data, training a model, and using the new model inside flyswot.

[Figure: hugit workflow diagram]

Supported formats

At the moment hugit supports ImageFolder style datasets, i.e. one sub-directory per label:

data/
    dog/
        dog1.jpg
    cat/
        cat.1.jpg
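
In this layout the name of each sub-directory acts as the label for the images inside it. The mapping can be sketched with the Python standard library alone. This is an illustration only, not hugit's implementation: the function name `list_labelled_images` and the set of accepted extensions are assumptions.

```python
from pathlib import Path

# Assumed set of image extensions; the extensions hugit accepts may differ.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_labelled_images(root):
    """Return sorted (relative_path, label) pairs for an ImageFolder style directory."""
    root = Path(root)
    pairs = []
    # Every immediate sub-directory of the root is treated as one class.
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for image in sorted(class_dir.iterdir()):
            if image.is_file() and image.suffix.lower() in IMAGE_EXTENSIONS:
                # The parent directory name is used as the label.
                pairs.append((image.relative_to(root).as_posix(), class_dir.name))
    return pairs
```

Called on the data/ directory above, this would pair dog/dog1.jpg with the label dog and cat/cat.1.jpg with the label cat.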

Features

  • A command line interface for quickly loading a dataset stored on disk into a 🤗 datasets.Dataset
  • Push your local dataset to the 🤗 hub
  • Get statistics about your dataset. These statistics focus on 'high level' statistics that would be useful to include in Datasheets and Model Cards. Currently these statistics include:
    • label frequencies, organised by split
    • train, test, valid split sizes
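
The two statistics listed above can be computed along the following lines. This is a minimal sketch rather than hugit's own code: it assumes the dataset has already been reduced to (split, label) pairs, and the function names `label_frequencies` and `split_sizes` are hypothetical.

```python
from collections import Counter


def label_frequencies(examples):
    """Count how often each label occurs within each split.

    `examples` is an iterable of (split, label) pairs (an assumed input shape).
    """
    counts = {}
    for split, label in examples:
        counts.setdefault(split, Counter())[label] += 1
    return counts


def split_sizes(frequencies):
    """Derive the number of examples per split from the label counts."""
    return {split: sum(counter.values()) for split, counter in frequencies.items()}


# Toy data for illustration only.
examples = [("train", "dog"), ("train", "cat"), ("train", "dog"), ("test", "cat")]
frequencies = label_frequencies(examples)
sizes = split_sizes(frequencies)
```

Keeping the counts keyed by split first makes it easy to report both statistics from one pass over the data.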

Installation

You can install Hugit via pip from PyPI. Inside a virtual environment, install hugit using:

$ pip install hugit

Alternatively, you can use pipx to install hugit:

$ pipx install hugit

Usage

You can see help for hugit by running hugit --help:

Usage: hugit [OPTIONS] COMMAND [ARGS]...

  Hugit Command Line

Options:
  --help  Show this message and exit.

Commands:
  convert_images      Convert images in directory to `save_format`
  push_image_dataset  Load an ImageFolder style dataset.

To load an ImageFolder style dataset onto the 🤗 Hub you can use the push_image_dataset command.

Usage: hugit push_image_dataset [OPTIONS] DIRECTORY

  Load an ImageFolder style dataset.

Options:
  --repo-id TEXT                  Repo id for the Hugging Face Hub  [required]
  --private / --no-private        Whether to keep dataset private on the Hub
                                  [default: private]
  --do-resize / --no-do-resize    Whether to resize images before upload
                                  [default: do-resize]
  --size INTEGER                  Size to resize image. This will be used on the
                                  shortest side of the image i.e. the aspect
                                  rato will be maintained  [default: 224]
  --preserve-file-path / --no-preserve-file-path
                                  preserve_orginal_file_path  [default:
                                  preserve-file-path]
  --help                          Show this message and exit.
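
The --size option scales the shortest side of each image to the given value while keeping the aspect ratio. The arithmetic behind that kind of resize can be sketched as below. This is an illustration under stated assumptions, not hugit's implementation: the function name `shortest_side_resize` is hypothetical, and rounding the longer side to the nearest pixel is an assumed choice.

```python
def shortest_side_resize(width, height, size=224):
    """Return (new_width, new_height) with the shortest side scaled to `size`.

    Both sides are scaled by the same factor, so the aspect ratio is kept
    up to rounding to whole pixels.
    """
    if width <= 0 or height <= 0 or size <= 0:
        raise ValueError("width, height and size must be positive")
    shortest = min(width, height)
    # Multiply before dividing so the shortest side lands exactly on `size`.
    new_width = max(1, round(width * size / shortest))
    new_height = max(1, round(height * size / shortest))
    return new_width, new_height
```

For example, a 1000 by 500 image with the default size of 224 becomes 448 by 224.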

Under the hood hugit uses typed-settings, which means that configuration can be done either through the command line or through a TOML file. See the usage documentation for a more detailed discussion of how to use hugit.

Contributing

Hugit may only work for our particular workflow. That said, if you have suggestions, please open an issue.

License

Distributed under the terms of the MIT license, Hugit is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project was generated from @cjolowicz's Hypermodern Python Cookiecutter template.