gcp-io

A package to return files stored in Google drive and Google Cloud Storage as a torch.utils.Dataset that can be used with the torch dataloaders.


Install
pip install gcp-io==0.0.0.2

Documentation

GCP-IO

Generate PyTorch datasets from files stored in Google cloud storage and Google drive.

Installation

Run the following to install this package:

pip install gcpio

Environment Variables Required

LOG_FILE_PATH: Path to store the logs in a log file
LOG_LEVEL: LOG LEVEL(Example: INFO, DEBUG, ERROR etc)
TOKEN_PATH_GDRIVE: path to the generated token.json file

Example usage:

Google Drive:

from gcpio.gdrive import Gdrive

# Get meta data of files present in a Google Drive folder
GDRIVE = Gdrive()
files=GDRIVE.get_files_metadata(folder_id="XXXXX",page_size=500,file_type="image/png",replace_query=None) # returns a dict with keys['files','len'] where files is a list of objects from the Drive folder

# Create the torch dataset from files
GDRIVE = Gdrive()
dataset = g.create_dataset(
    data_folder_id=folder_id,
    labels_folder_id=folder_id,
    page_size=1000,
    data_file_type="image/png",
    labels_file_type="text/csv",
    skip_labels=skip_labels,
)

# load this dataset into torch dataloader.
dataloader = DataLoader(
    dataset,
    batch_size,
    sampler=BatchSampler(
        SequentialSampler(dataset), batch_size=batch_size, drop_last=True
    ),
)
#get a sample batch
samples = next(iter(dataloader))
images = []
    for i in range(len(samples)):
        # ignoring labels and collecting images only for verification
        img, _ = samples[i]
        images.append(img)
    images = torch.cat(images)

grid = make_grid(images, nrow=20)
save_image(grid, f"./examples/generated_images.png")

outputs

output