patient-aware-splitter

This package splits a medical imaging dataset into test and train sets in a patient-aware, stratified manner.


Keywords
patient, meta-data, covid-19, split-images, classification-dataset, sample-covid
License
MIT
Install
pip install patient-aware-splitter==0.0.1

Documentation

Covid_Patient_Aware_Image_Split

Images of the same patient must not be split between the test and train sets. If the model sees a patient during training, its score on that patient's other images in the test set overstates how well it generalizes. This repository splits a sample Covid/Normal classification dataset into test and train sets in a patient-aware, stratified manner. The meta-data file is used to group the images by Patient-ID. For example, all the images colored green in the screenshot belong to the same patient and should all go into either the test or the train split.

Screenshot

Grouping must be strict, so that a patient's images are never split across the two sets. Stratification only needs to be approximate, i.e. as close as the grouping allows. This code also assumes that all images of one patient have the same stratification category (diagnosis), meaning that all the images from the same Patient ID are either Covid or NonCovid.
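The grouping and stratification rules above can be illustrated with a minimal sketch. This is not the package's own code: the function name `patient_aware_split`, the `(image_name, patient_id, label)` record layout, and the `test_fraction` and `seed` parameters are all assumptions made for illustration. It stratifies at the patient level, so the class balance of the images is only approximate, as described above.

```python
import random
from collections import defaultdict


def patient_aware_split(records, test_fraction=0.2, seed=0):
    """Split image names into train/test without splitting any patient.

    records: list of (image_name, patient_id, label) tuples (hypothetical layout).
    Returns {"train": [...], "test": [...]} holding image names.
    """
    # Group images by patient and record the single label of each patient.
    images_by_patient = defaultdict(list)
    label_of_patient = {}
    for image_name, patient_id, label in records:
        images_by_patient[patient_id].append(image_name)
        if label_of_patient.setdefault(patient_id, label) != label:
            # The README assumes one diagnosis per patient; fail loudly otherwise.
            raise ValueError(f"patient {patient_id} has more than one label")

    # Bucket patients by label so each class is split separately (stratification).
    patients_by_label = defaultdict(list)
    for patient_id, label in label_of_patient.items():
        patients_by_label[label].append(patient_id)

    rng = random.Random(seed)
    split = {"train": [], "test": []}
    for label in sorted(patients_by_label):
        patients = sorted(patients_by_label[label])  # sort first for reproducibility
        rng.shuffle(patients)
        n_test = round(len(patients) * test_fraction)
        for index, patient_id in enumerate(patients):
            side = "test" if index < n_test else "train"
            # All images of a patient go to the same side.
            split[side].extend(images_by_patient[patient_id])
    return split
```

Because whole patients are assigned to a side, the fraction of images in the test set can drift from `test_fraction` when patients have very different numbers of images.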

To split images into 4 folders (train/Covid, train/NonCovid, test/Covid, test/NonCovid) inside the splitted folder:

split_to_folders.py
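As a rough illustration of the folder layout described above, the sketch below copies images into split/label subfolders. It is not taken from split_to_folders.py: the function `copy_to_split_folders`, its `assignments` argument, and the `source_dir` parameter are hypothetical, and it assumes each image has already been assigned to a split.

```python
import shutil
from pathlib import Path


def copy_to_split_folders(assignments, source_dir, output_dir="splitted"):
    """Copy images into <output_dir>/<split>/<label>/ folders.

    assignments: iterable of (image_name, split, label) tuples, for example
    ("img1.png", "train", "Covid"). This layout is an assumption for illustration.
    """
    source_dir = Path(source_dir)
    output_dir = Path(output_dir)
    for image_name, split, label in assignments:
        target_dir = output_dir / split / label
        # Create e.g. splitted/train/Covid on first use.
        target_dir.mkdir(parents=True, exist_ok=True)
        # copy2 keeps file metadata; the source images are left untouched.
        shutil.copy2(source_dir / image_name, target_dir / image_name)
    return output_dir
```

Copying rather than moving keeps the original dataset intact, at the cost of extra disk space.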

To split images into a dictionary:

split_into_dictionary.py

To split images into a torch DataLoader:

split_into_dataloader.py