tartare

Data filters


Keywords
dataset-creation, dataset-generation, keras
License
Apache-2.0
Install
pip install tartare==0.0.1

Documentation

Tartare: Make homebrew image dataset for machine learning.

You can create your own dataset for Keras and other.

It is a library to easily create a data set for Keras based on the JPEG images you collected. In addition to creating the data set, there is also a data augmentation function.

This version only supports JPEG format.

Compatible with: Python 3.5, 3.6

Operation confirmed:

  • macOS 10.13 High Sierra
  • Ubuntu 17.10 Artful Aardvark & Ubuntu 18.04 Bionic Beaver

Requirement

  • NumPy 1.14.5
  • SciPy >= 1.1.0
  • Pillow 5.2.0
  • Scikit-Learn 0.19.2

Usage

Precondition

  • Directory Strucure
.
├── tutorial.py
└── apple/
│   ├── 0.jpg
│   ├── 1.jpg
│   ├── 2.jpg
│   ├── 3.jpg
│   └── 4.jpg
│  
└── melon/
   ├── 0.jpg
   ├── 1.jpg
   ├── 2.jpg
   ├── 3.jpg
   └── 4.jpg

1, Data Augmentation

You can augment your images.

augment function

  • mirror=True : Flips the specified image horizontally.
  • flip=True : Flip the specified image vertically.
  • brightness=True : Decrease the brightness of the specified image randomly.
  • contrast=True : Raise the contrast of the specified image randomly.
  • mask=True : Based on the vertical or horizontal length of the input image, make a mask 0.3 - 0.5 times the length of the shorter side, and mask the random position of the image. Create 5 pictures.

scaling function

  • scale : Scaling rate (Float) Make new Directory and save scaled images in those directory.

tutorial.py

from tartare.Vision import DataAugmentation

DataAugmentation("apple").init().augment(mirror=True,
                                         flip=True,
                                         brightness=True,
                                         contrast=True,
                                         mask=True)

DataAugmentation("melon").init().augment(mirror=True,
                                         flip=True,
                                         brightness=True,
                                         contrast=True,
                                         mask=True)
                                         
DataAugmentation("apple").init().scaling(scale=1.5)
DataAugmentation("melon").init().scaling(scale=1.5)

2, MakeCategory

You can receive .npz file with contained image data & label

Caution: When creating an .npz file with MakeCategory (), make sure that the number of original image files is the same for all categories.

If you are Mac user, try to remove .DS_Store file.

  • target_dir : Directory name where images of categories are saved.
  • label : Correct answer label for supervised learning.
  • size : Image size when saving. Tuple. (width, height).
  • mode : Choose a "RGB" or a "gray".
  • filename: File name when output.
  • verbose : When True is selected, when the file output succeeds, the result is output.

tutorial.py

from tartare.Vision import MakeCategory

MakeCategory(target_dir="apple").init(label=0, size=(64, 64), mode="RGB").export_category(filename="apple.npz", verbose=True)
MakeCategory(target_dir="melon").init(label=1, size=(64, 64), mode="RGB").export_category(filename="melon.npz", verbose=True)

3, BuildDataset

Create a data set based on the .npz file created with MakeCategory ().

Caution: When creating an .npz file with MakeCategory (), make sure that the number of original image files is the same for all categories.

  • BuildDataset(filename1, filename2, ...) : Specify .npz for each category.
  • filename: File name when output.
  • verbose : When True is selected, when the file output succeeds, the result is output.

tutorial.py

from tartare.Vision import BuildDataset

BuildDataset("apple.npz", "melon.npz").export_dataset(filename="apple_melon.npz",verbose=True)

4, ExpandImgData

The image (tensor) and label (two dimensions) are given as return values.

  • filename= Name of Dataset.
  • test_size: the proportion of the dataset to include in the test split. Default: 0.3
  • division: Whether or not to divide the data set for training and testing. (Default=True)
  • shuffle: Whether to shuffle the data set. (Default=True)

tutorial.py

from tartare.Vision import ExpandImgData

(x_train, y_train), (x_test, y_test) = ExpandImgData(filename="apple_melon.npz").load_data(test_size=0.3,
                                                                                      division=True,
                                                                                      shuffle=True)

5, Sample

Sample_NN.py

from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense
from tartare.Vision import ExpandImgData

def main():
    (x_train, y_train), (x_test, y_test) = ExpandImgData("apple_melon.npz").load_data(test_size=0.3,
                                                                                      division=True,
                                                                                      shuffle=True)

    print(x_train.shape, y_train.shape)
    print(x_test.shape, y_test.shape)
    input_shape = x_train.shape[1] * x_train.shape[2] * x_train.shape[3]

    x_train = x_train.reshape(x_train.shape[0], input_shape)  # 2次元配列を1次元に変換
    x_test = x_test.reshape(x_test.shape[0], input_shape)

    train_image = x_train.astype("float32")
    test_image = x_test.astype("float32")

    train_image /= 255.0
    test_image /= 255.0

    train_label = np_utils.to_categorical(y=y_train, num_classes=2)
    test_label = np_utils.to_categorical(y=y_test, num_classes=2)


    model = Sequential()
    model.add(Dense(512, activation="relu", input_shape=(input_shape,)))
    model.add(Dense(512, activation="relu"))
    model.add(Dense(512, activation="relu"))
    model.add(Dense(2, activation='softmax'))
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

    epoch_num = 20
    history = model.fit(x=train_image, y=train_label, batch_size=4, epochs=epoch_num, validation_split=0.1, verbose=1)

    model.summary()

    score = model.evaluate(x=test_image, y=test_label, verbose=0)
    print("Test Loss: {}".format(score[0]))
    print("Test Accuracy: {}".format(score[1]))

if __name__ == "__main__":
    main()

Install

  • Install Tartare from PyPI:

    pip3 install tartare

License

MIT

Author

Hirotaka Kawashima