luigi-daisy

Utility wrapper of luigi


License
BSD-3-Clause
Install
pip install luigi-daisy==0.0.3

Documentation

daisy

Thin wrapper of luigi for utility.

Features

Formatted local targets

Classes inheriting daisy.FormattedLocalTargetBase is provided for dumping and loading objects by one-liner.

import daisy
import pandas as pd

df = pd.DataFrame({"a": [1,2,3], "b": [4,5,6]})
df
# =>
#    a  b
# 0  1  4
# 1  2  5
# 2  3  6

targ = daisy.CsvTarget("./output.csv")

# dumping
targ.dump(df)

# loading
df2 = targ.load()

df2
# =>
#    a  b
# 0  1  4
# 1  2  5
# 2  3  6

# `daisy.FormattedLocalTargetBase` also inherits `luigi.LocalTaget`
# so that original api is also enabled.
with targ.open("r") as fd:
    s = fd.read()

Default output for task

daisy.Task inherits luigi.Task and provides default output features.

By setting file extension via ext attribute, daisy automatically configure corresponding FormattedLocalTarget with default file name.

Single output

class TaskA(daisy.Task):
    param1 = daisy.Parameter()
    ext = "csv"

is equivalent to:

class TaskA(luigi.Task):
    param1 = luigi.Parameter()

    def output(self):
        return daisy.CsvTarget("./data/TaskA/TaskA(param1={}).csv".format(self.param1))

Multiple outputs

class TaskA(daisy.Task):
    param1 = daisy.Parameter()

    ext = {
        "vectors": "npy",
        "metadata": "json"
    }

is equivalent to:

class TaskA(luigi.Task):
    param1 = luigi.Parameter()

    def output(self):
        return {
            "vectors": daisy.NpyTarget("./data/TaskA/TaskA(param1={}).npy".format(self.param1)),
            "metadata": daisy.JsonTarget("./data/TaskA/TaskA(param1={}).json".format(self.param1))
            }

Available extension and file types are as follows:

Target class Object type extension
CsvTarget pandas.DataFrame csv
NpyTarget numpy.ndarray npy
JsonTarget dict json
PickleTarget object pkl pickle
FeatherTarget pandas.DataFrame feather

view source code for detail.

Configuration

For configuration, edit [daisy] section of luigi.cfg.

[daisy]

# default output directory
data_dir = luigi.Parameter("./data")