`pipesnake`

A sklearn-inspired pipeline data processor for pandas.

pipesnake is a data processing pipeline that works with Pandas DataFrames. DataFrames are often used to clean up data, pre-process it and perform feature engineering; pipesnake aims to simplify these steps by letting you combine them into complex pipelines.

See the documentation and the examples.

Why?

Two simple reasons:

in many cases a Pandas DataFrame makes it very easy to build feature extractors or data processors;
in many cases it is useful to have a pipeline that can process both x and y at the same time.
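To make the second point concrete: if a cleaning step drops rows from `x`, the same rows must also be dropped from `y`, otherwise features and targets fall out of alignment. The sketch below does this by hand in plain pandas; `drop_nan_rows` is a hypothetical helper written for illustration, not part of pipesnake's API.

```python
import numpy as np
import pandas as pd


def drop_nan_rows(x: pd.DataFrame, y: pd.DataFrame):
    """Drop the rows of x that contain NaNs and drop the same rows from y."""
    keep = x.notna().all(axis=1)  # boolean mask indexed like x (and y)
    return x[keep], y[keep]


x = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})
y = pd.DataFrame({"target": [0, 1, 0]})

x_clean, y_clean = drop_nan_rows(x, y)
print(x_clean.index.tolist(), y_clean.index.tolist())  # [0, 2] [0, 2]
```

A pipeline that carries `x` and `y` together can apply this kind of bookkeeping automatically at every step.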

How can you use `pipesnake`?

Install

The easy way:

pip install --upgrade https://github.com/pierluigi-failla/pipesnake/tarball/master

to get the latest version available on GitHub, or:

pip install pipesnake

to install the latest stable version from PyPI.

Coding

You can build your own pipelines by combining SeriesPipe and ParallelPipe; both of them accept a list of Transformers.

A custom Transformer is a class that inherits from the abstract base.Transformer and implements its methods:

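from pipesnake.base import Transformer

class MyTransformer(Transformer):
    def __init__(self, name=None, my_param=None):
        # forward the name (and any other base-class arguments) to Transformer
        Transformer.__init__(self, name=name)
        self.my_param = my_param  # example parameter: replace with your own

    def fit_x(self, x):
        # learn whatever state you need from x (a pandas DataFrame)
        return self

    def fit_y(self, y):
        # learn whatever state you need from y
        return self

    def transform_x(self, x):
        # return the transformed x
        return x

    def transform_y(self, y):
        # return the transformed y
        return y

    def inverse_transform_x(self, x):
        # undo transform_x
        return x

    def inverse_transform_y(self, y):
        # undo transform_y
        return y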
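For a concrete picture of the fit/transform/inverse contract, here is a minimal self-contained sketch. It does not import pipesnake: `ColumnCenterer` is a hypothetical class that only mimics the six methods above, and the convention that `fit_*` returns `self` while `transform_*` returns the transformed DataFrame is an assumption borrowed from scikit-learn rather than something this README specifies.

```python
import pandas as pd


class ColumnCenterer:
    """Hypothetical transformer: centers the x columns, passes y through.

    It mirrors the six-method interface described above but does not inherit
    from pipesnake.base.Transformer, so it runs without pipesnake installed.
    """

    def __init__(self, name=None):
        self.name = name
        self.means_ = None

    def fit_x(self, x: pd.DataFrame):
        self.means_ = x.mean()  # learn one mean per column
        return self

    def fit_y(self, y: pd.DataFrame):
        return self  # nothing to learn for y

    def transform_x(self, x: pd.DataFrame) -> pd.DataFrame:
        return x - self.means_  # subtract column means (aligned by column name)

    def transform_y(self, y: pd.DataFrame) -> pd.DataFrame:
        return y

    def inverse_transform_x(self, x: pd.DataFrame) -> pd.DataFrame:
        return x + self.means_  # add the means back to undo the centering

    def inverse_transform_y(self, y: pd.DataFrame) -> pd.DataFrame:
        return y


x = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
t = ColumnCenterer(name="center").fit_x(x)
xt = t.transform_x(x)
print(xt["a"].tolist())                     # [-1.0, 0.0, 1.0]
print(t.inverse_transform_x(xt).equals(x))  # True
```

Keeping the learned state (here `means_`) on the object is what makes `inverse_transform_x` possible, for example to map model outputs back to the original scale.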

You can find some Transformers already implemented in pipesnake.transformers.

Once you have all the Transformers you need, you can combine them into feature engineering or data processing pipelines using SeriesPipe and ParallelPipe:

from pipesnake.pipe import ParallelPipe
from pipesnake.pipe import SeriesPipe

# MyTransformer1..3 stand for your own Transformer subclasses; pass their parameters here
pipe = SeriesPipe(transformers=[
    ParallelPipe(transformers=[
        MyTransformer1(),
        MyTransformer2(),
    ]),
    MyTransformer3(),
])
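Reading the example by its class names, the ParallelPipe applies its transformers side by side to the same input, and the SeriesPipe then feeds that result into MyTransformer3. The sketch below imitates this composition with plain functions so the data flow is easy to follow. `series` and `parallel` are hypothetical helpers, not pipesnake classes, and joining the parallel branches column-wise while taking `y` from the first branch are assumptions made for illustration; check the documentation for how ParallelPipe actually merges its outputs.

```python
import pandas as pd

# Hypothetical stand-ins: each "transformer" is a function (x, y) -> (x, y).


def series(*steps):
    """Feed the output of each step into the next one."""
    def run(x, y):
        for step in steps:
            x, y = step(x, y)
        return x, y
    return run


def parallel(*steps):
    """Apply every step to the same input and join the resulting x columns.

    Assumption: y is taken from the first branch only.
    """
    def run(x, y):
        outputs = [step(x, y) for step in steps]
        x_out = pd.concat([xo for xo, _ in outputs], axis=1)
        return x_out, outputs[0][1]
    return run


def squared(x, y):
    return x.pow(2).add_suffix("_sq"), y


def doubled(x, y):
    return (x * 2).add_suffix("_x2"), y


def add_total(x, y):
    return x.assign(total=x.sum(axis=1)), y


pipe = series(parallel(squared, doubled), add_total)
x = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
y = pd.DataFrame({"target": [0, 1]})
x_out, y_out = pipe(x, y)
print(x_out.columns.tolist())  # ['a_sq', 'b_sq', 'a_x2', 'b_x2', 'total']
```

The parallel stage widens `x` with new feature columns, and the series stage lets a later step (here `add_total`) build on all of them.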

More information is available in the documentation and in the examples.

Batteries included

pipesnake ships with several ready-to-use transformers:

Module	Name	Short Description
`pipesnake.transformers.combiner`	`Combiner`	Apply a user-defined function to a column or a set of columns
`pipesnake.transformers.combiner`	`Roller`	Apply the provided function over a rolling window of a given size
`pipesnake.transformers.converter`	`Category2Number`	Convert categorical values to numbers
`pipesnake.transformers.deeplearning`	`LSTMPacker`	Pack rows so they can be used as input to LSTM networks
`pipesnake.transformers.dropper`	`DropDuplicates`	Drop duplicated rows and/or columns
`pipesnake.transformers.dropper`	`DropNanCols`	Drop columns containing NaNs
`pipesnake.transformers.dropper`	`DropNanRows`	Drop rows containing NaNs
`pipesnake.transformers.financial`	`ToReturn`	Convert columns to financial returns: r_t = (x_t - x_{t-1}) / x_{t-1}
`pipesnake.transformers.imputer`	`ReplaceImputer`	Impute NaNs by replacing them
`pipesnake.transformers.imputer`	`KnnImputer`	Impute NaNs using K-nearest neighbors
`pipesnake.transformers.misc`	`ToNumpy`	Convert `x` and `y` to a particular NumPy type
`pipesnake.transformers.misc`	`ColumnRenamer`	Rename `x` and `y` columns
`pipesnake.transformers.misc`	`Copycat`	Copy the datasets forward
`pipesnake.transformers.scaler`	`MinMaxScaler`	Min-max scaler
`pipesnake.transformers.scaler`	`StdScaler`	Standard deviation scaler
`pipesnake.transformers.scaler`	`MadScaler`	Median absolute deviation scaler
`pipesnake.transformers.scaler`	`UnitLenghtScaler`	Scale the feature vector to have norm 1.0
`pipesnake.transformers.selector`	`ColumnSelector`	Select a given list of column names to keep
`pipesnake.transformers.stats`	`ToSymbolProbability`	Convert values in columns to their probabilities
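The descriptions of ToReturn and of the scalers correspond to standard formulas. The sketch below writes textbook versions in plain pandas to show what each one computes. The function names are hypothetical, and details such as the `ddof` of the standard deviation, centering by the median in the MAD scaler, and row-wise normalization in the unit-length scaler are assumptions that may not match pipesnake's own implementation.

```python
import numpy as np
import pandas as pd

# Textbook versions of some transformations listed above (illustrative only).


def to_return(x):
    # r_t = (x_t - x_{t-1}) / x_{t-1}; the first row has no predecessor -> NaN
    prev = x.shift(1)
    return (x - prev) / prev


def min_max_scale(x):
    # map each column to [0, 1]
    return (x - x.min()) / (x.max() - x.min())


def std_scale(x):
    # zero mean, unit standard deviation (pandas std uses ddof=1 by default)
    return (x - x.mean()) / x.std()


def mad_scale(x):
    # robust scaling: center by the median, divide by median(|x - median(x)|)
    median = x.median()
    mad = (x - median).abs().median()
    return (x - median) / mad


def unit_length_scale(x):
    # assumption: each row is a feature vector, divided by its Euclidean norm
    norms = np.sqrt((x ** 2).sum(axis=1))
    return x.div(norms, axis=0)


prices = pd.DataFrame({"p": [100.0, 110.0, 99.0]})
print(to_return(prices)["p"].tolist())  # [nan, 0.1, -0.1]
```

The MAD scaler is the outlier-resistant counterpart of the standard deviation scaler: a single extreme value barely moves the median and the MAD, while it can inflate the mean and the standard deviation considerably.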

How can you contribute to `pipesnake`?

First of all, grab a copy of the repository:

git clone https://github.com/pierluigi-failla/pipesnake.git

You can run the tests by running run_tests.py.

Since pipesnake is at an early stage, there are plenty of ways to contribute:

improvements: fix bugs and make the library faster, parallel, nicer, cleaner...;
documentation: this library uses Sphinx to generate its documentation, so feel free to enrich it;
samples: create examples showing how to use the library;
transformers: develop new general-purpose transformers to share with the community;
tests: write better tests to extend the coverage and reduce code regressions;

or anything else you think would make pipesnake better.

pipesnake
Release 0.1
