Exporters is an extensible export pipeline library that supports filter, transform and several sources and destinations.


Keywords
exporters, export, data
License
BSD-3-Clause
Install
pip install exporters==0.7.0

Documentation

Exporters project documentation

Exporters provide a flexible way to export data from multiple sources to multiple destinations, allowing filtering and transforming the data.

This Github repository is used as a central repository.

Full documentation can be found here http://exporters.readthedocs.io/en/latest/

Getting Started

Install exporters

First of all, we recommend to create a virtualenv:

virtualenv exporters
source exporters/bin/activate

Installing:

pip install exporters

Creating a configuration

Then, we can create our first configuration object and store it in a file called config.json.
This configuration will read from an s3 bucket and store it in our filesystem, exporting only the records which have United States in field country:
{
     "reader": {
         "name": "exporters.readers.s3_reader.S3Reader",
         "options": {
             "bucket": "YOUR_BUCKET",
             "aws_access_key_id": "YOUR_ACCESS_KEY",
             "aws_secret_access_key": "YOUR_SECRET_KEY",
             "prefix": "exporters-tutorial/sample-dataset"
         }
     },
     "filter": {
         "name": "exporters.filters.key_value_regex_filter.KeyValueRegexFilter",
         "options": {
             "keys": [
                 {"name": "country", "value": "United States"}
             ]
         }
     },
     "writer":{
         "name": "exporters.writers.fs_writer.FSWriter",
         "options": {
             "filebase": "/tmp/output/"
         }
     }
}

Export with script

We can use the provided script to run this export:

python bin/export.py --config config.json

Use it as a library

The export can be run using exporters as a library:

from exporters import BasicExporter

exporter = BasicExporter.from_file_configuration('config.json')
exporter.export()

Resuming an export job

Let's suppose we have a pickle file with a previously failed export job. If we want to resume it we must run the export script:

python bin/export.py --resume pickle://pickle-file.pickle