twiddlepy

twiddlepy is a Python library for end-to-end extract, transform and load (ETL) pipelines. Using a mapper file and optional functions, your data can be transformed into a better-suited format.

Features

Extract, Transform and Load pipelines
Multiple datasource options for extracting data
Multiple repository options for loading data
Support for mapping input data

Installation

Twiddlepy is available on PyPI:

pip install twiddlepy

Alternatively, to install directly from the repository, run python setup.py install, or drop the twiddlepy directory anywhere on your PYTHONPATH.
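Whichever installation route is used, it can help to confirm that the package is visible on the import path before building a pipeline. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of twiddlepy.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be located on the import path.

    Hypothetical helper for illustration; it does not import the package,
    it only asks the import system whether it could be found.
    """
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Works for a pip install as well as a directory dropped on PYTHONPATH.
    print("twiddlepy importable:", is_installed("twiddlepy"))
```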

Connectors

There are a number of data repository connectors available with Twiddlepy. Currently implemented connectors include:

Data Source (Input)

File Based
- CSV
- Excel Document
- Support for custom file loading (e.g. HTML)
Database
- MySQL
- MSSQL
- Oracle
- SQLite
MongoDB

Repository (Output)

File Based
- CSV
Apache Solr

Usage

Create a runnable Python file with the following code:

from twiddlepy.config import config
from twiddlepy.driver import TwiddleDriver

driver = TwiddleDriver(config)
driver.process_data()

Example Project Structure

.
|-- mapper
|   |-- mapper.csv
|-- local_functions.py
|-- run.py (File that runs Twiddle)
|-- twiddle.cfg
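The layout above can be created by hand, or with a short script. This is a sketch only: the scaffold function is a hypothetical helper, the mapper, local functions and configuration files are created empty, and only run.py is filled in, using the code from the Usage section.

```python
from pathlib import Path
from typing import List

# Contents of run.py, copied from the Usage section.
RUN_PY = """from twiddlepy.config import config
from twiddlepy.driver import TwiddleDriver

driver = TwiddleDriver(config)
driver.process_data()
"""


def scaffold(root: Path) -> List[Path]:
    """Create the example project layout under root (hypothetical helper).

    Existing files are left untouched so the script is safe to re-run.
    Returns the paths that make up the layout.
    """
    files = {
        root / "mapper" / "mapper.csv": "",   # to be filled with mappings
        root / "local_functions.py": "",      # optional user functions
        root / "run.py": RUN_PY,              # file that runs Twiddle
        root / "twiddle.cfg": "",             # user configuration
    }
    for path, content in files.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.write_text(content, encoding="utf-8")
    return sorted(files)
```

Calling scaffold(Path("my_project")) creates the directory tree relative to the current working directory.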

User Configuration

Importing config from twiddlepy.config loads the default configuration items for each of the processes, and also looks for a user-defined configuration file at the path the application is run from.

All of the configuration items, including all of the default options can be found here
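The general idea of defaults overridden by a user file can be illustrated with the standard library. This is not twiddlepy's implementation: the INI-style format, the section and option names, and the load_config helper are all assumptions made for the sketch, and the real option names should be taken from the configuration reference.

```python
import configparser
from pathlib import Path

# Hypothetical defaults, for illustration only.
DEFAULTS = {
    "datasource": {"type": "csv"},
    "repository": {"type": "csv"},
}


def load_config(directory: Path, filename: str = "twiddle.cfg") -> configparser.ConfigParser:
    """Load defaults, then overlay a user file from directory if it exists.

    Assumes an INI-style file; ConfigParser.read silently skips files
    that are missing, so the defaults survive when no user file is present.
    """
    parser = configparser.ConfigParser()
    parser.read_dict(DEFAULTS)
    parser.read(directory / filename, encoding="utf-8")
    return parser
```

With this layering, a user file only needs to list the options it changes; everything else keeps its default value.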

Mapper File

A user-defined mapper file specifies the input data that will be extracted from the data repository. The mapper file is a CSV whose columns specify the data mappings. The following columns must be defined in the mapper:

Column Name	Description	Options
dataset	The dataset twiddlepy will use mappings for	Any name (string)
source_field_name	A name of a source field	Any name (string)
source_field_type	The data type of the source field	One of: "str", "int", "float", "double", "timestamp"
allow_missing	Allow the column to be missing in the dataset	One of: "y", "n" (Yes or No)
min	Data Validation: minimum allowed value	Any numeric value
max	Data Validation: maximum allowed value	Any numeric value
allowed_values	Data Validation: list of allowed values	Any array of values
unit	The unit the column is represented by	Any name (string) e.g. kg
repository	The repository name the column belongs to	Any name (string)
repository_field_name	The name the column will be renamed to for data loading	Any name (string)
repository_field_type	The data type that will be applied to the column when loading	One of: "string", "integer", "float", "double", "date"
ignore	Mark column to be ignored by the mapping process (for historic datasets)	One of: "y", "n" (Yes or No)
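As an illustration of the columns above, the sketch below writes a one-row mapper CSV and checks a row against the documented options before it is handed to twiddlepy. The helper names, the example dataset and the field names are hypothetical; the allowed_values cell is left empty because its exact cell format is not specified here.

```python
import csv
import io

MAPPER_COLUMNS = [
    "dataset", "source_field_name", "source_field_type", "allow_missing",
    "min", "max", "allowed_values", "unit", "repository",
    "repository_field_name", "repository_field_type", "ignore",
]
SOURCE_TYPES = {"str", "int", "float", "double", "timestamp"}
REPOSITORY_TYPES = {"string", "integer", "float", "double", "date"}
YES_NO = {"y", "n"}

# Hypothetical example row: a weight column measured in kg.
EXAMPLE_ROW = {
    "dataset": "weights", "source_field_name": "Weight",
    "source_field_type": "float", "allow_missing": "n",
    "min": "0", "max": "500", "allowed_values": "", "unit": "kg",
    "repository": "measurements", "repository_field_name": "weight_kg",
    "repository_field_type": "float", "ignore": "n",
}


def check_mapper_row(row: dict) -> list:
    """Return a list of problems found in one mapper row (empty if none)."""
    problems = [f"missing column: {c}" for c in MAPPER_COLUMNS if c not in row]
    if problems:
        return problems
    if row["source_field_type"] not in SOURCE_TYPES:
        problems.append(f"bad source_field_type: {row['source_field_type']}")
    if row["repository_field_type"] not in REPOSITORY_TYPES:
        problems.append(f"bad repository_field_type: {row['repository_field_type']}")
    for flag in ("allow_missing", "ignore"):
        if row[flag] not in YES_NO:
            problems.append(f"{flag} must be 'y' or 'n'")
    # min and max are optional, but when both are given they must be ordered.
    if row["min"] and row["max"] and float(row["min"]) > float(row["max"]):
        problems.append("min is greater than max")
    return problems


def write_mapper(rows, stream) -> None:
    """Write mapper rows as CSV with the documented column header."""
    writer = csv.DictWriter(stream, fieldnames=MAPPER_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
```

To produce the file from the example project structure, open mapper/mapper.csv with newline="" and pass the file object to write_mapper.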

Contribute

As a company, we welcome any input to fix or improve the project. We don't have a style guide yet, but we plan to add one in the future. We're very interested to hear what you think about Twiddlepy and which improvements you would like to see, so please raise any issues in the tracker!

Contact

Got a problem/query and want to discuss it with us personally? Contact us at info@mediaintegration.co.uk. We also have a website with more information about the company here

twiddlepy
Release 0.1.3
