data-lineage

Open source data lineage tool for Redshift, Snowflake, and many other databases.


Keywords
data-lineage, postgres, snowflake, redshift, glue, data-governance, jupyter, postgresql, python
License
MIT
Install
pip install data-lineage==0.9.0

Data Lineage for Databases and Data Lakes

Data Lineage is an open source application to query and visualize data lineage in databases, data warehouses and data lakes in AWS and GCP.

Features

  • Generate lineage from SQL query history.
  • Supports ANSI SQL queries.
  • Integrates with Jupyter Notebook.
  • Visualize data lineage using Plotly.
  • Select source or target table.
  • Pan, zoom, and select within the graph.

Check out an example data lineage notebook.
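
To make the idea of generating lineage from SQL query history concrete, here is a minimal sketch in plain Python. It is not the data-lineage API: the `QUERY_HISTORY` list, the `extract_edges` helper, and the table names are all hypothetical, and the regular expressions are a deliberately naive stand-in for a real SQL parser. The sketch only handles `INSERT INTO ... SELECT` statements and records one (source, target) edge per table read.

```python
import re

# Hypothetical query history; a real one would come from the database's query log.
QUERY_HISTORY = [
    "INSERT INTO page_lookup SELECT id, title FROM page",
    "INSERT INTO page_stats SELECT l.id, COUNT(*) FROM page_lookup l "
    "JOIN pageviews v ON l.id = v.page_id GROUP BY l.id",
    "SELECT * FROM page_stats",  # read-only query: produces no lineage edge
]

# Naive patterns (an assumption for illustration, not a full SQL grammar).
TARGET_RE = re.compile(r"\binsert\s+into\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_edges(queries):
    """Return (source_table, target_table) pairs in first-seen order."""
    edges = []
    for query in queries:
        target = TARGET_RE.search(query)
        if target is None:
            continue  # nothing is written, so there is no lineage to record
        for source in SOURCE_RE.findall(query):
            edge = (source.lower(), target.group(1).lower())
            if edge not in edges:
                edges.append(edge)
    return edges


edges = extract_edges(QUERY_HISTORY)
for source, target in edges:
    print(f"{source} -> {target}")
```

A regex approach like this breaks on subqueries, CTEs, and quoted identifiers, which is why a proper SQL parser is the usual choice in practice.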

Use Cases

Data Lineage enables the following use cases:

  • Business Rules Verification
  • Change Impact Analysis
  • Data Quality Verification

Check out the post on using data lineage for cost control for an example of how data lineage can be used in production.
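
As an illustration of the Change Impact Analysis use case, the sketch below walks a lineage graph to find every table downstream of a changed table. It is a hedged example in plain Python rather than the data-lineage API: the `EDGES` list, the `downstream_tables` helper, and the table names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges as (source_table, target_table) pairs.
EDGES = [
    ("page", "page_lookup"),
    ("page_lookup", "page_stats"),
    ("pageviews", "page_stats"),
]


def downstream_tables(edges, changed_table):
    """Return all tables that directly or indirectly read from changed_table."""
    children = defaultdict(list)
    for source, target in edges:
        children[source].append(target)

    seen = set()
    queue = deque([changed_table])
    while queue:  # breadth-first traversal of the lineage graph
        table = queue.popleft()
        for child in children.get(table, []):
            if child not in seen and child != changed_table:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


print(downstream_tables(EDGES, "page"))  # ['page_lookup', 'page_stats']
```

Reversing each edge before the traversal gives the upstream view instead, which is the shape of question behind Data Quality Verification: which sources feed a given table.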

Quick Start

# Install packages
pip install data-lineage
pip install jupyter

jupyter notebook

# Check out the example notebook: http://tokern.io/docs/data-lineage/example/

Supported Technologies

  • Postgres

Coming Soon

  • MySQL
  • AWS Redshift
  • SparkSQL
  • Presto

Developer Setup

# Install dependencies
pipenv install --dev

# Set up pre-commit and pre-push hooks
pipenv run pre-commit install -t pre-commit
pipenv run pre-commit install -t pre-push