jennytest

Data quality and profiling tool powered by Apache Spark.


Keywords
apache-spark, pyspark, data, quality, profiling
License
MIT
Install
pip install jennytest==0.2.9

Documentation

Jenny

Welcome to the Jenny project! Data quality checks and data profiling made easy.

Python Example:

from jenny import my_lib

def foo(arg):
    return my_lib.awesome_function(arg)  # change me

This library supports Python version 3.7+
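As a minimal sketch of how the 3.7+ requirement could be checked before importing the library, the snippet below compares the running interpreter against that minimum. The names `MIN_VERSION` and `is_supported` are hypothetical and are not part of jenny.

```python
import sys

# Minimum version taken from the support statement above.
MIN_VERSION = (3, 7)


def is_supported(version_info=sys.version_info):
    """Return True if the given interpreter version is 3.7 or newer."""
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version_info[:2]) >= MIN_VERSION


if not is_supported():
    raise SystemExit("Python 3.7+ is required")
```

Calling `is_supported((3, 6, 9))` returns `False`, while `is_supported((3, 7, 2))` returns `True`.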

Installing

pip install jenny

Or after listing jenny in your requirements.txt file:

pip install -r requirements.txt

This will expose my_lib under the jenny module:

from jenny import my_lib

def foo():
    bar = my_lib.cool_method()

Development Environment

At the bare minimum you'll need the following for your development environment:

  1. Python 3.7.2

It is strongly recommended to also install pyenv or virtualenv and use one of them to work on the project locally.

Getting started

1. Clone the project:

    git clone git@github.com:rafaelleinio/jenny.git
    cd jenny

2. Set up the Python environment for the project:

For example using virtualenv, in the root of the repository run the following:

python3.7 -m virtualenv venv
source venv/bin/activate

If you need to configure the development environment in your IDE, note that the virtualenv Python interpreter will be at: /path/to/jenny/venv/bin/python
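To confirm that your shell or IDE is really using the virtualenv interpreter, a quick check along these lines may help. It is a sketch only; `in_virtualenv` is a hypothetical helper, not something provided by jenny.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases set sys.base_prefix to the base installation.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix


# With the environment activated, the path should point inside venv/bin.
print(sys.executable)
print(in_virtualenv())
```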

Errors

If pyenv install fails with an error about missing OpenSSL, you can try to fix it by running:

sudo apt install -y libssl1.0-dev

3. Install dependencies

make requirements

Errors

If you receive an error like this one:

 "import setuptools, tokenize;__file__='/tmp/pip-build-98gth33d/googleapis-common-protos/setup.py';
 .... 
 failed with error code 1 in /tmp/pip-build-98gth33d/googleapis-common-protos/

You can try to fix it by running:

python -m pip install --upgrade pip setuptools wheel

Development

Tests

Just run make tests to check if your code is fine.

Unit tests live under the test module, and integration tests under the integration_test module.

Run only unit tests: make unit-tests

Run only integration tests: make integration-tests

pytest is used to write all of this project's tests.
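For readers new to pytest, a test is a plain function whose name starts with `test_` and that uses bare `assert` statements. The sketch below is illustrative only: `null_ratio` is a hypothetical data quality helper invented for this example and is not part of jenny.

```python
def null_ratio(values):
    """Hypothetical helper: fraction of None entries in a list."""
    if not values:
        # Avoid dividing by zero on empty input.
        return 0.0
    return sum(v is None for v in values) / len(values)


def test_null_ratio_counts_missing_values():
    # Two of the four entries are None.
    assert null_ratio([1, None, 3, None]) == 0.5


def test_null_ratio_of_empty_input_is_zero():
    assert null_ratio([]) == 0.0
```

pytest collects functions like these automatically from files named `test_*.py`, so no registration code is needed.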

Code Style, PEP8 & Formatting

Just run make black before you commit to format all code.

Check if everything is fine with make flake8.

This project follows the Black code style, which is PEP 8 compliant and unifies formatting across the project's codebase.

Additionally, Flake8 is used to check for other issues, such as unused imports and excessive code complexity.

You can run both the Flake8 and Black checks with the following command from the project root:

make checks

Release

TBD

Contributing

Any contributions are welcome! Feel free to open Pull Requests.