diffit

Diff-it: Spark Dataframe Differ


Keywords
python3, spark
License
MIT
Install
pip install diffit==0.1.5

Documentation

Diff-it: Data Differ

Overview

diffit will report differences between two data sets with similar schema.

Refer to Diffit's documentation for detailed instructions.
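To make the idea of a row-level diff concrete, here is a minimal pure-Python sketch. It is not diffit's API and does not use Spark; the function name diff_rows and the sample rows are hypothetical, chosen only to illustrate what "differences between two data sets with similar schema" can mean.

```python
from collections import Counter


def diff_rows(left, right):
    """Return rows that appear in only one of the two data sets.

    Rows are tuples sharing the same column order. Duplicates are
    counted, so a row present twice on the left and once on the right
    is reported once as left-only.
    """
    left_counts = Counter(left)
    right_counts = Counter(right)
    # Counter subtraction keeps only positive counts, i.e. surplus rows.
    left_only = sorted((left_counts - right_counts).elements())
    right_only = sorted((right_counts - left_counts).elements())
    return left_only, right_only


# Hypothetical sample data: (id, name) rows.
left = [(1, "alpha"), (2, "beta"), (3, "gamma")]
right = [(1, "alpha"), (2, "BETA"), (4, "delta")]

left_only, right_only = diff_rows(left, right)
print(left_only)   # rows found only in the left data set
print(right_only)  # rows found only in the right data set
```

A changed row (id 2 here) shows up on both sides, once with each value, while added or dropped rows show up on one side only.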

Prerequisites

Getting Started

Makester is used as the Integrated Developer Platform.

(macOS Users only) Upgrading GNU Make

Follow these notes to get GNU make.

Creating the Local Environment

Get the code and change into the top level git project directory:

git clone git@github.com:loum/diffit.git && cd diffit

NOTE: Run all commands from the top-level directory of the git repository.

For first-time setup, get the Makester project:

git submodule update --init

Initialise the environment:

make init-dev

Local Environment Maintenance

Keep the Makester project up to date with:

git submodule update --remote --merge

Help

There should be a make target to get most things done. Check the help for more information:

make help

Running the Test Harness

We use pytest. To run the tests:

make tests

FAQs

Q. Why do I get WARNING: An illegal reflective access operation has occurred?

This appears to be related to the JVM version in use. Running under Java 8 suppresses the warning. To check which Java versions are available on your Mac, run /usr/libexec/java_home -V. Then select one with:

export JAVA_HOME=$(/usr/libexec/java_home -v <java_version>)
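After switching JAVA_HOME, it can help to confirm which Java major version is active. The sketch below parses the text that java -version prints; the helper name java_major_version is hypothetical and is not part of diffit or Makester. It assumes the usual version "..." format, where Java 8 and earlier report themselves as 1.x.

```python
import re


def java_major_version(version_output):
    """Extract the Java major version from `java -version` output.

    Handles the legacy scheme ("1.8.0_292" -> 8) and the modern
    scheme ("11.0.2" -> 11). Returns None when no version is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy releases use "1.<major>", so the second field is the real major.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


# Sample strings in the format printed by `java -version`.
print(java_major_version('openjdk version "1.8.0_292"'))
print(java_major_version('openjdk version "11.0.2" 2019-01-15'))
```

To use it against the live JVM, pass in the captured output of java -version, which is written to standard error rather than standard output.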
