bigninja

pyspark helpers


License
Apache-2.0
Install
pip install bigninja==0.0.3

PySpark helpers to maximise data engineer productivity. The library follows a pain-driven development approach.

Setup

After running pip install bigninja, start using it by importing everything from the package:

from bigninja import *

BigNinja works by adding extension methods to Spark's DataFrame class. All the methods start with the bn_ prefix to avoid conflicts with built-in methods.
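
As a rough illustration of how extension methods can be attached to an existing class in Python, the sketch below uses a stand-in DataFrame class rather than pyspark.sql.DataFrame, and a hypothetical helper named bn_column_count that is not part of bigninja. It only shows the general mechanism, not how the library itself is implemented.

```python
class DataFrame:
    """Stand-in for pyspark.sql.DataFrame, used only for this illustration."""

    def __init__(self, columns):
        self.columns = list(columns)


def bn_column_count(self):
    """Hypothetical helper (not a bigninja method): number of columns."""
    return len(self.columns)


# Assigning the function to the class makes it callable as a method
# on every instance of that class.
DataFrame.bn_column_count = bn_column_count

df = DataFrame(["id", "city", "country"])
print(df.bn_column_count())  # 3
```

Because the method is added to the class itself, a shared prefix such as bn_ keeps the added names from shadowing methods the class already has.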

DataFrame

.bn_select(*pattern: str), .bn_drop(*pattern: str)

Select or drop columns using wildcard patterns, e.g. df.bn_select("co*") returns the columns starting with co. For instance:

  • bn_select("ci*") will select columns starting with ci, such as city.
  • bn_select("id*", "ci*") will select columns starting with either id or ci, and so on.
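
The wildcard matching described above can be approximated in plain Python with the standard fnmatch module. This is a minimal sketch, not bigninja's implementation: match_columns is a hypothetical helper, and case-sensitive matching is an assumption.

```python
from fnmatch import fnmatchcase


def match_columns(columns, *patterns):
    """Return the columns matching any pattern, in their original order."""
    return [c for c in columns if any(fnmatchcase(c, p) for p in patterns)]


columns = ["id", "id_source", "city", "country", "name"]
print(match_columns(columns, "ci*"))         # ['city']
print(match_columns(columns, "id*", "ci*"))  # ['id', 'id_source', 'city']
```

With a real DataFrame, the resulting names could be passed to df.select(*names) or df.drop(*names).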

.bn_display()

Works like .show(), but truncate is set to False and arrays and structs are converted to JSON so that nested values are readable.
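
To show the idea of turning nested values into JSON before printing, here is a small sketch on plain Python dictionaries. render_row is a hypothetical helper and the sample row is made up; in PySpark itself a comparable effect can likely be obtained with pyspark.sql.functions.to_json on the nested columns, though how bigninja does it is not stated here.

```python
import json


def render_row(row):
    """Turn list and dict values into JSON strings; leave scalars untouched."""
    return {
        key: json.dumps(value) if isinstance(value, (list, dict)) else value
        for key, value in row.items()
    }


# Illustrative row with one array-like and one struct-like value.
row = {"id": 1, "tags": ["a", "b"], "address": {"city": "London", "zip": "N1"}}
rendered = render_row(row)
print(rendered["tags"])     # ["a", "b"]
print(rendered["address"])  # {"city": "London", "zip": "N1"}
```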

.bn_union(df: DataFrame)

Unions two DataFrames even if the number of columns, their names and their types don't match, by combining the columns from both datasets and filling missing values with null.
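
The column alignment behind such a union can be sketched on plain lists of dictionaries. union_rows is a hypothetical helper written for illustration only; it assumes the result carries every column seen in either input and does not attempt to reconcile differing types.

```python
def union_rows(left, right):
    """Union two lists of dict rows over the combined set of columns."""
    rows = left + right
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    # dict.get returns None for columns that a row does not have.
    return [{name: row.get(name) for name in columns} for row in rows]


left = [{"id": 1, "city": "London"}]
right = [{"id": 2, "country": "UK"}]
for row in union_rows(left, right):
    print(row)
# {'id': 1, 'city': 'London', 'country': None}
# {'id': 2, 'city': None, 'country': 'UK'}
```

For the missing-column case alone, Spark 3.1 and later also offer df.unionByName(other, allowMissingColumns=True).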

Etc