pyspark-util

PySpark utility functions


Install
pip install pyspark-util==0.1.2

Documentation

A set of PySpark utility functions.

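from pyspark.sql import SparkSession
import pyspark_util as psu

# Create (or reuse) a Spark session for the example.
spark = SparkSession.builder.getOrCreate()

data = [(1, 2, 3)]
columns = ['a', 'b', 'c']
df = spark.createDataFrame(data, columns)

# Prefix every column name with 'x'.
prefixed = psu.prefix_columns(df, 'x')
prefixed.show()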

# output:
+---+---+---+
|x_a|x_b|x_c|
+---+---+---+
|  1|  2|  3|
+---+---+---+
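
For readers who want to see roughly what such a helper does, here is a minimal sketch of the same renaming written against the plain PySpark DataFrame API. It is not taken from the pyspark-util source: the name prefix_columns_sketch and the sep parameter are hypothetical, and the '_' separator is only inferred from the output shown above.

```python
def prefix_columns_sketch(df, prefix, sep="_"):
    """Return a new DataFrame whose column names all start with prefix + sep.

    Hypothetical stand-in for illustration; not pyspark-util's implementation.
    """
    # df.columns lists the column names in their original order.
    new_names = [f"{prefix}{sep}{name}" for name in df.columns]
    # DataFrame.toDF(*names) returns a new DataFrame with the columns
    # renamed positionally; the input DataFrame is left unchanged.
    return df.toDF(*new_names)
```

With the df from the example above, prefix_columns_sketch(df, 'x').columns would be ['x_a', 'x_b', 'x_c']. Renaming through toDF keeps the data untouched and avoids chaining one withColumnRenamed call per column.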

Development

Setup

docker-compose build
docker-compose up -d

Lint

docker exec psu-cnt ./tools/lint.sh

Test

docker exec psu-cnt ./tools/test.sh