pysparta

Library to help with ETL using PySpark


Keywords
spark, etl, data, sparta
License
GPL-3.0
Install
pip install pysparta==0.1.0

Sparta

Library to help with ETL using PySpark.

Sparta is a simple library that helps you build ETL pipelines with PySpark.

Important Sources

Installation

Install the latest version with pip install pysparta

Documentation

Sparta

Modules

Extract

This module provides functions for extracting and reading data.

Example

from sparta.extract import read_with_schema

schema = 'epidemiological_week LONG, date DATE, order_for_place INT, state STRING, city STRING, city_ibge_code LONG, place_type STRING, last_available_confirmed INT'
path = '/content/sample_data/covid19-e0534be4ad17411e81305aba2d9194d9.csv'
df = read_with_schema(path, schema, {'header': 'true'}, 'csv')
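
For reference, a minimal sketch of the equivalent plain PySpark read, assuming read_with_schema simply forwards the schema, options, and format to spark.read (an assumption, not the documented internals):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Apply the DDL schema, pass the reader options, and set the source format
df = (spark.read
      .format('csv')
      .schema(schema)
      .options(header='true')
      .load(path))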

Transformation

This module provides data transformation functions.

Example

from sparta.transformation import drop_duplicates

cols = ['longitude','latitude']
df = drop_duplicates(df, 'population', cols)
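
The call above deduplicates on the given columns; assuming the 'population' argument is an ordering column used to decide which duplicate row to keep (an assumption about the semantics), the same idea in plain PySpark looks roughly like this:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# For each (longitude, latitude) pair, keep the row with the largest 'population'
w = Window.partitionBy('longitude', 'latitude').orderBy(F.col('population').desc())
df = (df.withColumn('_rn', F.row_number().over(w))
        .filter(F.col('_rn') == 1)
        .drop('_rn'))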

Load

This module provides functions for loading and writing data.

Example

from sparta.load import create_hive_table

create_hive_table(df, "table_name", 5, "col1", "col2", "col3")
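
A rough plain PySpark equivalent, assuming the integer argument is a repartition count and the trailing arguments are Hive partition columns (both assumptions about the helper's signature, not its documented contract):

# Repartition and save the DataFrame as a partitioned Hive table
# (the write mode here is illustrative)
(df.repartition(5)
   .write
   .mode('overwrite')
   .partitionBy('col1', 'col2', 'col3')
   .saveAsTable('table_name'))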

Others

This module contains assorted helper functions for ETL work.

Example

from sparta.secret import get_secret_aws

get_secret_aws('Nome_Secret', 'sa-east-1')
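
Retrieving a secret from AWS Secrets Manager with boto3 typically looks like the sketch below; this is only an illustration of what such a helper is assumed to do, not the library's actual implementation:

import json
import boto3

# Read the secret value from AWS Secrets Manager in the given region
client = boto3.client('secretsmanager', region_name='sa-east-1')
response = client.get_secret_value(SecretId='Nome_Secret')
secret = json.loads(response['SecretString'])  # assuming the secret is stored as JSON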

Supported PySpark / Python versions

Sparta currently supports PySpark 3.0+ and Python 3.7+.