Pyplatform is a data analytics platform built around Google BigQuery. This package provides wrapper functions for interacting with cloud services and for creating data pipelines that use Google Cloud, Microsoft Azure, O365, and Tableau Server as sources and destinations.


Keywords
google, bigquery, cloud, functions, storage, python, SQL
License
BSD-3-Clause
Install
pip install pyplatform==2020.12.1

Documentation

Pyplatform is a data analytics platform architecture built around Google BigQuery in a hybrid cloud environment.

The platform:

  • provides a fast, scalable, and reliable SQL database solution
  • abstracts away the infrastructure by building data pipelines with serverless compute solutions in Python runtime environments
  • simplifies the development environment by using JupyterLab as the main tool

Installation

pip install pyplatform

Setting up development environment

git clone https://github.com/mhadi813/pyplatform
cd pyplatform
conda env create -f pyplatform_dev.yml

Environment variables

import os

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/default_service_account.json'
os.environ['DATASET'] = 'default_bigquery_dataset_name'
os.environ['STORAGE_BUCKET'] = 'default_storage_bucket_id'
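
Before running a pipeline it can help to confirm that these variables are actually set. The following is a minimal sketch, assuming all three variables above are needed; `REQUIRED_VARS` and `missing_env_vars` are hypothetical helper names and are not part of pyplatform.

```python
import os

# Variable names taken from the snippet above; treating all three as
# required is an assumption made for this sketch.
REQUIRED_VARS = ("GOOGLE_APPLICATION_CREDENTIALS", "DATASET", "STORAGE_BUCKET")


def missing_env_vars(environ=None):
    """Return the names in REQUIRED_VARS that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in a notebook without touching the real environment.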

Usage

Common data pipeline architectures:

- HTTP sources

- On-prem servers

- BigQuery integration with Azure Logic Apps

- Event-driven ETL process

- Streaming pipelines
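
As an illustration of the event-driven ETL pattern listed above, the sketch below shows the first step such a pipeline might take: turning a Cloud Storage "object finalized" event into a description of a BigQuery load. It is only a sketch under stated assumptions. The function `plan_load_from_event` and the table naming rule are hypothetical, the event is assumed to carry `bucket` and `name` keys, and the code calls neither pyplatform nor BigQuery.

```python
import os
import re
from pathlib import PurePosixPath


def plan_load_from_event(event, dataset=None):
    """Map a storage event to a source URI and a destination table ID.

    Hypothetical helper: the table name is derived from the file name,
    which is an assumption of this sketch, not a pyplatform convention.
    """
    # Fall back to the DATASET environment variable when no dataset is given.
    dataset = dataset or os.environ["DATASET"]
    bucket, name = event["bucket"], event["name"]
    # Use the file name without its extension as the table name and
    # replace anything other than letters, digits, or underscores.
    stem = PurePosixPath(name).stem
    table = re.sub(r"[^0-9A-Za-z_]", "_", stem)
    return {
        "source_uri": f"gs://{bucket}/{name}",
        "destination": f"{dataset}.{table}",
    }


plan = plan_load_from_event(
    {"bucket": "my-bucket", "name": "exports/sales-2020.csv"},
    dataset="analytics",
)
print(plan)
```

Keeping this step free of cloud calls means it can be tested locally; the actual load would then be handed to whichever BigQuery client or wrapper the pipeline uses.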

Exploring modules

import pyplatform as pyp
pyp.show_me()