oarphpy

A collection of Python utils with an emphasis on Data Science


Keywords
pyspark, python, spark
License
Apache-2.0
Install
pip install oarphpy==0.1.1

Documentation

                                          _________________________
                                         < OarphPy!! Oarph! Oarph! >
                                         <   OarphKit for Python!! >
                                          -------------------------
                                                        \
                                                         \
                         ____                __   ___       -~~~~-
                        / __ \___ ________  / /  / _ \__ __|O __ O|     
                       / /_/ / _ `/ __/ _ \/ _ \/ ___/ // /|_\__/_|__-  
                       \____/\_,_/_/ / .__/_//_/_/   \_,---(__/\__)---  
                                 .--/_/             /___/ /  ~--~  \    
                            ,__;`  o __`'.          _,..-/  | \/ |  \   
                            '  `'---'  `'.'.      .'.'` |   | /\ |   |
                                          .'-...-`.'  _/ /\__    __/\ \_
                                            -...-`  ~~~~~    ~~~~    ~~~~~


OarphPy is a collection of Python utilities for Data Science with PySpark and Tensorflow. It is related (but orthogonal) to OarphKit.

Quickstart

Install from PyPI: pip install oarphpy. We test OarphPy in a variety of environments (see below), so it should play well with your Jupyter/Colab notebook or project environment. To include all extras, use pip install oarphpy[all].

Or use the dockerized environment hosted on DockerHub:

  $ ./oarphcli --shell
  -- or --
  $ docker run -it --net=host oarphpy/full bash
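
Whichever route you take, one way to confirm that the package is visible to your interpreter is to ask Python's packaging metadata for it. The snippet below is a minimal sketch, not part of OarphPy: the helper name installed_version is hypothetical, and it assumes Python 3.8+ (for importlib.metadata), so it will not run in the Python 2.7 environment.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it is not an OarphPy function.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "oarphpy" is the distribution name used with pip above.
version = installed_version("oarphpy")
if version is None:
    print("oarphpy is not installed in this environment")
else:
    print("oarphpy version:", version)
```

Because the lookup reads package metadata only, it does not import oarphpy or any of its optional dependencies.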

See also API documentation.

Demos

Dockerized Development Environments

OarphPy is built and tested in a variety of environments to ensure the library works with and without optional dependencies. These environments are shared on DockerHub and defined in the docker subdirectory of this repo:

  • oarphpy/full -- Includes Tensorflow, Jupyter, a binary install of Spark, and other tools like Bokeh. Use this environment for ad hoc data science or as a starter for other projects.

  • oarphpy/base-py2 -- Tests oarphpy in a vanilla Python 2.7 environment to ensure clean interop with other projects.

  • oarphpy/base-py3 -- Tests oarphpy in a vanilla Python 3 environment to ensure clean interop with other projects.

  • oarphpy/spark -- Tests oarphpy with a vanilla install of PySpark to ensure basic compatibility.

  • oarphpy/tensorflow -- Tests oarphpy with Tensorflow 1.x to ensure basic compatibility (e.g. of oarphpy.util.tfutil).
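
Since these environments differ mainly in which optional dependencies are present, it can help to check what your own environment provides before relying on modules such as oarphpy.util.tfutil. The following is a rough sketch under stated assumptions: the helper available_modules is hypothetical (not an OarphPy API), and the import names pyspark and tensorflow are assumed to be the relevant optional packages.

```python
import importlib.util

# Assumed import names of the optional dependencies mentioned above.
OPTIONAL_MODULES = ("pyspark", "tensorflow")


def available_modules(names):
    """Map each top-level module name to True if it can be found, else False.

    Hypothetical helper for illustration. find_spec only locates the module;
    it does not import it, so heavy packages are not loaded by this check.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in available_modules(OPTIONAL_MODULES).items():
    print(name, "available" if found else "missing")
```

In a bare environment comparable to oarphpy/base-py3, both entries would be expected to report as missing, while oarphpy/full should provide both.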

Development

See ./oarphcli --help for the development and release workflow.