prep-buddy 0.5.1 on PyPI

Prep Buddy

Data Preparation Library for Spark

A Scala / Java / Python library for cleaning, transforming and executing other preparation tasks for large datasets on Apache Spark.

It is currently maintained by a team of developers from ThoughtWorks.

Post questions and comments to the Google group, or email them directly to data-commons-toolchain@googlegroups.com

Docs are available at http://data-commons.github.io/prep-buddy Or check out the Scaladocs.

Our aim is to provide a set of algorithms for cleaning and transforming very large data sets,
inspired by predecessors such as Open Refine, Pandas and Scikit-learn packages.

Important links

Official source code repo: https://github.com/data-commons/prep-buddy
Scala docs (development version): http://data-commons.github.io/prep-buddy/scaladocs
Download releases: Latest Release
Issue tracker: Github
Mailing list: data-commons-toolchain@googlegroups.com

Usage!

To use this library, add a maven dependency to datacommons in your project:

<dependency>
    <groupId>com.thoughtworks.datacommons</groupId>
    <artifactId>prep-buddy</artifactId>
    <version>0.5.1</version>
</dependency>

For other build tools check on Maven Repositry

##Python

If you don't have pip. Intsall pip.

pip install prep-buddy

For using pyspark on command-line Download the Jar.

pyspark --jars [PATH-TO-JAR]

spark-submit --driver-class-path [PATH-TO-JAR] [Your python file.]

This library is currently built for Spark 1.6.x, but is also compatible with 1.4.x.

Dependencies

The library depends on a few other libraries.

Apache Commons Math for general math and statistics functionality.
Apache Spark for all the distributed computation capabilities.
Open CSV for parsing the files.

Download

Stable 0.5.1(Beta).

Documentation Wiki

http://data-commons.github.io/prep-buddy

Contributing

Create a pull request.

prep-buddy
Release 0.5.1

Release 0.5.1

0.5.11

0.5.1

0.5.0.0

Documentation

Prep Buddy

Data Preparation Library for Spark

Important links

Usage!

Dependencies

Download

Documentation Wiki

Contributing

Stats

Development practices

Releases

Contributors

prep-buddy Release 0.5.1

Release 0.5.1 Toggle Dropdown 0.5.11 0.5.1 0.5.0.0

Documentation

Prep Buddy

Data Preparation Library for Spark

Important links

Usage!

Dependencies

Download

Documentation Wiki

Contributing

Stats

Development practices

Releases

Contributors

prep-buddy
Release 0.5.1

Release 0.5.1

0.5.11

0.5.1

0.5.0.0