pandas-multiprocessing

Pandas Multiprocessing Support


License
BSD-3-Clause
Install
pip install pandas-multiprocessing==0.2.1

Documentation

pandas_multiprocessing

Python package that supports multiprocessing apply command on pandas dataframe. Very helpful for large massive dataframes.

Installation:

To install from Pypi:

 pip install pandas_multiprocessing

To install from source:

 git clone https://github.com/amro-pydev/pandas_multiprocessing.git
 cd pandas_multiprocessing
 python setup.py install

Multiprocessing groupby/apply:

Original syntax for apply or groupby/apply:

data_frame.groupby(column_list).apply(apply_func, *args, **kwargs)

You could multiprocess this apply command by using our package:

import pandas_multiprocessing as pdmp
pdmp.mp_groupby(data_frame, column_list, apply_func, *args, **kwargs)

The arguments to mp_groupby() are the same as in the Pandas groupby/apply except for the additional mp_arg argument, which contains multiprocessing information such as the number of CPUs to use and load balancing information.