helper-funcs
Release 0.1.35

This module provides a handful of functions that simplify typical data processing operations and data verification procedures.

Keywords: Helper, Functions, Data, Science
License: MIT
Install: pip install helper-funcs==0.1.35

Documentation

Functionality Guide

Dependencies

numpy 1.17.1
pandas 0.25.1

Installation Guide

pip install helper-funcs

Usage

Import class "HF" from module "helper_funcs":

from helper_funcs import HF

And then call any of the methods described below.

Methods

df_preview(df, n_samples)

Description

Creates a nice summary table of your DataFrame.

Parameters
- df: pandas.DataFrame
  
  The DataFrame you want to create a preview for.
- n_samples: int, optional (default = 2)
  
  Number of unique values from each column to be displayed.
Returns
- pandas.DataFrame containing the summary information about the passed DataFrame.
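
The exact columns of the summary table are not documented here, so the following is only a rough sketch of what such a preview could look like in plain pandas. The function name `df_preview_sketch` and the chosen summary columns (`dtype`, `n_missing`, `n_unique`, `samples`) are assumptions made for illustration, not the library's actual output.

```python
import pandas as pd


def df_preview_sketch(df, n_samples=2):
    """Hypothetical stand-in for df_preview: one summary row per column."""
    rows = []
    for col in df.columns:
        rows.append({
            "column": col,
            "dtype": str(df[col].dtype),
            "n_missing": int(df[col].isna().sum()),
            "n_unique": int(df[col].nunique()),
            # First n_samples distinct non-missing values, in order of appearance
            "samples": list(df[col].dropna().unique()[:n_samples]),
        })
    return pd.DataFrame(rows)


df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", None, "x"]})
preview = df_preview_sketch(df, n_samples=2)
```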

rename_col(df, old_name, new_name)

Description

Renames the specified column.

Parameters
- df: pandas.DataFrame
  
  The DataFrame containing the column you want to rename.
- old_name: str
  
  Name of existing df column to be renamed.
- new_name: str
  
  Name which will replace the old_name column name.
Returns
- pandas.DataFrame with the renamed column.
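
For comparison, the same result can be obtained with plain pandas. This is a minimal sketch, assuming the method returns a new DataFrame rather than modifying `df` in place (the source does not say which); `rename_col_sketch` is a hypothetical name.

```python
import pandas as pd


def rename_col_sketch(df, old_name, new_name):
    """Hypothetical stand-in for rename_col built on DataFrame.rename."""
    # rename returns a new DataFrame and leaves the original untouched
    return df.rename(columns={old_name: new_name})


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
renamed = rename_col_sketch(df, "a", "x")
```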

columns_mismatch(col_1, col_2)

Description

Extracts values that are present in col_1, but not in col_2.

Parameters
- col_1: pandas.Series
  
  The Series you want to subtract values from.
- col_2: pandas.Series
  
  The Series which is subtracted from col_1.
Note: The word "subtract" is used here in the set-difference sense, not the arithmetic sense.

Returns
- Set with values which col_1 contains and col_2 does not contain.
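
The set difference described above can be illustrated with built-in Python sets. This is a sketch of the documented behavior, not the library's code; `columns_mismatch_sketch` is a hypothetical name.

```python
import pandas as pd


def columns_mismatch_sketch(col_1, col_2):
    """Hypothetical stand-in: values found in col_1 but not in col_2."""
    return set(col_1) - set(col_2)


left = pd.Series(["a", "b", "c", "c"])
right = pd.Series(["b", "d"])
missing = columns_mismatch_sketch(left, right)
```

Because the result is a set, repeated values in `col_1` (such as the second "c") appear only once.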

df_difference(df_1, df_2)

Description

Extracts rows that are present in df_1, but not in df_2.

Note: df_1 and df_2 can have different column names, but the number of columns must match.

Parameters
- df_1: pandas.DataFrame
  
  The DataFrame you want to subtract values from.
- df_2: pandas.DataFrame
  
  The DataFrame which is subtracted from df_1.
Note: The word "subtract" is used here in the set-difference sense, not the arithmetic sense.

Returns
- pandas.DataFrame with rows which df_1 contains and df_2 does not contain.
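
One way to express a row-wise set difference in plain pandas is a left merge with an indicator column. The sketch below assumes columns are matched by position (which is how it handles differing column names) and that the result index is reset; both are assumptions, and `df_difference_sketch` is a hypothetical name rather than the library's implementation.

```python
import pandas as pd


def df_difference_sketch(df_1, df_2):
    """Hypothetical stand-in: rows of df_1 that do not occur in df_2."""
    aligned = df_2.copy()
    aligned.columns = df_1.columns  # match columns by position, not by name
    merged = df_1.merge(aligned.drop_duplicates(), how="left", indicator=True)
    # "left_only" marks rows that found no partner in df_2
    only_left = merged.loc[merged["_merge"] == "left_only", list(df_1.columns)]
    return only_left.reset_index(drop=True)


orders = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
shipped = pd.DataFrame({"key": [2], "label": ["b"]})
result = df_difference_sketch(orders, shipped)
```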

verify_dates_integity(df, date_col)

Description

Checks whether any dates are missing between the earliest and latest dates in df[date_col].

Parameters
- df: pandas.DataFrame
  
  The DataFrame whose date_col values will be checked for missing dates
- date_col: str
  
  Name of df column that will be verified for integrity
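
The return value of this method is not documented here. As an illustration of the check itself, the sketch below lists the calendar days missing from a column, assuming daily granularity; the function name `missing_dates_sketch` and the choice to return a `DatetimeIndex` are assumptions.

```python
import pandas as pd


def missing_dates_sketch(df, date_col):
    """Hypothetical helper: days between min and max date absent from the column."""
    dates = pd.to_datetime(df[date_col])
    # Every calendar day between the earliest and latest observed date
    full_range = pd.date_range(dates.min(), dates.max(), freq="D")
    return full_range.difference(pd.DatetimeIndex(dates))


df = pd.DataFrame({"day": ["2019-10-19", "2019-10-20", "2019-10-22"]})
missing = missing_dates_sketch(df, "day")
```

An empty result means the date range has no gaps.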

duplicate(df, how, n_times)

Description

Extends the specified DataFrame by repeating its rows.

Parameters
- df: pandas.DataFrame
  
  The DataFrame whose rows you want to repeat
- how: str
  
  Strategy for repeating. Should be either 'whole' (then [1,2] -> [1,2,1,2]) or 'element_wise' (then [1,2] -> [1,1,2,2])
- n_times: int
  
  Number of repetitions of each row
Returns
- Extended pandas.DataFrame with repeated rows
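
The two strategies can be reproduced in plain pandas as follows. This sketch assumes `n_times` is the total number of copies of each row (consistent with the [1,2] -> [1,2,1,2] example) and that the result gets a fresh index; `duplicate_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def duplicate_sketch(df, how, n_times):
    """Hypothetical stand-in for duplicate, covering both strategies."""
    if how == "whole":
        # Stack n_times copies of the whole frame: [1,2] -> [1,2,1,2]
        out = pd.concat([df] * n_times)
    elif how == "element_wise":
        # Repeat each row position in place: [1,2] -> [1,1,2,2]
        out = df.iloc[np.repeat(np.arange(len(df)), n_times)]
    else:
        raise ValueError("how must be 'whole' or 'element_wise'")
    return out.reset_index(drop=True)


df = pd.DataFrame({"x": [1, 2]})
whole = duplicate_sketch(df, "whole", 2)
element_wise = duplicate_sketch(df, "element_wise", 2)
```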

groupby_to_list(df, by_cols, col_to_list)

Description

Collects the values of the col_to_list column that share the same values in the by_cols column(s) into a list.

Parameters
- df: pandas.DataFrame
  
  The DataFrame which you want to use
- by_cols: list of str
  
  Column names that will be used as keys in df
- col_to_list: str
  
  Name of the column whose values will be collected into lists
Returns
- pandas.DataFrame with columns [by_cols, col_to_list] so that all the values in col_to_list column are lists.
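
In plain pandas this corresponds to a group-by followed by a list aggregation. A minimal sketch, with `groupby_to_list_sketch` as a hypothetical name; the ordering of groups (sorted by key, the pandas default) is an assumption about the library's output.

```python
import pandas as pd


def groupby_to_list_sketch(df, by_cols, col_to_list):
    """Hypothetical stand-in: one row per key, values gathered into a list."""
    return df.groupby(by_cols)[col_to_list].agg(list).reset_index()


df = pd.DataFrame({"user": ["a", "a", "b"], "item": [1, 2, 3]})
grouped = groupby_to_list_sketch(df, ["user"], "item")
```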

chunkenize(data_to_split, num_chunks, df_indices, copy)

Description

Splits the data_to_split into list with num_chunks chunks. Can be helpful when preparing data for parallel processing.

Parameters
- data_to_split: pandas.DataFrame or list
  
  The DataFrame or list you want to split into chunks
- num_chunks: int
  
  Number of chunks your data will be split into
- df_indices: list of str, optional (default = [])
  
  Can be used when data_to_split is a pandas.DataFrame. These columns will be set as the DataFrame index before splitting, and the index will be reset afterwards.
- copy: bool, optional (default = True)
  
  Determines whether you want to perform splitting on a copy of data_to_split.
Returns
- List of num_chunks chunks that have same type as data_to_split.
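
The basic splitting step can be sketched as below. The sketch deliberately ignores the `df_indices` and `copy` parameters, and the rule that earlier chunks absorb any remainder is an assumption; `chunk_sketch` is a hypothetical name, not the library's implementation.

```python
import pandas as pd


def chunk_sketch(data, num_chunks):
    """Hypothetical helper: split a list or DataFrame into contiguous chunks."""
    size, extra = divmod(len(data), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `extra` chunks get one additional element each
        stop = start + size + (1 if i < extra else 0)
        if isinstance(data, pd.DataFrame):
            chunks.append(data.iloc[start:stop])
        else:
            chunks.append(data[start:stop])
        start = stop
    return chunks


list_chunks = chunk_sketch(list(range(7)), 3)
df_chunks = chunk_sketch(pd.DataFrame({"x": range(5)}), 2)
```

Each chunk could then be handed to a separate worker, for example through `multiprocessing.Pool.map`.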

filter_df(df, col_name, l_bound, r_bound, inclusive)

Description

Filters df so that it contains only the rows whose df[col_name] values fall in the range between l_bound and r_bound.

Parameters
- df: pandas.DataFrame
  
  The DataFrame you want to filter
- col_name: str
  
  Name of the df column whose values are used for filtering
- l_bound: same type as values of df[col_name]
  
  Left bound of the filtered values range. Can be omitted if r_bound is specified
- r_bound: same type as values of df[col_name]
  
  Right bound of the filtered values range. Can be omitted if l_bound is specified
- inclusive: bool, optional (default = True)
  
  Determines whether you want range to be inclusive (True) or exclusive (False)
Returns
- Filtered pandas.DataFrame
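
A range filter with optional bounds can be written with boolean masks. In this sketch an omitted bound is represented by `None`, which is an assumption about how the library treats missing bounds; `filter_df_sketch` is a hypothetical name.

```python
import pandas as pd


def filter_df_sketch(df, col_name, l_bound=None, r_bound=None, inclusive=True):
    """Hypothetical stand-in: keep rows whose value lies within the bounds."""
    values = df[col_name]
    mask = pd.Series(True, index=df.index)
    if l_bound is not None:
        mask &= (values >= l_bound) if inclusive else (values > l_bound)
    if r_bound is not None:
        mask &= (values <= r_bound) if inclusive else (values < r_bound)
    return df[mask]


df = pd.DataFrame({"v": [1, 2, 3, 4, 5]})
closed = filter_df_sketch(df, "v", 2, 4)
open_range = filter_df_sketch(df, "v", 2, 4, inclusive=False)
upper_only = filter_df_sketch(df, "v", r_bound=2)
```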

prepare_str_cols(df, make_uppercase)

Description

Strips leading and trailing spaces in the str columns of df and converts those values to either upper case or lower case.

Parameters
- df: pandas.DataFrame
  
  The DataFrame you want to prepare str columns for.
- make_uppercase: bool
  
  Determines whether you want str values to be upper-cased or lower-cased.
Returns
- pandas.DataFrame where all strings are either upper-cased or lower-cased with all leading and trailing spaces removed.
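
The same cleanup can be sketched with the pandas `.str` accessor. The sketch assumes the method works on a copy and leaves non-string columns untouched; `prepare_str_cols_sketch` is a hypothetical name.

```python
import pandas as pd
from pandas.api.types import is_string_dtype


def prepare_str_cols_sketch(df, make_uppercase=True):
    """Hypothetical stand-in: strip and re-case every string column."""
    out = df.copy()
    for col in out.columns:
        if is_string_dtype(out[col]):
            stripped = out[col].str.strip()
            out[col] = stripped.str.upper() if make_uppercase else stripped.str.lower()
    return out


df = pd.DataFrame({"name": ["  Alice ", "bob  "], "n": [1, 2]})
upper = prepare_str_cols_sketch(df, make_uppercase=True)
lower = prepare_str_cols_sketch(df, make_uppercase=False)
```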

Dependencies: 0
Dependent packages: 0
Dependent repositories: 0
Total releases: 9
Latest release: Nov 10, 2019
First release: Oct 19, 2019
Stars: 1
Forks: 0
Watchers: 1
Contributors: 1
Repository size: 12.7 KB
SourceRank: 6

Releases

0.1.35: Nov 10, 2019
0.1.34: Oct 29, 2019
0.1.4: Oct 29, 2019
0.1.33: Oct 19, 2019
0.1.32: Oct 19, 2019
0.1.3: Oct 19, 2019
0.1.2: Oct 19, 2019
0.1.1: Oct 19, 2019
0.1: Oct 19, 2019

Last synced: 2021-02-16 07:34:11 UTC