PtDa
Python package for data analytics.
The package provides:
- Weight of Evidence (WOE) calculation
- Information Value (IV) calculation
- Numeric and categorical variable check
- and more
How to get it?
Binary installers for the latest released version are available at the Python Package Index (PyPI).
# with PyPI
pip install ptda
The source code is hosted on GitHub:
https://github.com/luckyp71/ptda
Dependencies
- pandas
- NumPy
- SciPy
Example
The following example shows how to use ptda. It uses the UCI Credit Card dataset.
Load Libraries and Data
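A minimal sketch of the data-loading step using pandas only. The helper name `load_data` and the file name in the comment are hypothetical; point `path` at your own local copy of the UCI Credit Card data.

```python
import pandas as pd


def load_data(path):
    """Load a CSV file into a pandas DataFrame.

    `load_data` is a hypothetical helper, not part of ptda.
    """
    df = pd.read_csv(path)
    print(df.shape)  # quick sanity check: (rows, columns)
    return df


# df = load_data("UCI_Credit_Card.csv")  # hypothetical file name
```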
Check Target Variable Name
Please bear in mind that the target variable must be named target. Luckily, in the UCI Credit Card dataset used in this example the target variable is already named target, so no changes are needed.
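If your copy of a dataset uses a different label name, a rename along these lines should work. This is a sketch: `ensure_target` and its `source_col` parameter are hypothetical, and the 0/1 check reflects an assumption (a binary label coded as 0/1, as the WOE/IV convention sketched further down expects), not a documented ptda requirement.

```python
import pandas as pd


def ensure_target(df, source_col=None):
    """Return `df` with its label column named "target".

    `source_col` is the current label column name; it is only needed
    when the dataset does not already use "target".
    """
    if source_col is not None and source_col != "target":
        df = df.rename(columns={source_col: "target"})  # returns a new frame
    if "target" not in df.columns:
        raise KeyError('no "target" column; pass the label column as source_col')
    # Assumption: the label is binary and coded as 0/1.
    values = set(df["target"].dropna().unique())
    if not values <= {0, 1}:
        raise ValueError(f"target must be binary 0/1, got {sorted(values)}")
    return df
```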
Numeric and Categorical Variable Check
The cn_df method returns a dataframe containing numeric_var and categorical_var fields. These fields indicate whether a particular feature/variable is numeric or categorical: 1 for yes, 0 for no.
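The exact rule cn_df applies is not spelled out here, so the sketch below assumes one plausible rule: a column counts as categorical when it is non-numeric or has at most `n_bin` distinct values (default 10), and numeric otherwise. The function name `numeric_categorical_flags` and the extra `variable` column are hypothetical; only the numeric_var/categorical_var 1/0 layout comes from the description above.

```python
import pandas as pd


def numeric_categorical_flags(df, n_bin=10, target="target"):
    """Flag each feature as numeric or categorical (1 = yes, 0 = no).

    Assumed rule (not necessarily ptda's own logic): a column is
    categorical if it is non-numeric or has at most `n_bin` distinct
    values; otherwise it is numeric.
    """
    rows = []
    for col in df.columns:
        if col == target:
            continue  # the label itself is not a feature
        s = df[col]
        is_cat = (not pd.api.types.is_numeric_dtype(s)) or s.nunique() <= n_bin
        rows.append({"variable": col,
                     "numeric_var": int(not is_cat),
                     "categorical_var": int(is_cat)})
    return pd.DataFrame(rows, columns=["variable", "numeric_var", "categorical_var"])
```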
How does it work?
What if a categorical feature has many unique values, say 15?
The cn_df method has one optional argument, n_bin (default 10). If a categorical feature/variable has many unique values, pass that unique-value count as n_bin when calling cn_df.
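To see why raising n_bin matters, here is a small illustration under an assumed threshold rule (non-numeric, or at most n_bin distinct values, means categorical). `is_categorical` and the example series are hypothetical, not ptda code: a coded feature with 15 distinct values is treated as numeric under the default of 10 and as categorical once n_bin is 15.

```python
import pandas as pd


def is_categorical(series, n_bin=10):
    """Assumed rule: categorical if non-numeric or at most n_bin distinct values."""
    if not pd.api.types.is_numeric_dtype(series):
        return True
    return bool(series.nunique() <= n_bin)


# A hypothetical coded feature with 15 distinct category codes.
codes = pd.Series(list(range(15)) * 2)

print(is_categorical(codes))            # False: 15 > default n_bin of 10
print(is_categorical(codes, n_bin=15))  # True: 15 <= 15
```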
WOE and IV Calculation
woe_iv is a method that calculates WOE and IV and returns a dataframe containing both values.
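For reference, a minimal sketch of the usual WOE/IV computation for one categorical (or already binned) feature. It assumes target == 1 is the "bad" event and target == 0 is "good", and uses WOE = ln(%good / %bad) and IV = sum((%good - %bad) * WOE); sign conventions vary between sources, and this is not necessarily how woe_iv is implemented. `woe_iv_table` is a hypothetical name. A numeric feature would need binning first (for example with `pd.qcut`).

```python
import numpy as np
import pandas as pd


def woe_iv_table(df, feature, target="target"):
    """Per-category WOE plus the feature's total IV.

    Assumed conventions: target == 1 is "bad", target == 0 is "good".
    No smoothing is applied, so a category with zero goods or zero
    bads yields an infinite WOE.
    """
    grouped = df.groupby(feature)[target].agg(total="count", bad="sum")
    grouped["good"] = grouped["total"] - grouped["bad"]
    # Share of all goods / all bads that fall in each category.
    grouped["pct_good"] = grouped["good"] / grouped["good"].sum()
    grouped["pct_bad"] = grouped["bad"] / grouped["bad"].sum()
    grouped["woe"] = np.log(grouped["pct_good"] / grouped["pct_bad"])
    grouped["iv"] = (grouped["pct_good"] - grouped["pct_bad"]) * grouped["woe"]
    return grouped.reset_index(), grouped["iv"].sum()
```

As a quick check: with category A holding 3 goods and 1 bad and category B holding 1 good and 3 bads, the WOE values are ln 3 and -ln 3, and the total IV is ln 3 (about 1.10).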