datamagus

Data analysis


License
Apache-2.0
Install
pip install datamagus==0.0.4

A Python package and API for basic data analysis: data processing, plotting, and modeling, with a focus on practical use.

Installation

pip install datamagus

Model

The datamagus.model module provides several analysis models, including RFM and Pareto.

RFM Example

Example 1: In most cases, the data used to study customer value comes from order sales and consists of a user ID, a transaction time, and a transaction amount.

# Dataset: https://www.kaggle.com/datasets/regivm/retailtransactiondata?select=Retail_Data_Transactions.csv
>>> import pandas as pd
>>> from datamagus.model import RFMModel
>>> rfm = RFMModel()
>>> df = pd.read_csv('https://raw.githubusercontent.com/Alazia/datamagus/main/src/test/Retail_Data_Transactions.csv')
>>> df.head(2)
  customer_id trans_date  tran_amount
0      CS5295  11-Feb-13           35
1      CS4768  15-Mar-15           39
>>> # its names the id, time and amount columns; t appears to be the reference date for recency (R)
>>> rfm.load_data(df, its=['customer_id', 'trans_date', 'tran_amount'], t="2022-06-27")
>>> rfm.rfm
       id     R   F       M
0  CS1112  2721  15  1012.0
1  CS1113  2695  20  1490.0
2  CS1114  2692  19  1432.0
>>> rfm.unleash()
>>> rfm.rfm_score
           R   F       M  R_score  F_score  M_score     RFM
id
CS1112  2721  15  1012.0        2        1        1  一般发展客户
CS1113  2695  20  1490.0        2        2        2  重要价值客户
CS1114  2692  19  1432.0        2        2        2  重要价值客户

The RFM column holds the segment label, which the library outputs in Chinese: 一般发展客户 means "general developing customer" and 重要价值客户 means "important high-value customer".
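To make the R, F and M columns concrete, here is a minimal pandas sketch of how such a table can be derived from raw transactions. It is not taken from the datamagus source: the toy data, the variable names, and the definitions (R = days from the reference date back to the last purchase, F = number of transactions, M = total amount) are assumptions based on the usual RFM convention.

```python
import pandas as pd

# Toy transactions (made up); column names mirror the dataset used above.
tx = pd.DataFrame({
    "customer_id": ["CS1", "CS1", "CS2", "CS2", "CS2"],
    "trans_date": ["11-Feb-13", "15-Mar-15", "01-Jan-14", "20-Jun-14", "05-Dec-14"],
    "tran_amount": [35, 39, 50, 20, 30],
})
tx["trans_date"] = pd.to_datetime(tx["trans_date"], format="%d-%b-%y")

# Reference date; plays the role of the t= argument in the example above.
ref_date = pd.Timestamp("2022-06-27")

# One row per customer: last purchase date, purchase count, total spend.
per_customer = tx.groupby("customer_id").agg(
    last=("trans_date", "max"),
    F=("trans_date", "count"),
    M=("tran_amount", "sum"),
)

# Assumed definition of recency: whole days since the last purchase.
per_customer["R"] = (ref_date - per_customer["last"]).dt.days

rfm_table = per_customer[["R", "F", "M"]].rename_axis("id").reset_index()
print(rfm_table)
```

A smaller R means a more recent customer, which is why recency is usually scored in the opposite direction to F and M.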

Example 2: When the data is already in RFM format (columns id, R, F and M), it can be loaded directly.

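>>> import numpy as np
>>> import pandas as pd
>>> from datamagus.model import RFMModel
>>> # No random seed is set, so the values below will differ between runs
>>> df = pd.DataFrame({
...     'id': np.arange(1, 10001),
...     'R': np.random.randint(1, 10, 10000),
...     'F': np.random.randint(1, 100, 10000),
...     'M': np.random.randint(1000, 10000, 10000),
... })
>>> rfm = RFMModel()
>>> rfm.load_data(df)
>>> rfm.unleash()
>>> rfm.rfm_score
       R   F     M  R_score  F_score  M_score     RFM
id
1      1  93  3841        2        2        1  一般价值客户
2      6  86  1833        1        2        1  一般保持客户
3      7  31  4474        1        1        1  一般挽留客户
>>> rfm.visualize()

The segment labels translate as follows: 一般价值客户 is "general high-value customer", 一般保持客户 is "general retained customer", and 一般挽留客户 is "general win-back customer".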

RFM Visualization
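The score and segment columns in the outputs above are consistent with a common two-level RFM scheme, sketched below. This is an illustration under stated assumptions, not the library's implementation: the use of the column mean as the threshold, the English segment names, and the helper function segment are all hypothetical.

```python
import pandas as pd

# Small RFM table; the first three rows reuse values from the output above.
rfm = pd.DataFrame(
    {"R": [1, 6, 7, 2], "F": [93, 86, 31, 10], "M": [3841, 1833, 4474, 9000]},
    index=pd.Index([1, 2, 3, 4], name="id"),
)

# Assumption: score 2 = better than the column mean, score 1 = otherwise.
# Lower R (more recent) is better; higher F and M are better.
scores = pd.DataFrame({
    "R_score": (rfm["R"] < rfm["R"].mean()).astype(int) + 1,
    "F_score": (rfm["F"] > rfm["F"].mean()).astype(int) + 1,
    "M_score": (rfm["M"] > rfm["M"].mean()).astype(int) + 1,
})

# Hypothetical English names for the four (R_score, F_score) combinations.
KINDS = {
    (2, 2): "high-value",
    (2, 1): "developing",
    (1, 2): "retained",
    (1, 1): "win-back",
}

def segment(row):
    """Combine the three binary scores into one of eight segment labels."""
    tier = "important" if int(row["M_score"]) == 2 else "general"
    kind = KINDS[(int(row["R_score"]), int(row["F_score"]))]
    return f"{tier} {kind} customer"

scores["RFM"] = scores.apply(segment, axis=1)
print(rfm.join(scores))
```

With three binary scores there are eight possible segments: M decides between "important" and "general", while R and F together decide the customer type.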

Pareto

>>> import numpy as np
>>> import pandas as pd
>>> from datamagus.model import ParetoModel  # assumed to live in datamagus.model alongside RFMModel
>>> np.random.seed(1)
>>> a = np.random.randint(110000, 120000, 2)
>>> b = np.random.randint(20000, 80000, 3)
>>> c = np.random.randint(2000, 8000, 5)
>>> df = pd.DataFrame({
...     'id': np.arange(1, 11),
...     'M': np.concatenate([a, b, c]),
... })
>>> pm = ParetoModel()
>>> pm.load_data(df)
>>> pm.unleash()
>>> pm.res
   id         M  Accumulate  Percent Class
0   2  115192.0    115192.0   26.37%     A
1   1  110235.0    225427.0   51.60%     A
2   4   70057.0    295484.0   67.64%     A
3   5   63723.0    359207.0   82.23%     B
4   3   52511.0    411718.0   94.25%     C
5   7    7056.0    418774.0   95.87%     C
6   9    6225.0    424999.0   97.29%     C
7   6    4895.0    429894.0   98.41%     C
8  10    4797.0    434691.0   99.51%     C
9   8    2144.0    436835.0  100.00%     C
>>> pm.results[1]
                                  ID Count_Percent Money_Percent
Class
A                      '2', '1', '4'        30.00%        67.64%
B                                '5'        10.00%        14.59%
C      '3', '7', '9', '6', '10', '8'        60.00%        17.77%
>>> pm.visualize()

Pareto Visualization
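The Pareto table above can be reproduced with a few lines of plain pandas, which may help when reading the Accumulate, Percent and Class columns. The sketch below is not the datamagus implementation: the 70% and 90% cumulative cut-offs are an assumption that happens to match the classes shown above, and the M values are copied from that output.

```python
import pandas as pd

# M values per id, copied from the Pareto output above.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "M": [110235, 115192, 52511, 70057, 63723, 4895, 7056, 2144, 6225, 4797],
})

# Sort by amount (largest first), then accumulate.
res = df.sort_values("M", ascending=False).reset_index(drop=True)
res["Accumulate"] = res["M"].cumsum()
res["Percent"] = res["Accumulate"] / res["M"].sum() * 100

# Assumed ABC cut-offs on the cumulative share: A <= 70%, B <= 90%, C above.
res["Class"] = pd.cut(res["Percent"], bins=[0, 70, 90, 100], labels=["A", "B", "C"])

# Per-class share of items and of total amount.
summary = res.groupby("Class", observed=True).agg(
    count=("id", "size"),
    money=("M", "sum"),
)
summary["Count_Percent"] = summary["count"] / len(res) * 100
summary["Money_Percent"] = summary["money"] / res["M"].sum() * 100

print(res)
print(summary[["Count_Percent", "Money_Percent"]])
```

Here three of the ten ids account for about two thirds of the total amount, which is the kind of concentration a Pareto analysis is meant to surface.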