spmf-wrapper

Python Wrapper for SPMF


Keywords
SPMF, pattern, mining
License
GPL-3.0
Install
pip install spmf-wrapper==0.5.0

Documentation

SPMF

Python Wrapper for SPMF Java library.

Information

This module contains python wrappers for pattern mining algorithms implemented in SPMF Java library. Each algorithm is implemented as a standalone Python class with fully descriptive and tested APIs. It also provides native support for Pandas dataframes.

Why? If you're in a Python pipeline, it might be cumbersome to use Java as an intermediate step. Using spmf-wrapper you can stay in your pipeline as though Java is never used at all.

Installation

pip install spmf-wrapper

A Java Runtime Environment is required to run this wrapper. If an existing installation is not detected, JRE v21 is automatically installed using install-jdk python module at $HOME/.jre/jdk-21.0.2+13-jre. If you prefer to install Java Runtime manually, follow instructions here. Test installation by running the following command on the terminal:

> java -version
java version "1.8.0_391"
Java(TM) SE Runtime Environment (build 1.8.0_391-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.391-b13, mixed mode)

Usage

Example:

from spmf import EMMA

emma = EMMA(min_support=2, max_window=2, timestamp_present=True, transform=True)
output = emma.run_pandas(input_df)

Input:

Time points Itemset
0 1 a
1 2 a
2 3 a
3 3 b
4 6 a
5 7 a
6 7 b
7 8 c
8 9 b
9 11 d

Output:

Frequent episode Support
0 a 5
1 b 3
2 a b 2
3 a-> a 3
4 a -> b 2
5 a -> a b 2

See examples for more details.

For a detailed explanation of the algorithm and parameters, refer to the corresponding webpage in the SPMF documentation.

Implementation Checklist

Sequential Pattern Mining

Algorithm Type Implemented
PrefixSpan Frequent Sequential Pattern ✓
GSP Frequent Sequential Pattern
SPADE Frequent Sequential Pattern ✓
CM-SPADE Frequent Sequential Pattern ✓
SPAM Frequent Sequential Pattern ✓
CM-SPAM Frequent Sequential Pattern
FAST Frequent Sequential Pattern
LAPIN Frequent Sequential Pattern
ClaSP Frequent Closed Sequential Pattern ✓
CM-ClaSP Frequent Closed Sequential Pattern ✓
CloFAST Frequent Closed Sequential Pattern
CloSpan Frequent Closed Sequential Pattern
BIDE+ Frequent Closed Sequential Pattern
Post Processing SPAM or PrefixSpan Frequent Closed Sequential Pattern
MaxSP Frequent Maximal Sequential Pattern
VMSP Frequent Maximal Sequential Pattern ✓
FEAT Frequent Sequential Generator Pattern
FSGP Frequent Sequential Generator Pattern
VGEN Frequent Sequential Generator Pattern ✓
NOSEP Non-overlapping Sequential Pattern ✓
GoKrimp Compressing Sequential Pattern
TKS Top-k Frequent Sequential Pattern ✓
TSP Top-k Frequent Sequential Pattern

Episode Mining

Algorithm Type Implemented
EMMA Frequent Episode ✓
AFEM Frequent Episode ✓
MINEPI Frequent Episode
MINEPI+ Frequent Episode ✓
TKE Top-k Frequent Episodes ✓
MaxFEM Maximal Frequent Episodes ✓
POERM Episode Rules
POERM-ALL Episode Rules
POERMH Episode Rules
NONEPI Episode Rules ✓
TKE-Rules Episode Rules ✓
AFEM-Rules Episode Rules ✓
EMMA-Rules Epsiode Rules ✓
MINEPI+-Rules Episode Rules
HUE-SPAN High Utility Episodes
US-SPAN High Utility Episodes
TUP Top-K High Utility Episodes

Bibliography

Fournier-Viger, P., Lin, C.W., Gomariz, A., Gueniche, T., Soltani, A., Deng, Z., Lam, H. T. (2016).
The SPMF Open-Source Data Mining Library Version 2.
Proc. 19th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2016) Part III, Springer LNCS 9853,  pp. 36-40.