Python Wrapper for the SPMF Java Library
This module contains Python wrappers for pattern mining algorithms implemented in the SPMF Java library. Each algorithm is implemented as a standalone Python class with a fully descriptive and tested API. The module also provides native support for pandas DataFrames.
Why? If you are working in a Python pipeline, calling Java as an intermediate step can be cumbersome. With spmf-wrapper, you can stay in your pipeline as though Java were never involved.
A Java Runtime Environment (JRE) is required to run this wrapper. If no existing installation is detected, JRE v21 is installed automatically with the install-jdk Python module at $HOME/.jre/jdk-21.0.2+13-jre. If you prefer to install the Java Runtime manually, follow the instructions here. To test the installation, run the following command in a terminal (the reported version depends on your installation):
> java -version
java version "1.8.0_391"
Java(TM) SE Runtime Environment (build 1.8.0_391-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.391-b13, mixed mode)
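If you want to check from Python which Java executable would be picked up, a small standard-library sketch like the one below may help. It looks for `java` on `PATH` first and then falls back to the install-jdk location mentioned above. The helper names `find_java` and `java_version` are hypothetical, this is not the wrapper's own detection logic, and the `bin/java` layout under `$HOME/.jre/jdk-21.0.2+13-jre` is an assumption that fits Linux-style JRE archives (on macOS the binary may sit under `Contents/Home/bin`, and on Windows it is `java.exe`).

```python
import os
import shutil
import subprocess
from pathlib import Path
from typing import Optional

# Fallback location named in this README; the bin/java layout is an assumption.
FALLBACK_JRE = Path(".jre") / "jdk-21.0.2+13-jre"


def find_java(path_env: Optional[str] = None, home: Optional[Path] = None) -> Optional[str]:
    """Return a path to a java executable, or None if none is found.

    Hypothetical helper: checks PATH first, then $HOME/.jre/jdk-21.0.2+13-jre/bin/java.
    """
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    on_path = shutil.which("java", path=path_env)
    if on_path:
        return on_path
    home = home if home is not None else Path.home()
    candidate = home / FALLBACK_JRE / "bin" / "java"
    return str(candidate) if candidate.is_file() else None


def java_version(java: str) -> str:
    """Run `java -version` and return its banner (Java prints it to stderr)."""
    result = subprocess.run([java, "-version"], capture_output=True, text=True, check=True)
    return (result.stderr or result.stdout).strip()
```

Typical use: `java = find_java()`, then `print(java_version(java))` if it is not `None`. Passing `path_env` and `home` explicitly keeps the lookup easy to test without touching the real environment.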
Example:
from spmf import EMMA
emma = EMMA(min_support=2, max_window=2, timestamp_present=True, transform=True)
# input_df: a pandas DataFrame with the columns shown in the Input table below
output = emma.run_pandas(input_df)
Input:
| | Time points | Itemset |
---|---|---|
0 | 1 | a |
1 | 2 | a |
2 | 3 | a |
3 | 3 | b |
4 | 6 | a |
5 | 7 | a |
6 | 7 | b |
7 | 8 | c |
8 | 9 | b |
9 | 11 | d |
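The example assumes `input_df` already exists. Below is a minimal sketch that builds the Input table with pandas, one row per event. The column names are copied from the table; whether `run_pandas` requires exactly these names is an assumption, so check the EMMA class documentation.

```python
import pandas as pd

# One row per event: the time point at which it occurred and the item observed.
# Column names are taken from the Input table; treat them as an assumption.
input_df = pd.DataFrame(
    {
        "Time points": [1, 2, 3, 3, 6, 7, 7, 8, 9, 11],
        "Itemset": ["a", "a", "a", "b", "a", "a", "b", "c", "b", "d"],
    }
)

print(input_df)
```

Provided the column-name assumption holds, the `EMMA(...)` and `run_pandas(input_df)` calls from the example should then run unchanged.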
Output:
| | Frequent episode | Support |
---|---|---|
0 | a | 5 |
1 | b | 3 |
2 | a b | 2 |
3 | a -> a | 3 |
4 | a -> b | 2 |
5 | a -> a b | 2 |
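To show where the Support column comes from, the sketch below recounts it from the Input table in plain Python. For a single itemset such as `a` or `a b`, support is the number of time points containing it. For a serial episode `X -> Y`, it counts the time points where `X` occurs and is followed by `Y` at a later time point within the window (head frequency). The window rule `t2 - t1 < max_window` is an assumption chosen because it reproduces the rows above with `max_window=2`; it is not taken from SPMF's source. The function names are hypothetical, and the sketch only scores given candidates rather than enumerating episodes the way EMMA does.

```python
from collections import defaultdict

# Events from the Input table: (time point, item).
EVENTS = [(1, "a"), (2, "a"), (3, "a"), (3, "b"), (6, "a"),
          (7, "a"), (7, "b"), (8, "c"), (9, "b"), (11, "d")]


def group_by_time(events):
    """Map each time point to the set of items that occur there."""
    sequence = defaultdict(set)
    for t, item in events:
        sequence[t].add(item)
    return dict(sequence)


def itemset_support(sequence, itemset):
    """Number of time points whose items include every item of `itemset`."""
    return sum(1 for items in sequence.values() if set(itemset) <= items)


def serial_support(sequence, first, second, max_window):
    """Head frequency of `first -> second` under the assumed rule 0 < t2 - t1 < max_window."""
    first, second = set(first), set(second)
    count = 0
    for t1, items in sequence.items():
        if first <= items and any(
            t1 < t2 < t1 + max_window and second <= other
            for t2, other in sequence.items()
        ):
            count += 1
    return count


seq = group_by_time(EVENTS)
print(itemset_support(seq, {"a"}), itemset_support(seq, {"b"}),
      itemset_support(seq, {"a", "b"}))  # 5 3 2
print(serial_support(seq, {"a"}, {"a"}, 2), serial_support(seq, {"a"}, {"b"}, 2),
      serial_support(seq, {"a"}, {"a", "b"}, 2))  # 3 2 2
```

These counts match the Support column for `a`, `b`, `a b`, `a -> a`, `a -> b` and `a -> a b`.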
See the examples for more details.
For a detailed explanation of each algorithm and its parameters, refer to the corresponding page in the SPMF documentation.
Algorithm | Type | Implemented |
---|---|---|
PrefixSpan | Frequent Sequential Pattern | ✓ |
GSP | Frequent Sequential Pattern | |
SPADE | Frequent Sequential Pattern | ✓ |
CM-SPADE | Frequent Sequential Pattern | ✓ |
SPAM | Frequent Sequential Pattern | ✓ |
CM-SPAM | Frequent Sequential Pattern | |
FAST | Frequent Sequential Pattern | |
LAPIN | Frequent Sequential Pattern | |
ClaSP | Frequent Closed Sequential Pattern | ✓ |
CM-ClaSP | Frequent Closed Sequential Pattern | ✓ |
CloFAST | Frequent Closed Sequential Pattern | |
CloSpan | Frequent Closed Sequential Pattern | |
BIDE+ | Frequent Closed Sequential Pattern | |
Post Processing SPAM or PrefixSpan | Frequent Closed Sequential Pattern | |
MaxSP | Frequent Maximal Sequential Pattern | |
VMSP | Frequent Maximal Sequential Pattern | ✓ |
FEAT | Frequent Sequential Generator Pattern | |
FSGP | Frequent Sequential Generator Pattern | |
VGEN | Frequent Sequential Generator Pattern | ✓ |
NOSEP | Non-overlapping Sequential Pattern | ✓ |
GoKrimp | Compressing Sequential Pattern | |
TKS | Top-k Frequent Sequential Pattern | ✓ |
TSP | Top-k Frequent Sequential Pattern | |

Algorithm | Type | Implemented |
---|---|---|
EMMA | Frequent Episode | ✓ |
AFEM | Frequent Episode | ✓ |
MINEPI | Frequent Episode | |
MINEPI+ | Frequent Episode | ✓ |
TKE | Top-k Frequent Episodes | ✓ |
MaxFEM | Maximal Frequent Episodes | ✓ |
POERM | Episode Rules | |
POERM-ALL | Episode Rules | |
POERMH | Episode Rules | |
NONEPI | Episode Rules | ✓ |
TKE-Rules | Episode Rules | ✓ |
AFEM-Rules | Episode Rules | ✓ |
EMMA-Rules | Episode Rules | ✓ |
MINEPI+-Rules | Episode Rules | |
HUE-SPAN | High Utility Episodes | |
US-SPAN | High Utility Episodes | |
TUP | Top-k High Utility Episodes | |
Fournier-Viger, P., Lin, C.W., Gomariz, A., Gueniche, T., Soltani, A., Deng, Z., Lam, H. T. (2016).
The SPMF Open-Source Data Mining Library Version 2.
Proc. 19th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2016) Part III, Springer LNCS 9853, pp. 36-40.