
Python Wrapper for SPMF

A Python wrapper for the SPMF Java library.


This module contains Python wrappers for the pattern mining algorithms implemented in the SPMF Java library. Each algorithm is exposed as a standalone Python class with a fully descriptive and tested API. It also provides native support for pandas DataFrames.

Why? If you're working in a Python pipeline, calling Java as an intermediate step can be cumbersome. With spmf-wrapper you can stay in your pipeline as though Java were never involved.


pip install spmf-wrapper

A Java Runtime Environment (JRE) is required to run this wrapper. If no existing installation is detected, JRE v21 is installed automatically at $HOME/.jre/jdk-21.0.2+13-jre using the install-jdk Python module. If you prefer to install the Java Runtime manually, follow the instructions here. To test the installation, run the following command in a terminal (the exact version reported depends on your installation):

> java -version
java version "1.8.0_391"
Java(TM) SE Runtime Environment (build 1.8.0_391-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.391-b13, mixed mode)
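
If you would rather check for Java from inside a Python pipeline than from the terminal, a minimal sketch using only the standard library might look like the following. The helper name `java_version_banner` is hypothetical and is not part of spmf-wrapper. Note that `java -version` writes its banner to stderr rather than stdout.

```python
import shutil
import subprocess


def java_version_banner():
    """Return the text printed by `java -version`, or None if Java is not on PATH.

    Hypothetical helper for illustration; it is not part of spmf-wrapper.
    """
    java_path = shutil.which("java")
    if java_path is None:
        return None
    # `java -version` writes its banner to stderr, so read that first.
    result = subprocess.run(
        [java_path, "-version"],
        capture_output=True,
        text=True,
        check=False,
    )
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    banner = java_version_banner()
    print(banner if banner is not None else "No Java runtime found on PATH")
```

A return value of `None` only means that no `java` executable was found on PATH. It says nothing about a runtime installed elsewhere, such as the one under $HOME/.jre.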



from spmf import EMMA

emma = EMMA(min_support=2, max_window=2, timestamp_present=True, transform=True)
# input_df is a pandas DataFrame such as the input table shown below
output = emma.run_pandas(input_df)


   Time points  Itemset
0  1            a
1  2            a
2  3            a
3  3            b
4  6            a
5  7            a
6  7            b
7  8            c
8  9            b
9  11           d
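
The input above can be built as an ordinary pandas DataFrame. The sketch below assumes that the column names match the printed header (`Time points` and `Itemset`) and that each row holds one item at one time point. The project's examples remain the reference for the exact layout `run_pandas` expects.

```python
import pandas as pd

# Assumption: column names are copied from the printed table above.
input_df = pd.DataFrame(
    {
        "Time points": [1, 2, 3, 3, 6, 7, 7, 8, 9, 11],
        "Itemset": ["a", "a", "a", "b", "a", "a", "b", "c", "b", "d"],
    }
)

print(input_df)
```

Items that share a time point, such as a and b at time point 3, appear here as separate rows with the same `Time points` value.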


   Frequent episode  Support
0  a                 5
1  b                 3
2  a b               2
3  a -> a            3
4  a -> b            2
5  a -> a b          2
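
The supports in this table can be reproduced by hand. The sketch below is a brute-force check in plain Python. It is not the EMMA algorithm and not part of spmf-wrapper. It assumes that a single itemset is counted once per time point at which all of its items occur, and that a two-step episode X -> Y is counted once per time point t at which X occurs and Y occurs at a later time point strictly before t + max_window. Under those assumptions it matches the numbers above for this input.

```python
from collections import defaultdict

# Events from the input table: (time point, item).
events = [(1, "a"), (2, "a"), (3, "a"), (3, "b"), (6, "a"),
          (7, "a"), (7, "b"), (8, "c"), (9, "b"), (11, "d")]
MAX_WINDOW = 2

# Group items by time point so that simultaneous items form one itemset.
items_at = defaultdict(set)
for time_point, item in events:
    items_at[time_point].add(item)


def occurs(itemset, time_point):
    """True if every item of `itemset` is present at `time_point`."""
    return set(itemset) <= items_at.get(time_point, set())


def support(episode):
    """Count the start time points of `episode` (one or two itemsets).

    Assumed semantics (illustrative only): the second itemset must occur
    at a time point t2 with t < t2 < t + MAX_WINDOW.
    """
    first = episode[0]
    count = 0
    for t in sorted(items_at):
        if not occurs(first, t):
            continue
        if len(episode) == 1:
            count += 1
        elif any(occurs(episode[1], t2) for t2 in range(t + 1, t + MAX_WINDOW)):
            count += 1
    return count


for episode in ([("a",)], [("b",)], [("a", "b")],
                [("a",), ("a",)], [("a",), ("b",)], [("a",), ("a", "b")]):
    label = " -> ".join(" ".join(itemset) for itemset in episode)
    print(label, support(episode))
```

For example, a -> a is counted at time points 1, 2 and 6, because a occurs again at the very next time point in each case, which gives the support of 3 shown in the table.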

See examples for more details.

For a detailed explanation of the algorithm and parameters, refer to the corresponding webpage in the SPMF documentation.

Implementation Checklist

Sequential Pattern Mining

Algorithm                           Type                                   Implemented
PrefixSpan                          Frequent Sequential Pattern
GSP                                 Frequent Sequential Pattern
SPADE                               Frequent Sequential Pattern
CM-SPADE                            Frequent Sequential Pattern
SPAM                                Frequent Sequential Pattern
CM-SPAM                             Frequent Sequential Pattern
FAST                                Frequent Sequential Pattern
LAPIN                               Frequent Sequential Pattern
ClaSP                               Frequent Closed Sequential Pattern
CM-ClaSP                            Frequent Closed Sequential Pattern
CloFAST                             Frequent Closed Sequential Pattern
CloSpan                             Frequent Closed Sequential Pattern
BIDE+                               Frequent Closed Sequential Pattern
Post Processing SPAM or PrefixSpan  Frequent Closed Sequential Pattern
MaxSP                               Frequent Maximal Sequential Pattern
VMSP                                Frequent Maximal Sequential Pattern
FEAT                                Frequent Sequential Generator Pattern
FSGP                                Frequent Sequential Generator Pattern
VGEN                                Frequent Sequential Generator Pattern
NOSEP                               Non-overlapping Sequential Pattern
GoKrimp                             Compressing Sequential Pattern
TKS                                 Top-k Frequent Sequential Pattern
TSP                                 Top-k Frequent Sequential Pattern

Episode Mining

Algorithm      Type                         Implemented
EMMA           Frequent Episode
AFEM           Frequent Episode
MINEPI         Frequent Episode
MINEPI+        Frequent Episode
TKE            Top-k Frequent Episodes
MaxFEM         Maximal Frequent Episodes
POERM          Episode Rules
POERM-ALL      Episode Rules
POERMH         Episode Rules
NONEPI         Episode Rules
TKE-Rules      Episode Rules
AFEM-Rules     Episode Rules
EMMA-Rules     Episode Rules
MINEPI+-Rules  Episode Rules
HUE-SPAN       High Utility Episodes
US-SPAN        High Utility Episodes
TUP            Top-k High Utility Episodes


Fournier-Viger, P., Lin, C.W., Gomariz, A., Gueniche, T., Soltani, A., Deng, Z., Lam, H. T. (2016).
The SPMF Open-Source Data Mining Library Version 2.
Proc. 19th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2016) Part III, Springer LNCS 9853, pp. 36-40.