Bayesian Histogram-based Anomaly Detection


Keywords
bayesian-inference, anomaly-detection, unsupervised-learning, explainability, machine-learning, scikit-learn, unsupervised-machine-learning
License
MIT
Install
pip install bhad==0.1.0

Documentation

Bayesian Histogram-based Anomaly Detection (BHAD)

A Python implementation of the BHAD algorithm as presented in Vosseler, A. (2023): BHAD: Explainable anomaly detection using Bayesian histograms. The bhad package follows scikit-learn's standard API for outlier detection.

Installation

pip install bhad

Usage

1.) Preprocess the input data: discretize continuous features and, optionally, conduct Bayesian model selection.

2.) Train the model on the discretized data.

For convenience, these two steps can optionally be combined in a scikit-learn pipeline:

from bhad.model import BHAD
from bhad.utils import Discretize
from sklearn.pipeline import Pipeline

num_cols = [....]   # names of numeric features
cat_cols = [....]   # categorical features

pipe = Pipeline(steps=[
    ('discrete', Discretize(nbins=None)),
    ('model', BHAD(contamination=0.01, num_features=num_cols, cat_features=cat_cols))
])
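To make the two steps above more concrete, here is a minimal, self-contained sketch of the general idea behind histogram-based scoring. It does not use the bhad package and is not its implementation: the helper names `equal_width_bins` and `smoothed_log_probs`, the equal-width binning, the fixed number of bins, and the smoothing parameter `alpha` are all assumptions made for illustration only.

```python
import math
from collections import Counter


def equal_width_bins(values, nbins):
    """Map each continuous value to a bin index in [0, nbins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins or 1.0  # fall back to 1.0 for constant features
    return [min(int((v - lo) / width), nbins - 1) for v in values]


def smoothed_log_probs(bin_ids, nbins, alpha=1.0):
    """Log of the smoothed relative frequency of each observation's bin.

    (count + alpha) / (n + alpha * nbins) is the posterior mean of a
    multinomial probability under a symmetric Dirichlet(alpha) prior.
    """
    counts = Counter(bin_ids)
    total = len(bin_ids) + alpha * nbins
    return [math.log((counts[b] + alpha) / total) for b in bin_ids]


# Hypothetical toy feature: nine typical values and one extreme value.
feature = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 1.0, 1.1, 0.95, 9.0]

bins = equal_width_bins(feature, nbins=4)
scores = smoothed_log_probs(bins, nbins=4)

# The observation that falls into the sparsest bin gets the lowest score.
most_anomalous = min(range(len(scores)), key=scores.__getitem__)
print(bins)
print(most_anomalous)
```

In this sketch a low score means the observation lies in a rarely populated bin. With several features, one plausible extension is to sum the per-feature scores, but how BHAD actually combines them is described in the paper cited above.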

For a given dataset, get the binary model decisions:

y_pred = pipe.fit_predict(X=dataset)
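As a rough illustration of how a contamination rate can turn continuous anomaly scores into binary decisions, the sketch below labels the lowest-scoring fraction of observations as outliers. It follows the scikit-learn convention of -1 for outliers and +1 for inliers. The function `labels_from_scores`, the toy scores, and the assumption that lower scores are more anomalous are hypothetical; whether bhad thresholds its scores in exactly this way is not confirmed here.

```python
import math


def labels_from_scores(scores, contamination):
    """Label the lowest-scoring fraction as outliers (-1), the rest as inliers (+1)."""
    n_outliers = math.ceil(contamination * len(scores))
    ranked = sorted(range(len(scores)), key=scores.__getitem__)
    outliers = set(ranked[:n_outliers])
    return [-1 if i in outliers else 1 for i in range(len(scores))]


# Hypothetical scores for ten observations; index 2 is clearly the lowest.
demo_scores = [-0.3, -0.4, -2.9, -0.2, -0.5, -0.3, -0.4, -0.1, -0.6, -0.2]

y_pred_demo = labels_from_scores(demo_scores, contamination=0.1)
n_flagged = sum(1 for label in y_pred_demo if label == -1)
print(y_pred_demo)
print(n_flagged)
```

Under the same convention, counting the entries of `y_pred` equal to -1 gives the number of observations the fitted pipeline flagged as anomalous.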

Get a global model explanation as well as explanations for individual observations:

from bhad.explainer import Explainer

local_expl = Explainer(pipe.named_steps['model'], pipe.named_steps['discrete']).fit()

local_expl.get_explanation(nof_feat_expl=5, append=False)       # individual explanations

local_expl.global_feat_imp                                      # global explanation

The project also provides a detailed toy example using synthetic data for anomaly detection and an example using the Titanic dataset that illustrates model explainability.