fca-lazy-clf

Lazy binary classifier based on Formal Concept Analysis


Keywords
fca, formal-concept-analysis, lazy-learning, binary-classification
License
MIT
Install
pip install fca-lazy-clf==0.3

Documentation

A classifier usually works in two steps: it first extracts patterns from the training sample (training), then uses those patterns to classify new objects. A lazy classifier skips the first step and instead consults the entire training sample each time it classifies an object. Classification therefore takes much longer, but accuracy can improve (see report.pdf).
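
To make the idea of classifying without a training step concrete, here is a minimal sketch of one common lazy voting scheme from the FCA literature. It is not the code of this package: the function names `lazy_classify` and `describe`, the voting rule, and the way `threshold` is applied are all assumptions made for illustration. The actual procedure is described in report.pdf.

```python
import random


def describe(**attrs):
    """Hypothetical helper: an object is a frozenset of (attribute, value) pairs."""
    return frozenset(attrs.items())


def lazy_classify(obj, positives, negatives, threshold=0.0, bias="negative", rng=None):
    """Classify one object by voting over the whole training sample.

    Assumed rule: a training object votes for its class when the description
    it shares with `obj` is matched by at most `threshold` of the objects of
    the opposite class.
    """
    rng = rng or random.Random(0)

    def support(own, other):
        votes = 0
        for example in own:
            common = obj & example  # description shared with this training object
            if not common:
                continue
            # Share of opposite-class objects that also contain the shared description.
            falsified = sum(common <= o for o in other) / len(other) if other else 0.0
            if falsified <= threshold:
                votes += 1
        return votes / len(own) if own else 0.0

    support_pos = support(positives, negatives)
    support_neg = support(negatives, positives)
    if support_pos > support_neg:
        return True
    if support_pos < support_neg:
        return False
    # Equal supports: fall back to the bias setting.
    if bias == "positive":
        return True
    if bias == "negative":
        return False
    return rng.random() < 0.5


positives = [describe(a="x", b="x", c="o"), describe(a="x", b="x", c="b")]
negatives = [describe(a="o", b="o", c="x"), describe(a="o", b="b", c="x")]

print(lazy_classify(describe(a="x", b="o", c="o"), positives, negatives))  # True
```

Nothing is precomputed here: every call scans both training sets, which is why a lazy classifier is slow at prediction time.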

Installation

$ pip install fca_lazy_clf

Requirements

The training and test datasets must be pandas.DataFrame objects. The classifier uses only attributes of dtype numpy.dtype('O') or numpy.dtype('int64'), plus attributes of any dtype that take exactly two distinct values. All other attributes are ignored. The target attribute must be binary.
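
Before fitting, it can help to check which columns satisfy this rule. The sketch below uses only pandas. `usable_attributes` is a hypothetical helper, not part of fca_lazy_clf, and it reflects one reading of the rule above rather than the package's internal check.

```python
import pandas as pd


def usable_attributes(frame):
    """Return the columns that match the stated rule (hypothetical helper)."""
    usable = []
    for name in frame.columns:
        column = frame[name]
        is_object = pd.api.types.is_object_dtype(column.dtype)
        is_int64 = column.dtype == "int64"
        is_two_valued = column.nunique(dropna=True) == 2
        if is_object or is_int64 or is_two_valued:
            usable.append(name)
    return usable


frame = pd.DataFrame({
    "cell": pd.Series(["x", "o", "b", "x"], dtype=object),  # object dtype: used
    "moves": pd.Series([5, 7, 9, 6], dtype="int64"),        # int64: used
    "ratio": pd.Series([0.1, 0.5, 0.9, 0.3]),               # float64, four values: ignored
    "flag": pd.Series([0.0, 1.0, 0.0, 1.0]),                # float64, two values: used
})

print(usable_attributes(frame))  # ['cell', 'moves', 'flag']
```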

Example

>>> import fca_lazy_clf as fca
>>> import pandas as pd
>>> from sklearn import model_selection

>>> data = pd.read_csv('https://datahub.io/machine-learning/tic-tac-toe-endgame/r/tic-tac-toe.csv')
>>> data.head()

   TL TM TR ML MM MR BL BM BR  class
0  x  x  x  x  o  o  x  o  o   True
1  x  x  x  x  o  o  o  x  o   True
2  x  x  x  x  o  o  o  o  x   True
3  x  x  x  x  o  o  o  b  b   True
4  x  x  x  x  o  o  b  o  b   True

>>> X = data.iloc[:, :-1] # All attributes except the last one
>>> y = data.iloc[:, -1] # Last attribute
>>> X_train, X_test, y_train, y_test\
    = model_selection.train_test_split(X, y, test_size=0.33, random_state=0)

>>> clf = fca.LazyClassifier(threshold=0.000001, bias='negative')
>>> clf.fit(X_train, y_train)
>>> clf.score(X_test, y_test)

0.9716088328075709

Parameters of the classifier

  • bias: the decision to make when Support+ equals Support−. There are three options: 'positive' (always assign the positive class), 'negative' (always assign the negative class), and 'random' (assign a random class). See report.pdf for details.

  • threshold: a number from 0 to 1. See report.pdf for details.

  • random: True to enable a mode that uses only a randomly selected portion of the training sample, False to disable it.

  • sample_share: in random mode, the share of entries drawn from the positive set and from the negative set. Valid values range from 0 to 1.
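
As an illustration of what sample_share means in random mode, the sketch below draws the same share of entries from the positive and the negative set. It uses only the standard library. `subsample` is a hypothetical function, and the rounding and seeding choices are assumptions, not the package's behaviour.

```python
import random


def subsample(positives, negatives, sample_share, seed=0):
    """Draw the same share of entries from each class (illustrative only)."""
    if not 0 <= sample_share <= 1:
        raise ValueError("sample_share must be between 0 and 1")
    rng = random.Random(seed)

    def take(rows):
        # Assumed rounding rule: nearest whole number of entries.
        count = round(len(rows) * sample_share)
        return rng.sample(rows, count)

    return take(positives), take(negatives)


positives = list(range(10))       # stand-ins for 10 positive entries
negatives = list(range(100, 104)) # stand-ins for 4 negative entries

pos_part, neg_part = subsample(positives, negatives, sample_share=0.5)
print(len(pos_part), len(neg_part))  # 5 2
```

Sampling each class separately keeps the class balance of the original training sample, which matters when the decision compares Support+ with Support−.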