KAVICA: Powerful Python Cluster Analysis and Inference Toolkit


Keywords
Cluster, Inference, System, Feature, Selection, Factor, Analysis, Parser, Clustering, Unsupervised, Self-organizing, map, Organization, Component, Space, Curvature, Multiline, Transformation
License
BSD-3-Clause
Install
pip install KAVICA==1.3.4

Documentation




What is it?

kavica is a Python package that provides semi-automated, flexible, and expressive clustering analysis designed to make working with "unlabeled" data easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world cluster analysis in Python. Additionally, it has the broader goal of becoming a powerful and flexible open-source AutoML tool and pipeline for unsupervised clustering analysis. It is already well on its way towards this goal.

Main Features

Here are just a few of the things that kavica does well:

  • Intelligent Density Mapping to model the density structure of the data by analogy with Einstein's theory of relativity, and automated Density Homogenizing to prepare the data for density-based clustering (e.g., DBSCAN)

  • Automatic and powerful Organization Component Analysis to interpret the clustering result by understanding the topological structure of each cluster

  • Topological and powerful Self-Organizing Maps Inference System to use the self-learning ability of the SOM to understand the topological structure of the data

  • Automated, Bayesian-based DBSCAN Hyper-parameter Tuner to select the optimal hyper-parameter configuration of the DBSCAN clustering algorithm

  • Efficient handling of feature selection in potentially high-dimensional and massive datasets

  • Gravitational implementation of Kohonen Generational Self-Organizing Maps (GSOM), useful for unsupervised learning and super-clustering, with rich graphics, plots, and animation features

  • Computational geometry model Polygonal Cage to transfer feature vectors from a curved, non-Euclidean feature space to a new Euclidean one

  • Robust factor analysis to reduce a large number of variables to a smaller number of factors

  • Easy handling of missing data (represented as NaN, NA, or NaT) in floating point as well as non-floating point data

  • Flexible implementation of directed and undirected graph data structures and algorithms

  • Intuitive resampling of data sets

  • Powerful, flexible parser functionality to parse, manipulate, and generate flat, massive, and unstructured trace datasets generated by MareNostrum

  • Utility functionality: intuitive exploratory data analysis, plotting, loading and generating data, and more
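
To illustrate why Density Homogenizing matters for density-based clustering, the following minimal sketch compares the k-nearest-neighbor distances of two synthetic clusters with different densities. It uses only the Python standard library and does not call kavica; the helper names (kth_neighbor_distance, gaussian_blob) and the data are hypothetical and chosen for illustration only.

```python
import math
import random


def kth_neighbor_distance(points, k):
    """Return, for each point, the distance to its k-th nearest neighbor."""
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def gaussian_blob(center, sigma, n, rng):
    """Draw n points from an isotropic 2D Gaussian around center."""
    return [(rng.gauss(center[0], sigma), rng.gauss(center[1], sigma))
            for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable
dense = gaussian_blob((0.0, 0.0), 0.1, 100, rng)     # tight cluster
sparse = gaussian_blob((10.0, 10.0), 1.0, 100, rng)  # loose cluster

k = 4
mean_dense = sum(kth_neighbor_distance(dense, k)) / len(dense)
mean_sparse = sum(kth_neighbor_distance(sparse, k)) / len(sparse)

print(f"mean {k}-NN distance, dense cluster:  {mean_dense:.3f}")
print(f"mean {k}-NN distance, sparse cluster: {mean_sparse:.3f}")
```

The two clusters have clearly different neighbor distances, so a single DBSCAN neighborhood radius that suits the dense cluster tends to mark much of the sparse one as noise, while a radius that suits the sparse cluster can merge dense neighbors. Rescaling the data so that cluster densities become comparable is the problem that Density Homogenizing is described as addressing.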

Examples:

  • Feature Space Curvature Map


  • Density Homogenizing

    Application of the Feature Space Curvature Map to Synt10, a multi-density 2D dataset containing ten clusters. (a) A scatter plot of clusters with varied densities. The legend shows the size/N(μ,σ²) of each cluster, the colors represent the original labels of the data, and the red lines draw the initial FSF. (b) The FSC model computed with our FSCM method; the red lines show the deformation of the FSF. (c) A scatter plot of the data from (a), projected by applying our transformation through model (b). As a result, the diversity of the clusters' densities is scaled appropriately to achieve better density-based clustering performance.

  • Polygonal Cage Multilinear transformation

    Feature Space Curvature (FSC) | Feature Space Fabric (FSF)

    Data point transformation between a bent FSC (a) and a regular FSF (b), based on the multilinear transformation in R².
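
One common way to express a multilinear transformation in R² is bilinear interpolation over a quadrilateral cell. The sketch below maps points between a regular unit square and a bent quadrilateral, and back again with a few Newton iterations. Whether kavica's Polygonal Cage uses exactly this formulation is an assumption; the function names and the example cell are hypothetical, and the code does not call kavica.

```python
def bilinear_map(quad, u, v):
    """Map (u, v) in the unit square to a point inside quad.

    quad lists the corners that correspond to (0,0), (1,0), (1,1), (0,1).
    """
    (x00, y00), (x10, y10), (x11, y11), (x01, y01) = quad
    w00, w10, w11, w01 = (1 - u) * (1 - v), u * (1 - v), u * v, (1 - u) * v
    x = w00 * x00 + w10 * x10 + w11 * x11 + w01 * x01
    y = w00 * y00 + w10 * y10 + w11 * y11 + w01 * y01
    return x, y


def inverse_bilinear_map(quad, x, y, iterations=20):
    """Recover (u, v) for a point (x, y) in a convex quad by Newton's method."""
    (x00, y00), (x10, y10), (x11, y11), (x01, y01) = quad
    u, v = 0.5, 0.5  # start from the cell center
    for _ in range(iterations):
        fx, fy = bilinear_map(quad, u, v)
        rx, ry = fx - x, fy - y  # residual of the current guess
        # Partial derivatives of the forward map (the Jacobian).
        dxdu = (1 - v) * (x10 - x00) + v * (x11 - x01)
        dxdv = (1 - u) * (x01 - x00) + u * (x11 - x10)
        dydu = (1 - v) * (y10 - y00) + v * (y11 - y01)
        dydv = (1 - u) * (y01 - y00) + u * (y11 - y10)
        det = dxdu * dydv - dxdv * dydu
        # Newton step: subtract J^-1 * residual.
        u -= (rx * dydv - ry * dxdv) / det
        v -= (ry * dxdu - rx * dydu) / det
    return u, v


# A bent (but convex) cell, standing in for one cell of a curved grid.
bent_cell = [(0.0, 0.0), (2.0, 0.5), (2.5, 2.0), (0.2, 1.5)]

px, py = bilinear_map(bent_cell, 0.25, 0.75)      # regular -> bent
u, v = inverse_bilinear_map(bent_cell, px, py)    # bent -> regular
print(f"bent point: ({px:.5f}, {py:.5f})")
print(f"recovered regular coordinates: ({u:.5f}, {v:.5f})")
```

The forward map is linear in u for fixed v and linear in v for fixed u, which is what makes it bilinear. The inverse has no such simple form, so the sketch solves for it numerically; this is adequate for convex cells, where the Jacobian determinant does not vanish.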


Where to get it

The source code is currently hosted on GitHub at: kavica

Binary installers for the latest released version are available at the Python Package Index (PyPI) and on Conda.

The recommended way to install kavica is to use:

# PyPI
pip install kavica

But it can also be installed using:

# or conda
conda config --add channels conda-forge
conda install kavica

To verify your setup, start Python from the command line and run the following:

import kavica
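
If the import succeeds, it can also help to confirm which release is installed. The following sketch uses only the standard library's importlib.metadata; the distribution name "KAVICA" is taken from the pip command at the top of this page, and the helper installed_version is a hypothetical name introduced here.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


print("KAVICA version:", installed_version("KAVICA") or "not installed")
```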

Dependencies

See requirements.txt for the required packages, which can be installed with:

pip install -r requirements.txt

Publications

Unsupervised Feature Selection for Noisy Data

Organization Component Analysis: The method for extracting insights from the shape of cluster

Feature Space Curvature Map: A Method To Homogenize Cluster Densities

Issue tracker

If you find a bug, please help us solve it by filing a report.

Contributing

If you want to contribute, check out the contribution guidelines.

License

The main library of kavica is released under the BSD 3-Clause License.