mixdir

The goal of mixdir is to cluster high dimensional categorical datasets.

It can

handle missing data
infer a reasonable number of latent class (try mixdir(select_latent=TRUE))
cluster datasets with more than 70,000 observations and 60 features
propagate uncertainty and produce a soft clustering

Installation

devtools::install_github("const-ae/mixdir")

Example

Clustering the mushroom data set.

# Loading the library and the data
library(mixdir)
set.seed(1)

data("mushroom")
# High dimensional dataset: 8124 mushroom and 23 different features
mushroom[1:10, 1:5]
#>    bruises cap-color cap-shape cap-surface    edible
#> 1  bruises     brown    convex      smooth poisonous
#> 2  bruises    yellow    convex      smooth    edible
#> 3  bruises     white      bell      smooth    edible
#> 4  bruises     white    convex       scaly poisonous
#> 5       no      gray    convex      smooth    edible
#> 6  bruises    yellow    convex       scaly    edible
#> 7  bruises     white      bell      smooth    edible
#> 8  bruises     white      bell       scaly    edible
#> 9  bruises     white    convex       scaly poisonous
#> 10 bruises    yellow      bell      smooth    edible

Calling the clustering function mixdir on a subset of the data:

# Clustering into 3 latent classes
result <- mixdir(mushroom[1:1000, 1:5], n_latent=3)

Analyzing the result

# Latent class of of first 10 mushrooms
head(result$pred_class, n=10)
#>  [1] 3 1 1 3 2 1 1 1 3 1

# Soft Clustering for first 10 mushrooms
head(result$class_prob, n=10)
#>               [,1]         [,2]         [,3]
#>  [1,] 2.781141e-14 2.915757e-09 1.000000e+00
#>  [2,] 1.000000e+00 1.286473e-09 4.022864e-08
#>  [3,] 1.000000e+00 8.483274e-10 3.020595e-08
#>  [4,] 1.734041e-07 1.173963e-11 9.999998e-01
#>  [5,] 4.078317e-14 1.000000e+00 4.873989e-14
#>  [6,] 9.999999e-01 1.450405e-11 1.053419e-07
#>  [7,] 1.000000e+00 8.483274e-10 3.020595e-08
#>  [8,] 9.999999e-01 9.564273e-12 7.909668e-08
#>  [9,] 1.734041e-07 1.173963e-11 9.999998e-01
#> [10,] 1.000000e+00 4.799899e-16 8.694462e-15
pheatmap::pheatmap(result$class_prob, cluster_cols=FALSE,
                  labels_col = paste("Class", 1:3))

# Structure of latent class 1
# (bruises, cap color either yellow or white, edible etc.)
purrr::map(result$category_prob, 1)
#> $bruises
#>      bruises           no 
#> 0.9996447887 0.0003552113 
#> 
#> $`cap-color`
#>        brown         gray          red        white       yellow 
#> 0.0003548468 0.0003592167 0.0003548906 0.4077977750 0.5911332709 
#> 
#> $`cap-shape`
#>         bell       convex         flat       sunken 
#> 0.3925723193 0.4765681327 0.1305046002 0.0003549479 
#> 
#> $`cap-surface`
#>    fibrous      scaly     smooth 
#> 0.05700435 0.48705770 0.45593795 
#> 
#> $edible
#>       edible    poisonous 
#> 0.9996447805 0.0003552195

# The most predicitive features for each class
find_representative_answers(result$lambda, result$category_prob, top_n=3)
#>       column    answer class probability
#> 19 cap-color    yellow     1   0.9988010
#> 22 cap-shape      bell     1   0.9981940
#> 1    bruises   bruises     1   0.7089829
#> 48    edible poisonous     3   0.9961025
#> 15 cap-color       red     3   0.7498975
#> 9  cap-color     brown     3   0.6468097
#> 5    bruises        no     2   0.9980757
#> 11 cap-color      gray     2   0.9957144
#> 32 cap-shape    sunken     2   0.9873503
# For example: if all I know about a mushroom is that it has a
# yellow cap, then I am 99% certain that it will be in class 1
predict_class(c(`cap-color`="yellow"), result$lambda, result$category_prob)
#> [1] 0.9988009540 0.0005996516 0.0005993944

# Convergence
plot(result$convergence, main=paste0("ELBO: ", formatC(result$ELBO, digits = 3)))

Underlying Model

The package implements a variational inference algorithm to solve a Bayesian latent class model (LCM).

Disclaimer

This package is still under development and can still change profoundly.

mixdir
Release 0.3.0

Release 0.3.0

0.3.0

0.2.0

0.1.0

Documentation

mixdir

Installation

Example

Underlying Model

Disclaimer

Stats

Development practices

Releases

Contributors

mixdir Release 0.3.0

Release 0.3.0 Toggle Dropdown 0.3.0 0.2.0 0.1.0

Documentation

mixdir

Installation

Example

Underlying Model

Disclaimer

Stats

Development practices

Releases

Contributors

mixdir
Release 0.3.0

Release 0.3.0

0.3.0

0.2.0

0.1.0