knor

Non-Uniform Memory Access ('NUMA') Optimized, Parallel K-Means


Keywords
clustering-algorithm, machine-learning, parallel-computing
License
Apache-2.0


clusternor (clustering NUMA optimized routines library for clustering)

Repo contents

  • R: R building blocks for the user-interface code, called internally by the user-facing functions.
  • data: Data files for testing.
  • inst: Citation files.
  • man: Package documentation.
  • src: R bindings interface and the C++ submodule that points to the base repository.
  • tests: R unit tests written using the testthat package.

R bindings for clustering NUMA-optimized routines. This package is supported on Linux, Mac OSX and Windows.

NOTE: This package is built from C++ source and is compiled with your gcc compiler during installation.

Tested on

  • Mac OSX: 10.11 (El Capitan), 10.12 (Sierra), 10.13 (High Sierra), 10.14 (Mojave)
  • Linux: Ubuntu 14.04, 16.04, 18.04, CentOS 6, Fedora 25, Fedora 26
  • Windows: 8.1, 10

Hardware requirements

  • Any machine with >= 2 GB RAM

License

This software is licensed under the Apache License, Version 2.0.

Best performance configuration

For the best performance on Linux, make sure the NUMA system packages are installed:

apt-get install -y build-essential libnuma-dbg libnuma-dev libnuma1

R Dependencies

  • We require a recent version of Rcpp (install.packages("Rcpp"))
  • We recommend the testthat package if you want to run unit tests (install.packages("testthat"))

Stable builds

Install directly from CRAN. Installation normally takes about 2 minutes.

install.packages("clusternor")

Bleeding edge install

Install directly from GitHub. This requires the following system packages:

  • git
  • autoconf

Mac: install autoconf via brew install autoconf

Ubuntu: install autoconf via apt-get install autoconf

Then clone the repository and run the install script:

git clone --recursive https://github.com/flashxio/knorR.git
cd knorR
./install.sh

NOTE: These commands may require administrator privileges (i.e., sudo).

Docker

A Docker image with all dependencies installed can be obtained with:

docker pull flashxio/knorr-base

NOTE: The clusternor R package must still be installed in this image via: install.packages("clusternor")

If you prefer to build the image yourself, you can use this Dockerfile.

Examples

Work with data already in-memory

library(clusternor)
iris.mat <- as.matrix(iris[, 1:4])         # Numeric feature columns only
k <- length(unique(iris[, dim(iris)[2]]))  # Number of unique classes (3 for iris)
kms <- Kmeans(iris.mat, k)

Work with data from disk

To work with data on disk, store it as binary data in row-major order. Please see this link for a detailed description of the format.

library(clusternor)
fn <- "/path/to/file.bin" # Replace with the path to a real file
k <- 2 # The number of clusters
nrow <- 50 # The number of rows
ncol <- 5 # The number of columns
kms <- Kmeans(fn, nrow, ncol, k, init = "kmeanspp", nthread = 2)
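
As a sketch (not part of the package), the base-R code below writes an in-memory matrix to disk in row-major binary form and reads it back to confirm the layout. It assumes the file should hold 8-byte double-precision values in native byte order; check the linked format description before relying on that. The matrix, its 50 x 5 shape and the temporary file name are illustrative only.

```r
# Illustrative data: 50 rows x 5 columns of random values
set.seed(1)
mat <- matrix(runif(50 * 5), nrow = 50, ncol = 5)

fn <- tempfile(fileext = ".bin")

# R stores matrices column-major, so transpose before flattening:
# as.vector(t(mat)) lists row 1, then row 2, and so on (row-major order).
# Assumption: values are written as 8-byte doubles in native byte order.
con <- file(fn, open = "wb")
writeBin(as.vector(t(mat)), con, size = 8)
close(con)

# Read the file back to confirm the row-major layout round-trips.
con <- file(fn, open = "rb")
vals <- readBin(con, what = "double", n = nrow(mat) * ncol(mat), size = 8)
close(con)
mat2 <- matrix(vals, nrow = nrow(mat), ncol = ncol(mat), byrow = TRUE)
stopifnot(identical(mat, mat2))
```

The key step is the transpose: writing as.vector(mat) directly would produce column-major data. The resulting fn, together with nrow(mat) and ncol(mat), can then be passed to Kmeans in the same way as the disk example.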

Test data

Test data is included as part of the package. It can be accessed directly via this link, or from the R interpreter as clusternor::test_data once the package has been loaded.

Reproduction and Verification

require(clusternor)
kms <- Kmeans(test_data, test_centroids)

Runtime for this action should be nearly instantaneous on any machine. Expected output:

> kms
$nrow
[1] 50

$ncol
[1] 5

$iters
[1] 5

$k
[1] 8

$centers
         [,1]     [,2]     [,3]     [,4]     [,5]
[1,] 2.881889 4.079735 4.243061 1.953790 2.690649
[2,] 2.494522 2.334093 2.204031 4.161763 2.444349
[3,] 3.630086 2.398294 3.793616 2.404824 4.490043
[4,] 3.909759 3.991190 2.947161 3.762090 1.950588
[5,] 4.574327 3.645658 3.975175 4.505870 3.595890
[6,] 3.190091 4.267428 1.643788 3.229366 3.700539
[7,] 2.110254 3.147714 2.153235 1.581510 3.102312
[8,] 2.186852 2.027695 3.938736 1.410910 2.383727

$cluster
 [1] 3 2 3 3 6 8 8 3 3 2 3 4 7 7 5 4 2 1 2 1 2 7 7 5 1 1 8 7 5 2 6 2 4 6 6 8 2 5
[39] 7 4 6 5 6 4 7 4 5 4 2 5

$size
[1] 4 9 6 7 7 6 7 4
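
To confirm that a run is internally consistent, a small helper such as check_kmeans_result (hypothetical, not part of clusternor) can cross-check the fields of the returned list. The sketch below applies it to the values from the expected output above; the field names nrow, k, cluster and size are taken from that output.

```r
# Hypothetical helper: sanity-check the fields of a Kmeans() result list
check_kmeans_result <- function(kms) {
  stopifnot(
    length(kms$cluster) == kms$nrow,                        # one label per row
    all(kms$cluster >= 1 & kms$cluster <= kms$k),           # labels lie in 1..k
    sum(kms$size) == kms$nrow,                              # sizes cover all rows
    all(tabulate(kms$cluster, nbins = kms$k) == kms$size)   # sizes match labels
  )
  invisible(TRUE)
}

# Values copied from the expected output above
expected <- list(
  nrow = 50, ncol = 5, iters = 5, k = 8,
  cluster = c(3, 2, 3, 3, 6, 8, 8, 3, 3, 2, 3, 4, 7, 7, 5, 4, 2, 1, 2, 1,
              2, 7, 7, 5, 1, 1, 8, 7, 5, 2, 6, 2, 4, 6, 6, 8, 2, 5, 7, 4,
              6, 5, 6, 4, 7, 4, 5, 4, 2, 5),
  size = c(4, 9, 6, 7, 7, 6, 7, 4)
)
check_kmeans_result(expected)
```

After running the reproduction, the same check can be applied to the real result with check_kmeans_result(kms); it stops with an error if any field disagrees.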

Help

Please refer to the documentation provided:

?clusternor::Kmeans
?clusternor::Skmeans
?clusternor::KmeansPP
?clusternor::Hmeans
?clusternor::Xmeans
?clusternor::Gmeans
?clusternor::MiniBatchKmeans
?clusternor::FuzzyCMeans
?clusternor::Kmedoids