MLKernels

Machine learning kernels in Julia.



Summary

MLKernels.jl is a Julia package for Mercer kernel functions (or the covariance functions used in Gaussian processes) that are used in the kernel methods of machine learning. This package provides a flexible datatype for representing and constructing machine learning kernels as well as an efficient set of methods to compute or approximate kernel matrices. The package has no dependencies beyond base Julia.

Documentation

Full documentation is available on Read the Docs.

Visualization

Through the use of kernel functions, kernel-based methods may operate in a high (potentially infinite) dimensional implicit feature space without explicitly mapping data from the original feature space to the new feature space. Non-linearly separable data may be linearly separable in the transformed space. For example, the following data set is not linearly separable:

Feature Space

Using a Polynomial Kernel of degree 2, the points are mapped to a 3-dimensional space where a plane can be used to linearly separate the data:

Transformed Data

Explicitly, the Polynomial Kernel of degree 2 maps the data to a cone in 3-dimensional space. The intersecting hyperplane forms a conic section with the cone:

Transformed Data

When translated back to the original feature space, the conic section corresponds to a circle which can be used to perfectly separate the data:

Separating Hyperplane

The above plots were generated using PyPlot.jl. The visualization code is available in visualization.jl.
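The degree-2 mapping behind these plots can also be written out explicitly. The following minimal sketch (plain Julia, independent of the package API, using made-up points and a hypothetical helper featmap) shows that the inner product of the explicit 3-dimensional feature map reproduces the homogeneous degree-2 polynomial kernel:

julia> featmap(x) = [x[1]^2, x[2]^2, sqrt(2)*x[1]*x[2]];   # hypothetical helper: explicit map onto the cone

julia> u = [0.3, -0.7]; v = [1.1, 0.4];

julia> isapprox((u[1]*v[1] + u[2]*v[2])^2, sum(featmap(u) .* featmap(v)))   # ⟨ϕ(u),ϕ(v)⟩ = (u⋅v)²
true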

Getting Started (gettingstarted.jl)

Constructing Kernels

MLKernels.jl comes with a number of pre-defined kernel functions. For example, one of the most popular kernels is the Gaussian kernel (also known as the radial basis kernel or squared exponential covariance function). The parametric form of each kernel is described in its documentation, which can be viewed in the REPL using the ? help prompt followed by the kernel name:

julia> using MLKernels

help?> GaussianKernel
search: GaussianKernel

  GaussianKernel(α) = exp(-α⋅‖x-y‖²)

The Gaussian Kernel has one scaling parameter α. We may instantiate the kernel using:

julia> GaussianKernel(2.0)
KernelComposition{Float64}(ϕ=Exponential(α=2.0,γ=1.0),κ=SquaredDistance(t=1.0))

The Kernel data type is parametric: the element type may be any subtype of AbstractFloat, though only Float32 and Float64 are recommended. The default type is Float64.
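For example, a Float32 kernel could be constructed by supplying a Float32 argument (a sketch; it assumes the constructor infers the parameter type from its argument):

julia> GaussianKernel(2.0f0);   # expected to produce a KernelComposition{Float32}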

The Gaussian kernel is actually a special case of a more general class of kernel. It is the composition of a scalar function with the squared (Euclidean) distance, SquaredDistanceKernel, which is itself a kernel. The scalar function in question is the ExponentialClass, a subtype of the CompositionClass type. Composition classes may be composed with a kernel to yield a new kernel using the ∘ operator (shorthand for KernelComposition):

julia> ϕ = ExponentialClass(2.0, 1.0);

julia> κ = SquaredDistanceKernel(1.0);

julia> ψ = ϕ ∘ κ   # use \circ for '∘'
KernelComposition{Float64}(ϕ=Exponential(α=2.0,γ=1.0),κ=SquaredDistance(t=1.0))

MLKernels.jl implements only symmetric real-valued continuous kernel functions (a subset of the kernels studied in the literature). These kernels fall into two groups:

  • Mercer Kernels (positive definite kernels)
  • Negative Definite Kernels

Negative definite kernels are equivalent to the conditionally positive definite kernels that are often discussed in the machine learning literature: a conditionally positive definite kernel is simply the negation of a negative definite kernel.

Returning to the example, the squared distance kernel is not a Mercer kernel although the resulting Gaussian kernel is Mercer. Kernels may be inspected using the ismercer and isnegdef functions:

julia> ismercer(κ)
false

julia> isnegdef(κ)
true

julia> ismercer(ψ)
true

julia> isnegdef(ψ)
false
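The property reported by isnegdef can also be checked numerically: a kernel is negative definite if c'*K*c ≤ 0 for every coefficient vector c that sums to zero. A minimal sketch of that check for κ, using the kernelmatrix function introduced in the next section and a made-up data matrix Z:

julia> Z = rand(6,3);

julia> K = kernelmatrix(κ, Z);           # 6x6 matrix of pairwise squared distances

julia> c = randn(6); c = c .- sum(c)/6;   # constrain the coefficients to sum to zero

julia> sum(c .* (K*c)) <= 1e-10           # ≤ 0 up to round-off, consistent with isnegdef(κ)
true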

AdditiveKernel types are also available as Automatic Relevance Determination (ARD) kernels, in which a weight is applied to each element-wise function of the input vectors. For the scalar product and squared distance kernels, this corresponds to a linear scaling of each dimension.

julia> w = round(rand(5),3);

julia> ARD(κ, w)
ARD{Float64}(κ=SquaredDistance(t=1.0),w=[0.358,0.924,0.034,0.11,0.21])
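Since the ARD wrapper is itself a kernel, it should be usable with the evaluation functions described in the next section; for instance (a hedged sketch, not verified output):

julia> kernelmatrix(ARD(κ, w), rand(4,5))   # 4x4 kernel matrix using per-dimension weights w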

Kernel Functions

The kernel function may be used to evaluate a given kernel for a pair of scalar values or a pair of vectors:

julia> x = rand(3); y = rand(3);

julia> kernel(ψ, x[1], y[1])
0.9111186233655084

julia> kernel(ψ, x, y)
0.4149629934770782

Given a data matrix X (one observation/input per row), the kernel matrix (Gramian matrix in the implicit feature space) for the set of vectors may be computed using the kernelmatrix function:

julia> X = rand(5,3);

julia> kernelmatrix(ψ, X)
5x5 Array{Float64,2}:
 1.0       0.183858  0.511987  0.438051  0.265336
 0.183858  1.0       0.103226  0.345788  0.193466
 0.511987  0.103226  1.0       0.25511   0.613017
 0.438051  0.345788  0.25511   1.0       0.115458
 0.265336  0.193466  0.613017  0.115458  1.0     

where kernelmatrix(ψ, X)[i,j] is kernel(ψ, vec(X[i,:]), vec(X[j,:])). If the data matrix stores one observation/input per column instead, the additional argument is_trans may be set to true to operate on the columns:

julia> kernelmatrix(ψ, X', true)
5x5 Array{Float64,2}:
 1.0       0.183858  0.511987  0.438051  0.265336
 0.183858  1.0       0.103226  0.345788  0.193466
 0.511987  0.103226  1.0       0.25511   0.613017
 0.438051  0.345788  0.25511   1.0       0.115458
 0.265336  0.193466  0.613017  0.115458  1.0     
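As a quick check of the entry-wise definition above, the matrix can be rebuilt element by element with the kernel function (a sketch; equality is expected up to floating-point round-off):

julia> isapprox(kernelmatrix(ψ, X), [kernel(ψ, vec(X[i,:]), vec(X[j,:])) for i in 1:5, j in 1:5])
true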

Similarly, given two data matrices, X and Y, the pairwise kernel matrix may be computed using:

julia> Y = rand(4,3);

julia> kernelmatrix(ψ, X, Y)
5x4 Array{Float64,2}:
 0.805077  0.602521  0.166728  0.503091
 0.110309  0.121443  0.487499  0.711981
 0.692844  0.987933  0.24032   0.404972
 0.505462  0.30802   0.114626  0.641405
 0.248455  0.580033  0.678817  0.423386

julia> kernelmatrix(ψ, X', Y', true)
5x4 Array{Float64,2}:
 0.805077  0.602521  0.166728  0.503091
 0.110309  0.121443  0.487499  0.711981
 0.692844  0.987933  0.24032   0.404972
 0.505462  0.30802   0.114626  0.641405
 0.248455  0.580033  0.678817  0.423386

This is equivalent to the upper right corner of the kernel matrix of [X;Y]:

julia> kernelmatrix(ψ, [X; Y])[1:5, 6:9]
5x4 Array{Float64,2}:
 0.805077  0.602521  0.166728  0.503091
 0.110309  0.121443  0.487499  0.711981
 0.692844  0.987933  0.24032   0.404972
 0.505462  0.30802   0.114626  0.641405
 0.248455  0.580033  0.678817  0.423386

Kernel Operations

All kernels may be scaled and translated by positive real numbers. Scaling or translating a Kernel object constructs a KernelAffinity object, a wrapper that tracks the scaling and translation constants:

julia> κ1 = GaussianKernel();

julia> 2κ1 + 3
Affine{Float64}(a=2.0,c=3.0,κ=KernelComposition{Float64}(ϕ=Exponential(α=1.0,γ=1.0),κ=SquaredDistance(t=1.0)))

Both Mercer kernels and negative definite kernels are closed under addition. A KernelSum type may be used to sum groups of Mercer kernels or negative definite kernels (though the two types of kernel may not be mixed). The + operator will automatically construct a KernelSum object:

julia> κ2 = PolynomialKernel();

julia> κ1 + κ2
KernelSum{Float64}(KernelComposition(ϕ=Exponential(α=1.0,γ=1.0),κ=SquaredDistance(t=1.0)), KernelComposition(ϕ=Polynomial(a=1.0,c=1.0,d=3),κ=ScalarProduct()))

Mercer kernels are also closed under multiplication. Similarly, a KernelProduct type may be used to multiply groups of Mercer kernels. The * operator will automatically construct a KernelProduct object:

julia> κ1 * κ2
KernelProduct{Float64}(KernelComposition(ϕ=Exponential(α=1.0,γ=1.0),κ=SquaredDistance(t=1.0)), KernelComposition(ϕ=Polynomial(a=1.0,c=1.0,d=3),κ=ScalarProduct()))

The power (^), exponentiation (exp) and hyperbolic tangent (tanh) functions may be applied to certain types of kernels. These act as shortcuts for constructing KernelComposition kernels:

julia> κ3 = SineSquaredKernel();

julia> κ3^0.5
KernelComposition{Float64}(ϕ=Power(a=1.0,c=0.0,γ=0.5),κ=SineSquared(p=3.141592653589793,t=1.0))

julia> κ4 = 2ScalarProductKernel() + 3;

julia> κ4^3
KernelComposition{Float64}(ϕ=Polynomial(a=2.0,c=3.0,d=3),κ=ScalarProduct())

julia> exp(κ4)
KernelComposition{Float64}(ϕ=Exponentiated(a=2.0,c=3.0),κ=ScalarProduct())

julia> tanh(κ4)
KernelComposition{Float64}(ϕ=Sigmoid(a=2.0,c=3.0),κ=ScalarProduct())