pyMILES

Multiple instance learning via embedded instance selection


License
MIT
Install
pip install pyMILES==0.0.6



This Python package is an implementation of "MILES: Multiple-Instance Learning via Embedded Instance Selection," published in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, December 2006.

The paper describes a method to embed each bag into a feature space defined by applying a most-likely-cause estimator between the bag and every instance in the training set.

The most-likely-cause estimator measures the similarity between a candidate concept point x^k and a bag B_i as s(x^k, B_i) = max_j exp(-||x_ij - x^k||^2 / sigma^2), where x_ij ranges over the instances in the bag and sigma is a scaling parameter.
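As a minimal sketch of that formula (a hypothetical helper, not part of the pyMILES API):

```python
import numpy as np

def bag_similarity(concept, bag, sigma=1.0):
    """Most-likely-cause similarity between a concept point and a bag.

    `bag` has shape (n_instances, feature_space). Returns
    max_j exp(-||x_ij - concept||^2 / sigma^2): the bag instance
    closest to the concept dominates the similarity.
    """
    sq_dists = np.sum((bag - concept) ** 2, axis=1)
    return np.exp(-sq_dists.min() / sigma ** 2)
```

A bag that contains the concept point itself has similarity exactly 1; similarity decays toward 0 as the nearest instance moves away.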

An example encoding

Look at embedding_test.py for an example embedding of dummy data. Dummy data is created from five two-dimensional normal distributions, and each instance is generated by one of:

N1([5, 5]^T, I), N2([5, -5]^T, I), N3([-5, 5]^T, I), N4([-5, -5]^T, I), N5([0, 0]^T, I)

where I is the identity covariance matrix, i.e. unit standard deviation in each dimension.

Bags are created from a variable number of instances per bag; this example uses 8. A bag is labeled positive if it contains instances from at least two different distributions among N1, N2, and N3; otherwise the bag is negative. The raw two-dimensional data is shown in the figure "2-D Raw Data".
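One way to generate such dummy bags, sketched with an assumed `make_bag` helper (see embedding_test.py for the package's actual construction):

```python
import numpy as np

rng = np.random.default_rng(0)

# Means of the five 2-D normal distributions described above.
means = np.array([[5, 5], [5, -5], [-5, 5], [-5, -5], [0, 0]], dtype=float)

def make_bag(n_instances=8):
    """Draw a bag whose instances each come from a randomly chosen distribution.

    Returns the (n_instances, 2) bag and its label: positive (1) if the bag
    contains instances from at least two of N1, N2, N3 (indices 0-2).
    """
    sources = rng.integers(0, 5, size=n_instances)
    bag = means[sources] + rng.standard_normal((n_instances, 2))
    label = 1 if len(set(sources[sources < 3])) >= 2 else 0
    return bag, label
```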

A single bag is of shape (N_INSTANCES, FEATURE_SPACE), where N_INSTANCES is the number of instances in the bag and FEATURE_SPACE is the dimensionality of each instance.

All positive bags are of shape (N_POSITIVE_BAGS, N_INSTANCES, FEATURE_SPACE) where N_POSITIVE_BAGS is the number of positive bags. Negative bags are of shape (N_NEGATIVE_BAGS, N_INSTANCES, FEATURE_SPACE). The total set of training instances is of shape (N_POSITIVE_BAGS + N_NEGATIVE_BAGS, N_INSTANCES, FEATURE_SPACE).
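For example, with hypothetical sizes of 3 positive bags, 2 negative bags, 8 instances per bag, and a 2-D feature space:

```python
import numpy as np

positive_bags = np.zeros((3, 8, 2))  # (N_POSITIVE_BAGS, N_INSTANCES, FEATURE_SPACE)
negative_bags = np.zeros((2, 8, 2))  # (N_NEGATIVE_BAGS, N_INSTANCES, FEATURE_SPACE)

# Stack into the full training set along the bag axis.
training_bags = np.concatenate([positive_bags, negative_bags], axis=0)
print(training_bags.shape)  # (5, 8, 2)
```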

A single bag is embedded into a vector of shape ((N_POSITIVE_BAGS + N_NEGATIVE_BAGS) * N_INSTANCES,), i.e. one entry per training instance across all positive and negative bags.
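A sketch of that embedding step (a hypothetical `embed_bag` helper, not the package's function), taking the training instances as a single flattened matrix:

```python
import numpy as np

def embed_bag(bag, training_instances, sigma=1.0):
    """Embed one bag into instance space, MILES-style.

    `training_instances` has shape (n_total_instances, feature_space),
    e.g. all training bags reshaped to (-1, feature_space). The result
    has shape (n_total_instances,): one similarity per training instance.
    """
    # Pairwise squared distances: (n_total_instances, n_bag_instances).
    diffs = training_instances[:, None, :] - bag[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # Most likely cause: keep the closest bag instance per training instance.
    return np.exp(-sq_dists.min(axis=1) / sigma ** 2)
```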

In this example, the training instances are projected onto the following concept vectors:

import numpy as np

# Feature vectors close to the means of the `true` positive distributions
x1 = np.array([4.3, 5.2])
x2 = np.array([5.4, -3.9])
x3 = np.array([-6.0, 4.8])

The result is a (3, 40) matrix, visualized in the figure "Linearly Separable Bags".
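A rough recreation of that projection, using stand-in random data for the 40 training instances (the real example uses the Gaussian bags described above):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for 5 bags * 8 instances of 2-D training data.
training_instances = 5 * rng.standard_normal((40, 2))

# The three concept vectors x1, x2, x3 from the example above.
concepts = np.array([[4.3, 5.2], [5.4, -3.9], [-6.0, 4.8]])

# Similarity of each concept to each training instance (sigma = 1).
diffs = concepts[:, None, :] - training_instances[None, :, :]
projection = np.exp(-np.sum(diffs ** 2, axis=2))
print(projection.shape)  # (3, 40)
```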

Testing

  • python -m unittest tests.embedding_test
  • python -m unittest tests.l1_svm_test

Code coverage and linting

  • pylint -r n src/tests/ src/pyMILES
  • From src directory: coverage run -m unittest tests.embedding_test
  • autopep8 --recursive --in-place src/tests/ src/pyMILES/

Building

  • Increment the build version in setup.cfg
  • python -m build .
  • python -m twine upload dist/*