info-gain

Information gain utilities


License
MIT
Install
pip install info-gain==1.0.1

Documentation

info_gain

An implementation of the information gain algorithm. There is some debate about how the information gain metric should be defined: as the Kullback-Leibler divergence or as the mutual information. This implementation uses the definitions given below.

Information gain definitions

Information gain calculation

Definition from information gain calculation (retrieved 2018-07-13). Let Attr be the set of all attributes and Ex the set of all training examples. For x in Ex, value(x, a) is the value of example x for attribute a in Attr, and H denotes the entropy. The function values(a) denotes the set of all possible values of attribute a in Attr. The information gain for an attribute a in Attr is defined as follows:

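$$IG(Ex, a) = H(Ex) - \sum_{v \in values(a)} \frac{|\{x \in Ex \mid value(x, a) = v\}|}{|Ex|} \cdot H(\{x \in Ex \mid value(x, a) = v\})$$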
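The definition above can be sketched in a few lines of Python: the entropy of all examples minus the size-weighted entropy of each group of examples that share an attribute value. This is an illustrative sketch, not the package's source. The names `entropy_bits` and `information_gain` are hypothetical, and the base-2 logarithm is an assumption (the package may use a different base, which only rescales the result).

```python
from collections import Counter
from math import log2


def entropy_bits(labels):
    """Shannon entropy, in bits, of the empirical distribution of `labels`."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(examples, attribute):
    """H(Ex) minus the size-weighted entropy of each attribute-value group.

    Hypothetical helper: `examples` holds one label per example and
    `attribute` holds the attribute value of the same example.
    """
    if len(examples) != len(attribute):
        raise ValueError("examples and attribute must have the same length")
    # Group the example labels by the attribute value they are paired with.
    groups = {}
    for label, value in zip(examples, attribute):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy remaining after splitting on the attribute.
    remainder = sum(
        len(group) / len(examples) * entropy_bits(group)
        for group in groups.values()
    )
    return entropy_bits(examples) - remainder
```

An attribute that takes the same value for every example leaves a single group identical to Ex, so its gain under this sketch is zero.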

Intrinsic value calculation

Definition from information gain calculation (retrieved 2018-07-13).

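$$IV(Ex, a) = -\sum_{v \in values(a)} \frac{|\{x \in Ex \mid value(x, a) = v\}|}{|Ex|} \cdot \log_2\left(\frac{|\{x \in Ex \mid value(x, a) = v\}|}{|Ex|}\right)$$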
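The intrinsic value depends only on how the examples are distributed over the attribute's values, so it amounts to the entropy of the attribute itself. A minimal sketch, assuming base-2 logarithms; the single-argument function `intrinsic_value` below is a hypothetical stand-alone helper and does not mirror the package's `(Ex, a)` signature.

```python
from collections import Counter
from math import log2


def intrinsic_value(attribute):
    """Entropy, in bits, of the attribute's own value distribution."""
    total = len(attribute)
    # Each term is the fraction of examples sharing one attribute value.
    return -sum((n / total) * log2(n / total) for n in Counter(attribute).values())
```

Attributes with many distinct, evenly spread values get a high intrinsic value, which is what penalises them in the gain ratio.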

Information gain ratio calculation

Definition from information gain calculation (retrieved 2018-07-13).

$$IGR(Ex, a) = \frac{IG(Ex, a)}{IV(Ex, a)}$$
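The ratio is undefined when the intrinsic value is zero, which happens when the attribute has the same value for every example. The sketch below shows one way to guard against that case. The function name `gain_ratio` is hypothetical, and returning 0.0 for a zero intrinsic value is an assumption made for illustration; this README does not say how the package handles it.

```python
def gain_ratio(ig, iv):
    """Divide information gain by intrinsic value, guarding against IV == 0."""
    if iv == 0:
        # Assumption: a single-valued attribute cannot split the examples,
        # so report a ratio of 0 instead of dividing by zero.
        return 0.0
    return ig / iv
```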

Installation

To install the package via pip use:

pip install info_gain

To clone the package from the git repository use:

git clone https://github.com/Thijsvanede/info_gain.git

Usage

Import the info_gain module with:

from info_gain import info_gain

The imported module supports three functions:

  • info_gain.info_gain(Ex, a) to compute the information gain.
  • info_gain.intrinsic_value(Ex, a) to compute the intrinsic value.
  • info_gain.info_gain_ratio(Ex, a) to compute the information gain ratio.

Example

from info_gain import info_gain

# Example: does colour indicate whether something is a fruit or a vegetable?
produce = ['apple', 'apple', 'apple', 'strawberry', 'eggplant']
fruit   = [ True  ,  True  ,  True  ,  True       ,  False    ]
colour  = ['green', 'green', 'red'  , 'red'       , 'purple'  ]

ig  = info_gain.info_gain(fruit, colour)
iv  = info_gain.intrinsic_value(fruit, colour)
igr = info_gain.info_gain_ratio(fruit, colour)

print(ig, iv, igr)
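As a hand check, the definitions can be worked through for this data. Every colour group is pure (green and red contain only fruit, purple only the eggplant), so each group has entropy 0. The figures below assume base-2 logarithms; if the package uses a different base, ig and iv are scaled by a constant factor while the ratio stays the same.

```latex
\begin{align*}
H(Ex) &= -\tfrac{4}{5}\log_2\tfrac{4}{5} - \tfrac{1}{5}\log_2\tfrac{1}{5} \approx 0.722 \\
IG(Ex, a) &= H(Ex) - \left(\tfrac{2}{5}\cdot 0 + \tfrac{2}{5}\cdot 0 + \tfrac{1}{5}\cdot 0\right) \approx 0.722 \\
IV(Ex, a) &= -2\cdot\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{1}{5}\log_2\tfrac{1}{5} \approx 1.522 \\
IGR(Ex, a) &= \frac{IG(Ex, a)}{IV(Ex, a)} \approx 0.474
\end{align*}
```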