mosaiq

A python library for simple mosaiq/mosaic style plots.


License
MIT
Install
pip install mosaiq==0.1

Documentation

Mosaiq for Python

This is a simplified mosaic plot technique that works for numeric/categorical data.

Imgur

For categorical data, a frequency table of values is calculated. Only the top 7 most common categories are preserved. The rest are replaced by "NA_TOPN".

For numeric data, a histogram is calculated over the distribution. The precise numeric values are replaced by its respective bin.

Call it with the following arguments:

  1. A dataframe
  2. The name of a "feature" column
  3. The name of a "target" column
  4. Whether the color ramp should be inverted (default : False)
  5. A colormap (default : derived from target column)
  6. The number of categories to preserve in categorical data (default : 7)
# dat (pandas dataframe)
# feature (feature name string)
# target (target name string)
mosaiq(dat, feature, target)

Using this visualization makes it easy to iterate through all feature/target interactions in a given dataset:

for col in dat.columns:
    if col == target:
        continue # skip the plot if the column is the target

    mosaiq(mdat, col, target)
    plt.show()