A python3 bokeh based boolean data, categorical data, numerical data, dendrogram, and heatmap plotting library.
pip install bokehheat==0.0.7
Bokehheat provides a python3, bokeh based, interactive boolean data, categorical data, numerical data, dendrogram, and heatmap plotting implementation.
Available bokehheat plots are:
heat.cdendro: an interactive categorical dendrogram plot implementation.
heat.bbar: an interactive boolean bar plot implementation.
heat.cbar: an interactive categorical bar plot implementation.
heat.qbar: an interactive quantitative bar plot implementation.
heat.heatmap: an interactive heatmap implementation.
heat.clustermap: an interactive cluster heatmap implementation which combines heat.cdendro, heat.bbar, heat.cbar, heat.qbar and heat.heatmap under the hood.
jheat.jdendro: javatreeview compatible dendrogram gtr, atr file output.
jheat.jheatmap: javatreeview compatible heatmap cdt file output.
jheat.jclustermap: javatreeview compatible heatmap cdt, gtr and atr file output, which runs jheat.jdendro and jheat.jheatmap under the hood.
For the real interactive experience, please clone or download this repository and open the file theclustermap_0.0.0.html with your favorite web browser (we recommend Firefox), or install bokehheat and run this tutorial.
Figure: This is a poor, static heat.clustermap png version.
How to install bokehheat?
pip3 install bokehheat
How to load the bokehheat library?
from bokehheat import heat
How to get reference information about how to use each bokehheat module?
from bokehheat import heat
help(heat.cdendro)
help(heat.bbar)
help(heat.cbar)
help(heat.qbar)
help(heat.heatmap)
help(heat.clustermap)
How to get reference information about how to use each javatreeview compatible module?
from bokehheat import jheat
help(jheat.jdendro)
help(jheat.jheatmap)
help(jheat.jclustermap)
How to integrate bokehheat plots into pweave documents?
from pweave.bokeh import output_pweave, show
output_pweave()
o_clustermap, ls_xaxis, ls_yaxis = heat.clustermap(...)
show(o_clustermap)
How to integrate bokehheat plots into Jupyter Notebook and Lab?
Please have a look at this page from the official bokeh documentation.
This tutorial guides you through a cluster heatmap generation process.
Load libraries needed for this tutorial:
# library
from bokehheat import heat, jheat
from bokeh.io import show
from bokeh.palettes import Reds9, RdBu11, YlGn8, Colorblind8
import numpy as np
import pandas as pd
Prepare data:
# generate test data
ls_sample = ['sampleA','sampleB','sampleC','sampleD','sampleE','sampleF','sampleG','sampleH']
ls_variable = ['geneA','geneB','geneC','geneD','geneE','geneF','geneG','geneH', 'geneI']
ar_z = np.random.rand(9,8)
df_matrix = pd.DataFrame(ar_z)
df_matrix.index = ls_variable
df_matrix.columns = ls_sample
df_matrix.index.name = 'y'
df_matrix.columns.name = 'x'
# generate some gene annotation
df_variable = pd.DataFrame({
    'y': ls_variable,
    'genereal': list(np.random.random(9) * 2 - 1),
    'genetype': ['Ligand','Ligand','Ligand','Ligand','Ligand','Ligand','Receptor','Receptor','Receptor'],
    'genetype_color': ['Cyan','Cyan','Cyan','Cyan','Cyan','Cyan','Cornflowerblue','Cornflowerblue','Cornflowerblue'],
    'geneboole': [False, False, False, True, True, True, False, False, False],
})
df_variable.index = df_variable.y
# generate some sample annotation
df_sample = pd.DataFrame({
    'x': ls_sample,
    'age_year': list(np.random.randint(0,101, 8)),
    'sampletype': ['LumA','LumA','LumA','LumB','LumB','Basal','Basal','Basal'],
    'sampletype_color': ['Purple','Purple','Purple','Magenta','Magenta','Orange','Orange','Orange'],
    'sampleboole': [False, False, True, True, True, True, False, False],
})
df_sample.index = df_sample.x
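Before plotting, it can help to confirm that the annotation tables line up with the matrix labels. The sketch below is not part of bokehheat: the helper name check_alignment is hypothetical, and the small frames are stand-ins that only mimic the layout of the tutorial data. It uses nothing beyond numpy and pandas.

```python
import numpy as np
import pandas as pd


def check_alignment(df_matrix, df_variable, df_sample):
    """Hypothetical helper: raise if a matrix label has no annotation row."""
    es_missing_y = set(df_matrix.index) - set(df_variable.index)
    es_missing_x = set(df_matrix.columns) - set(df_sample.index)
    if es_missing_y or es_missing_x:
        raise ValueError(
            f"missing annotation: y={sorted(es_missing_y)} x={sorted(es_missing_x)}"
        )
    return True


# small stand-in data with the same layout as the tutorial frames
ls_sample = ['sampleA', 'sampleB', 'sampleC']
ls_variable = ['geneA', 'geneB']
df_matrix = pd.DataFrame(np.random.rand(2, 3), index=ls_variable, columns=ls_sample)
df_variable = pd.DataFrame({'y': ls_variable, 'genetype': ['Ligand', 'Receptor']})
df_variable.index = df_variable.y
df_sample = pd.DataFrame({'x': ls_sample, 'age_year': [31, 54, 67]})
df_sample.index = df_sample.x

# every row label needs a gene annotation, every column label a sample annotation
b_ok = check_alignment(df_matrix, df_variable, df_sample)
print(b_ok)
```

The same call should work unchanged on the df_matrix, df_variable, and df_sample frames built above.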
Generate the boolean, categorical, and quantitative annotation tuples for genes (y axis) and samples (x axis), and combine them into a tuple of tuples:
t_yboole = (df_variable,['geneboole'],'Red','Maroon') # True, False
t_ycat = (df_variable, ['genetype'], ['genetype_color'])
t_yquant = (df_variable, ['genereal'], [-1], [1], [Colorblind8][::-1])
t_xboole = (df_sample,['sampleboole'],'Red','Maroon') # True, False
t_xcat = (df_sample, ['sampletype'], ['sampletype_color'])
t_xquant = (df_sample, ['age_year'], [0], [128], [YlGn8][::-1])
tt_boolecatquant = (t_yboole, t_ycat, t_yquant, t_xboole, t_xcat, t_xquant)
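Two details of such tuples are easy to get wrong, so here is a small sketch that does not call bokehheat at all. It uses a hypothetical four-color stand-in palette and a stand-in annotation frame, and shows only plain Python and pandas behavior. First, where the slice sits matters: [palette][::-1] reverses a one-element list and leaves the colors untouched, whereas [palette[::-1]] reverses the colors themselves. Second, a categorical annotation is only unambiguous if every category maps to exactly one color.

```python
import pandas as pd

# hypothetical stand-in for a bokeh palette (a sequence of color strings)
t_palette = ('#000000', '#555555', '#aaaaaa', '#ffffff')

l_outer = [t_palette][::-1]  # reverses the one-element list: palette unchanged
l_inner = [t_palette[::-1]]  # reverses the colors inside the palette
print(l_outer[0][0], l_inner[0][0])

# stand-in annotation frame with a category column and its color column
df_annot = pd.DataFrame({
    'genetype': ['Ligand', 'Ligand', 'Receptor'],
    'genetype_color': ['Cyan', 'Cyan', 'Cornflowerblue'],
})

# number of distinct colors per category; 1 everywhere means a consistent mapping
se_ncolor = df_annot.groupby('genetype')['genetype_color'].nunique()
b_consistent = bool((se_ncolor == 1).all())
print(b_consistent)
```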
Generate the cluster heatmap:
s_file = "theclustermap.html" # or "theclustermap.png"
o_clustermap, ls_xaxis, ls_yaxis = heat.clustermap(
    df_matrix = df_matrix,
    ls_color_palette = Reds9,
    r_low = 0,
    r_high = 1,
    s_z = "log2",
    tt_axis_annot = tt_boolecatquant,
    b_ydendo = True,
    b_xdendo = True,
    #s_method='average',
    #s_metric='euclidean',
    #b_optimal_ordering=True,
    #i_px = 64,
    #i_height = 12,
    #i_width = 12,
    #i_min_border_px = 128,
    s_filename=s_file,
    s_filetitel="the Clustermap",
)
Display the result:
print(f"check out: {s_file}")
print(f"y axis is: {ls_yaxis}")
print(f"x axis is: {ls_xaxis}")
show(o_clustermap)
The resulting clustermap should look something like the example result in the section above.
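The commented-out s_method, s_metric, and b_optimal_ordering arguments resemble the method, metric, and optimal_ordering parameters of scipy.cluster.hierarchy.linkage. Whether bokehheat forwards them to scipy unchanged is an assumption here, not something this tutorial states. Under that assumption, the following sketch shows how a row and column order of the kind a cluster heatmap displays can be computed with scipy alone, on a small stand-in matrix.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage

# small stand-in matrix: rows are genes (y), columns are samples (x)
o_rng = np.random.default_rng(0)
df_matrix = pd.DataFrame(
    o_rng.random((5, 4)),
    index=['geneA', 'geneB', 'geneC', 'geneD', 'geneE'],
    columns=['sampleA', 'sampleB', 'sampleC', 'sampleD'],
)

# hierarchical clustering of the rows, then of the columns (transposed matrix)
ar_ylink = linkage(df_matrix.values, method='average', metric='euclidean', optimal_ordering=True)
ar_xlink = linkage(df_matrix.values.T, method='average', metric='euclidean', optimal_ordering=True)

# leaf order of each dendrogram, translated back to labels
ls_yorder = [df_matrix.index[i] for i in leaves_list(ar_ylink)]
ls_xorder = [df_matrix.columns[i] for i in leaves_list(ar_xlink)]

# reorder the matrix so that similar rows and columns sit next to each other
df_ordered = df_matrix.loc[ls_yorder, ls_xorder]
print(df_ordered.round(2))
```

If ls_xaxis and ls_yaxis hold the plotted label order, as their printed descriptions suggest, the same .loc indexing would reorder the tutorial's df_matrix to match the figure.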
t_out = jheat.jclustermap(
    df_matrix=df_matrix,
    tt_axis_annot = tt_boolecatquant,
    s_xcolor = "age_year",
    s_ycolor = "genetype",
    b_xdendo = True,
    b_ydendo = True,
    #s_method = 'average',
    #s_metric = 'euclidean',
    #b_optimal_ordering = True,
    s_filename = "jclustermap",
)
print(t_out)
In bioinformatics, a clustered heatmap is a common plot to present gene expression data from many patient samples. There are well-established open-source clustering software kits like Cluster and TreeView, JavaTreeView, and TreeView3 for producing and investigating such heatmaps.
There exists a wealth of R and R/Bioconductor packages with static cluster heatmap functions (e.g. heatmap.2 from the gplots library), each one with its own pros and cons.
In Python, the static cluster heatmap landscape looks much more deserted. There are some ancient matplotlib based implementations, like this ActiveState recipe, the heatmapcluster library, or the hclustering library. There is also the seaborn clustermap implementation, which looks good but might need hours of tweaking to get an agreeable plot with all the needed information.
So, static heatmaps are not really a tool for exploring data.
There is d3heatmap, an R/d3.js based interactive cluster heatmap package, and heatmaply, an R/plotly based package. On a more basic level, R/plotly based cluster heatmaps can be written with the ggdendro and ggplot2 libraries.
But I have not found a full-fledged Python based interactive cluster heatmap library, neither Python/plotly nor Python/bokeh based. The only Python/bokeh based cluster heatmap implementation I was really aware of was this listing from Daniel Russo. Later on I found this bokeh based bkheatmap implementation from Wen-Wei Liao.
All in all, none of these implementations was really what I was looking for. That is why I rolled my own. Bokehheat is a Python3/bokeh based interactive cluster heatmap library.
The challenges this implementation tried to solve are, the library should be:
If you are interested in data visualization, check out Jake VanderPlas' talk Python Visualization Landscape from PyCon 2017 in Portland, Oregon (USA).