MultiVis

The MultiVis package contains the necessary tools for visualisation of multivariate data.

Installation

Dependencies

multivis requires:

Python (==3.11.4)
NumPy (==1.25.2)
OpenPyXL (==2.6.1)
Pandas (==2.1.0)
Matplotlib (==3.8.0)
Seaborn (==0.12.2)
Networkx (==3.1.0)
statsmodels (==0.14.0)
scikits-bootstrap (==1.1.0)
SciPy (==1.11.2)
Scikit-learn (==1.3.1)
tqdm (==4.66.1)
xlrd (==2.0.1)
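To compare an existing environment with the pinned versions above, a short standard-library script can help. This is a minimal sketch and not part of multivis: the helper names release_tuple and check_pins are hypothetical, the keys in pins are assumed to be the PyPI distribution names, and only plain numeric versions are handled (pre-release suffixes are ignored).

```python
from importlib import metadata


def release_tuple(version):
    """Turn '3.1.0' into (3, 1) so that '3.1' and '3.1.0' compare equal."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'rc1' or 'post2'
        parts.append(int(piece))
    while parts and parts[-1] == 0:
        parts.pop()  # trailing zeros carry no information
    return tuple(parts)


def check_pins(pins):
    """Return {name: (pinned, installed or None, status)} for each pin."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (pinned, None, "missing")
            continue
        same = release_tuple(installed) == release_tuple(pinned)
        report[name] = (pinned, installed, "ok" if same else "differs")
    return report


# A few pins copied from the list above (assumed distribution names).
pins = {"numpy": "1.25.2", "pandas": "2.1.0", "networkx": "3.1.0"}
for name, (pinned, installed, status) in check_pins(pins).items():
    print(f"{name}: pinned {pinned}, installed {installed} -> {status}")
```

A status of "differs" only means the installed release is not the pinned one; it does not by itself say whether multivis will fail to run.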

User installation

The recommended way to install multivis and its dependencies is with conda:

conda install -c brett.chapman multivis

or pip:

pip install multivis

Alternatively, to install directly from GitHub:

pip install https://github.com/brettChapman/multivis/archive/master.zip

API

For further details on usage, refer to the docstrings.

multivis

Edge: Builds nodes and edges and is the base class for the Network class.
- init_parameters
  - [peaktable] : Pandas dataframe containing peak data. Must contain 'Name' and 'Label'.
  - [datatable] : Pandas dataframe matrix containing scores.
  - [pvalues] : Pandas dataframe matrix containing score/similarity pvalues (if available, otherwise set to None).
- methods
  - [set_params] : Set parameters
    - [filter_type] : The value type to filter the data on (default: 'pvalue')
    - [hard_threshold] : Value to filter the data on (default: 0.005)
    - [withinBlocks] : Include scores within blocks if building multi-block network (default: False)
    - [sign] : The sign of the score/similarity to filter on ('pos', 'neg' or 'both') (default: 'both')
  - [help] : Print this help text
  - [build] : Builds the nodes and edges.
  - [getNodes] : Returns a Pandas dataframe of all nodes.
  - [getEdges] : Returns a Pandas dataframe of all edges.
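The filtering that the filter_type, hard_threshold and sign parameters describe can be pictured with plain Python lists. This is a sketch of the general idea only, not the multivis implementation: build_edges is a hypothetical helper, and the choices to keep pairs whose p-value is strictly below the threshold and to scan only the upper triangle are assumptions.

```python
def build_edges(names, scores, pvalues, hard_threshold=0.005, sign="both"):
    """Return (source, target, score, pvalue) tuples that pass the filter.

    Hypothetical helper: a pair is kept when its p-value is below the
    threshold and its score has the requested sign. Only the upper triangle
    is scanned, so each undirected pair appears once and self-pairs are
    skipped.
    """
    if sign not in ("pos", "neg", "both"):
        raise ValueError("sign must be 'pos', 'neg' or 'both'")
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score, pvalue = scores[i][j], pvalues[i][j]
            if pvalue >= hard_threshold:
                continue  # not significant enough (assumed strict cut-off)
            if sign == "pos" and score <= 0:
                continue
            if sign == "neg" and score >= 0:
                continue
            edges.append((names[i], names[j], score, pvalue))
    return edges


# Toy symmetric score and p-value matrices for three peaks.
names = ["M1", "M2", "M3"]
scores = [[1.0, 0.9, -0.8],
          [0.9, 1.0, 0.1],
          [-0.8, 0.1, 1.0]]
pvalues = [[0.0, 0.001, 0.002],
           [0.001, 0.0, 0.6],
           [0.002, 0.6, 0.0]]

print(build_edges(names, scores, pvalues))
print(build_edges(names, scores, pvalues, sign="neg"))
```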
Network: Builds nodes and edges, with added NetworkX functionality. Inherits from Edge.
- init_parameters
  - [peaktable] : Pandas dataframe containing peak data. Must contain 'Name' and 'Label'.
  - [datatable] : Pandas dataframe matrix containing scores.
  - [pvalues] : Pandas dataframe matrix containing score/similarity pvalues.
- methods
  - [set_params] : Set parameters
    - [filter_type] : The value type to filter the data on (default: 'pvalue')
    - [hard_threshold] : Value to filter the data on (default: 0.005)
    - [link_type] : The value type to represent links in the network (default: 'score')
    - [withinBlocks] : Include scores within blocks if building multi-block network (default: False)
    - [sign] : The sign of the score/similarity to filter on ('pos', 'neg' or 'both') (default: 'both')
  - [help] : Print this help text
  - [build] : Builds nodes, edges and NetworkX graph.
  - [getNetworkx] : Returns a NetworkX graph.
  - [getLinkType] : Returns the link type parameter used in building the network.
edgeBundle: Produces an interactive hierarchical edge bundle in D3.js, from nodes and edges.
- init_parameters
  - [nodes] : Pandas dataframe containing nodes generated from Edge.
  - [edges] : Pandas dataframe containing edges generated from Edge.
- methods
  - [set_params] : Set parameters
    - [html_file] : Name to save the HTML file as (default: 'hEdgeBundle.html')
    - [innerRadiusOffset] : Sets the inner radius based on the offset value from the canvas width/diameter (default: 120)
    - [blockSeparation] : Value to set the distance between different segmented blocks (default: 1)
    - [linkFadeOpacity] : The link fade opacity when hovering over/clicking nodes (default: 0.05)
    - [mouseOver] : Setting to 'True' swaps from clicking to hovering over nodes to select them (default: True)
    - [fontSize] : The font size in pixels set for each node (default: 10)
    - [backgroundColor] : Set the background colour of the plot (default: 'white')
    - [foregroundColor] : Set the foreground colour of the plot (default: 'black')
    - [node_data] : Peak Table column names to include in the mouse over information (default: 'Name' and 'Label')
    - [nodeColorScale] : The scale to use for colouring the nodes ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: 'linear')
    - [node_color_column] : The Peak Table column to use for node colours (default: None sets to black)
    - [node_cmap] : Set the CMAP colour palette to use for colouring the nodes (default: 'brg')
    - [edgeColorScale] : The scale to use for colouring the edges, if edge_color_value is 'pvalue' ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: 'linear')
    - [edge_color_value] : Set the values to colour the edges by. Either 'sign', 'score' or 'pvalue' (default: 'score')
    - [edge_cmap] : Set the CMAP colour palette to use for colouring the edges (default: 'brg')
    - [addArcs] : Setting to 'True' adds arcs around the edge bundle for each block (default: False)
    - [arcRadiusOffset] : Sets the arc radius offset from the inner radius (default: 20)
    - [extendArcAngle] : Sets the angle value to add to each end of the arc (default: 2)
    - [arc_cmap] : Set the CMAP colour palette to use for colouring the arcs (default: 'Set1')
  - [help] : Print this help text
  - [build] : Generates the JavaScript embedded HTML code, writes to a HTML file and opens it in a browser.
  - [buildDashboard] : Generates the JavaScript embedded HTML code in a dashboard format, writes to a HTML file and opens it in a browser.
plotNetwork: Produces a static spring-embedded network from a NetworkX graph.
- init_parameters
  - [g] : NetworkX graph.
- methods
  - [set_params] : Set parameters
    - [imageFileName] : The image file name to save to (default: 'networkPlot.jpg')
    - [edgeLabels] : Setting to 'True' labels all edges with the score/similarity value (default: True)
    - [saveImage] : Setting to 'True' will save the image to file (default: True)
    - [layout] : Set the NetworkX layout type ('circular', 'kamada_kawai', 'random', 'spring', 'spectral') (default: 'spring')
    - [transparent] : Setting to 'True' will make the background transparent (default: False)
    - [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
    - [figSize] : The figure size as a tuple (width,height) (default: (30,20))
    - [node_cmap] : The CMAP colour palette to use for nodes (default: 'brg')
    - [colorScale] : The node colour scale to apply ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: 'linear')
    - [node_color_column] : The Peak Table column to use for node colours (default: None sets to black)
    - [sizeScale] : The node size scale to apply ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: 'reverse_linear')
    - [size_range] : The node size scale range to apply. Tuple of length 2. Minimum size to maximum size (default: (150,2000))
    - [sizing_column] : The node sizing column to use (default: sizes all nodes to 1)
    - [alpha] : Node opacity value (default: 0.5)
    - [nodeLabels] : Setting to 'True' will label the nodes (default: True)
    - [fontSize] : The font size set for each node (default: 15)
    - [keepSingletons] : Setting to 'True' will keep any single nodes not connected by edges in the NetworkX graph (default: True)
    - [column] : Column from Peak Table to filter on (default: no filtering)
    - [threshold] : Value to filter on (default: no filtering)
    - [operator] : The comparison operator to use when filtering (default: '>')
    - [sign] : The sign of the score to filter on ('pos', 'neg' or 'both') (default: 'pos')
  - [help] : Print this help text
  - [build] : Generates and displays the NetworkX graph.
springNetwork: Interactive spring-embedded network which inherits data from the NetworkX graph.
- init_parameters
  - [g] : NetworkX graph.
- methods
  - [set_params] : Set parameters
    - [node_size_scale] : dictionary(Peak Table column name as index: dictionary('scale': ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal"), 'range': a number array of length 2 - minimum size to maximum size)) (default: sizes all nodes to 10 with no dropdown menu)
    - [node_color_scale] : dictionary(Peak Table column name as index: dictionary('scale': ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal"))) (default: colours all nodes to 'black')
    - [html_file] : Name to save the HTML file as (default: 'springNetwork.html')
    - [backgroundColor] : Set the background colour of the plot (default: 'white')
    - [foregroundColor] : Set the foreground colour of the plot (default: 'black')
    - [chargeStrength] : The charge strength of the spring-embedded network (force between nodes) (default: -120)
    - [groupByBlock] : Setting to 'True' will group nodes by 'Block' if present in the data (default: False)
    - [groupFociStrength] : Set the strength of foci for each group (default: 0.2)
    - [intraGroupStrength] : Set the strength between each group (default: 0.01)
    - [groupLayoutTemplate] : Set the layout template to use for grouping (default: 'treemap')
    - [node_text_size] : The text size for each node (default: 15)
    - [fix_nodes] : Setting to 'True' will fix nodes in place when manually moved (default: False)
    - [displayLabel] : Setting to 'True' will set the node labels to the 'Label' column, otherwise it will set the labels to the 'Name' column from the Peak Table (default: False)
    - [node_data] : Peak Table column names to include in the mouse over information (default: 'Name' and 'Label')
    - [link_type] : The link type used in building the network (default: 'score')
    - [link_width] : The width of the links (default: 0.5)
    - [pos_score_color] : Colour value for positive scores. Can be HTML/CSS name, hex code, and (R,G,B) tuples (default: 'red')
    - [neg_score_color] : Colour value for negative scores. Can be HTML/CSS name, hex code, and (R,G,B) tuples (default: 'black')
  - [help] : Print this help text
  - [build] : Generates the JavaScript embedded HTML code and writes to a HTML file and opens it in a browser.
  - [buildDashboard] : Generates the JavaScript embedded HTML code in a dashboard format, writes to a HTML file and opens it in a browser.
clustermap: Produces a Hierarchical Clustered Heatmap.
- init_parameters
  - [scores] : Pandas dataframe scores.
  - [row_linkage] : Precomputed linkage matrix for the rows from a linkage clustered distance/similarities matrix
  - [col_linkage] : Precomputed linkage matrix for the columns from a linkage clustered distance/similarities matrix
- methods
  - [set_params] : Set parameters
    - [xLabels] : A Pandas Series for labelling the X axis
    - [yLabels] : A Pandas Series for labelling the Y axis
    - [imageFileName] : The image file name to save to (default: 'clusterMap.png')
    - [saveImage] : Setting to 'True' will save the image to file (default: True)
    - [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
    - [figSize] : The figure size as a tuple (width,height) (default: (80,70))
    - [dendrogram_ratio_shift] : The ratio to shift the position of the dendrogram in relation to the heatmap (default: 0.0)
    - [dendrogram_line_width] : The line width of the dendrograms (default: 1.5)
    - [background_colour] : Set the background colour (default: 'white')
    - [transparent] : Setting to 'True' will ignore background_colour and make the background transparent (default: False)
    - [fontSize] : The font size for all text (default: 30)
    - [heatmap_annotation] : Annotate the heatmap with values (default: False)
    - [heatmap_cmap] : The CMAP colour palette to use for the heatmap (default: 'RdYlGn')
    - [cluster_cmap] : The CMAP colour palette to use for the branch separation of clusters in the dendrogram (default: 'Set1')
    - [rowColorCluster] : Setting to 'True' will display a colour bar for the clustered rows (default: False)
    - [colColorCluster] : Setting to 'True' will display a colour bar for the clustered columns (default: False)
    - [row_color_threshold] : The colouring threshold for the row dendrogram (default: 1)
    - [col_color_threshold] : The colouring threshold for the column dendrogram (default: 1)
  - [help] : Print this help text
  - [build] : Generates and displays the Hierarchical Clustered Heatmap (HCH).
plotFeatures: Produces different types of feature plots
- init_parameters
  - [peaktable] : Pandas dataframe containing peak data. Must contain 'Name' and 'Label'.
  - [datatable] : Pandas dataframe containing matrix of values to plot (N samples x N features). Columns/features must be same as 'Name' from Peak Table.
- methods
  - [set_params] : Set parameters
    - [plot_type] : The type of plot. Either "point", "violin", "box", "swarm", "violin-swarm" or "box-swarm" (default: 'point')
    - [column_numbers] : The number of columns to display in the plots (default: 4)
    - [log_data] : Perform a log ('natural', base 2 or base 10) on all data (default: (True, 2))
    - [scale_data] : Scale the data ('standard' (centers to the mean and scales to unit variance), 'minmax' (scales between 0 and 1), 'maxabs' (scales to the absolute maximum value), 'robust' (centers to the median and scales to between 25th and 75th quantile range)) (default: (True, 'minmax'))
    - [impute_data] : Impute any missing values using KNN impute with a set number of nearest neighbours (default: (True, 3))
    - [style] : Set the seaborn style (default: 'seaborn-v0_8-white')
    - [transparent] : Setting to 'True' will make the background transparent (default: False)
    - [figSize] : The figure size as a tuple (width,height) (default: (15,10))
    - [fontSize] : The font size for all text (default: 12)
    - [colour_palette] : The colour palette to use for the plot (default: None)
    - [y_axis_label] : The label to customise the y axis (default: None)
    - [x_axis_rotation] : Rotate the x axis labels this number of degrees (default: 0)
    - [group_column_name] : The group column name used in the datatable (e.g. 'Class') (default: None)
    - [point_estimator] : The statistical function to use for the point plot. Either "mean" or "median" (default: 'mean')
    - [point_ci] : The bootstrapped confidence interval for the point plot. Can also be standard deviation ("sd") (default: 95)
    - [violin_distribution_type] : The representation of the distribution of data points within the violin plot. Either "quartile", "box", "point", "stick" or None (default: 'box')
    - [violin_width_scale] : The method used to scale the width of the violin plot. Either "area", "count" or "width" (default: "width")
    - [box_iqr] : The proportion past the lower and upper quartiles to extend the plot whiskers for the box plot. Points outside this range will be identified as outliers (default: 1.5)
    - [saveImage] : Setting to 'True' will save the image to file (default: True)
    - [imageFileName] : The image file name to save to (default: '[plot_type]_features.png')
    - [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
  - [help] : Print this help text
  - [plot] : Generates feature plots.
polarDendrogram: Polar dendrogram
- init_parameters
  - [dn] : Dendrogram dictionary labelled by Peak Table index
- methods
  - [set_params] : Set parameters
    - [imageFileName] : The image file name to save to (default: 'polarDendrogram.png')
    - [saveImage] : Setting to 'True' will save the image to file (default: True)
    - [branch_scale] : The branch distance scale to apply ('linear', 'log', 'square') (default: 'linear')
    - [gap] : The gap size within the polar dendrogram (default: 0.1)
    - [grid] : Setting to 'True' will overlay a grid (default: False)
    - [style] : Set the seaborn style (default: 'seaborn-v0_8-white')
    - [transparent] : Setting to 'True' will make the background of all plots transparent (default: False)
    - [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
    - [figSize] : The figure size as a tuple (width,height) (default: (10,10))
    - [fontSize] : The font size for all text (default: 15)
    - [PeakTable] : The Peak Table Pandas dataframe (default: empty dataframe)
    - [DataTable] : The Data Table Pandas dataframe (default: empty dataframe)
    - [group_column_name] : The group column name used in the datatable (e.g. 'Class') (default: None)
    - [textColorScale] : The scale to use for colouring the text ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: 'linear')
    - [text_color_column] : The colour column to use from Peak Table (Can be colour or numerical values such as 'pvalue') (default: 'black')
    - [label_column] : The label column to use from Peak Table (default: use original Peak Table index from cartesian dendrogram)
    - [text_cmap] : The CMAP colour palette to use (default: 'brg')
  - [plotClusters] : Aggregates peaks from each cluster of the polar dendrogram and generates different feature plots across the group/class variables.
    - [plot_type] : The type of plot. Either "point", "violin", "box", "swarm", "violin-swarm" or "box-swarm" (default: 'point')
    - [column_numbers] : The number of columns to display in the plots (default: 4)
    - [log_data] : Perform a log ('natural', base 2 or base 10) on all data (default: (True, 2))
    - [scale_data] : Scale the data ('standard' (centers to the mean and scales to unit variance), 'minmax' (scales between 0 and 1), 'maxabs' (scales to the absolute maximum value), 'robust' (centers to the median and scales to between 25th and 75th quantile range)) (default: (True, 'minmax'))
    - [impute_data] : Impute any missing values using KNN impute with a set number of nearest neighbours (default: (True, 3))
    - [figSize] : The figure size as a tuple (width,height) (default: (15,10))
    - [fontSize] : The font size for all text (default: 12)
    - [colour_palette] : The colour palette to use for the plot (default: None)
    - [y_axis_label] : The label to customise the y axis (default: None)
    - [x_axis_rotation] : Rotate the x axis labels this number of degrees (default: 0)
    - [point_estimator] : The statistical function to use for the point plot. Either "mean" or "median" (default: 'mean')
    - [point_ci] : The bootstrapped confidence interval for the point plot. Can also be standard deviation ("sd") (default: 95)
    - [violin_distribution_type] : The representation of the distribution of data points within the violin plot. Either "quartile", "box", "point", "stick" or None (default: 'box')
    - [violin_width_scale] : The method used to scale the width of the violin plot. Either "area", "count" or "width" (default: "width")
    - [box_iqr] : The proportion past the lower and upper quartiles to extend the plot whiskers for the box plot. Points outside this range will be identified as outliers (default: 1.5)
    - [saveImage] : Setting to 'True' will save the image to file (default: True)
    - [imageFileName] : The image file name to save to (default: '[plot_type]_clusterPlots.png')
    - [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
  - [help] : Print this help text
  - [build] : Generates and displays the Polar dendrogram.
pca: Creates a Principal Component Analysis (PCA) scores and loadings biplot.
- parameters
  - [data] : array-like matrix, shape (n_samples, n_features)
  - [imageFileName] : The image file name to save to (default: 'PCA.png')
  - [saveImage] : Setting to 'True' will save the image to file (default: True)
  - [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
  - [pcx] : The first component (default: 1)
  - [pcy] : The second component (default: 2)
  - [group_label] : Labels to assign to each group/class in the PCA plot (default: None)
  - [sample_label] : Labels to assign to each sample in the PCA plot (default: None)
  - [peak_label] : Labels to assign to each peak in the loadings biplot (default: None)
  - [markerSize] : The size of each marker (default: 100)
  - [fontSize] : The font size for all text (default: 12)
  - [figSize] : The figure size as a tuple (width,height) (default: (20,10))
  - [background_colour] : Set the background colour (default: 'white')
  - [grid] : Setting to 'True' will overlay a grid (default: True)
  - [transparent] : Setting to 'True' will ignore background_colour and make the background transparent (default: False)
  - [cmap] : The CMAP colour palette to use (default: 'Set1')
pcaLoadings: Creates a lollipop plot of PCA components with bootstrapped confidence intervals.
- parameters
  - [data] : array-like, shape (n_samples, n_features)
  - [peak_label] : A list of peaks to plot
  - [imageFileName] : The image file name to save to (default: 'PCA_loadings.png')
  - [saveImage] : Setting to 'True' will save the image to file (default: True)
  - [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
  - [pc_num] : The principal component to plot (default: 1)
  - [boot_num] : The number of bootstrap samples to use to calculate confidence intervals (default: 500)
  - [alpha] : The alpha value for the bootstrapped confidence intervals (default: 0.05)
  - [fontSize] : The font size for all text (default: 30)
  - [markerSize] : The size of each marker (default: 100)
  - [figSize] : The figure size as a tuple (width,height) (default: (40,40))
  - [transparent] : Setting to 'True' will make the background transparent (default: False)
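The roles of boot_num and alpha can be illustrated with a percentile bootstrap on a plain list of numbers. This is a minimal sketch under stated assumptions: bootstrap_ci is a hypothetical helper, it resamples a single list rather than PCA loadings, and the percentile method used here may differ from the interval method multivis obtains through its scikits-bootstrap dependency.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.mean, boot_num=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(values)."""
    rng = random.Random(seed)  # seeded so repeated calls give the same interval
    n = len(values)
    # Resample with replacement boot_num times and sort the statistics.
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(boot_num)
    )
    # Cut off alpha/2 of the resampled statistics at each end.
    lower_idx = int((alpha / 2) * boot_num)
    upper_idx = max(lower_idx, int((1 - alpha / 2) * boot_num) - 1)
    return estimates[lower_idx], estimates[upper_idx]


loadings = [2.1, 2.5, 1.9, 2.8, 2.3, 2.0, 2.6, 2.4]  # made-up values
low, high = bootstrap_ci(loadings)
print(f"mean={statistics.mean(loadings):.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

A larger boot_num gives a smoother estimate of the interval at the cost of run time, and a smaller alpha widens the interval.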
pcoa: Creates a Principal Coordinate Analysis (PCoA) plot.
- parameters
  - [similarities] : array-like matrix, shape (n_samples, n_features)
  - [imageFileName] : The image file name to save to (default: 'PCOA.png')
  - [saveImage] : Setting to 'True' will save the image to file (default: True)
  - [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
  - [n_components] : Number of components (default: 2)
  - [max_iter] : Maximum number of iterations of the SMACOF algorithm (default: 300)
  - [eps] : Relative tolerance with respect to stress at which to declare convergence (default: 1e-3)
  - [seed] : Seed number used by the random number generator for the RandomState instance (default: 3)
  - [group_label] : Labels to assign to each group/class (default: None)
  - [peak_label] : Labels to assign to each peak (default: None)
  - [markerSize] : The size of each marker (default: 100)
  - [fontSize] : The font size for all text (default: 12)
  - [figSize] : The figure size as a tuple (width,height) (default: (20,10))
  - [background_colour] : Set the background colour (default: 'white')
  - [grid] : Setting to 'True' will overlay a grid (default: True)
  - [transparent] : Setting to 'True' will ignore background_colour and make the background transparent (default: False)
  - [cmap] : The CMAP colour palette to use (default: 'Set1')

multivis.utils

loadData: Loads and validates the Data and Peak sheets from an Excel file.
- parameters
  - [filename] : The name of the Excel file (.xlsx file) e.g. 'Data.xlsx'.
  - [DataSheet] : The name of the data sheet in the file e.g. 'Data'. The data sheet must contain an 'Idx', 'SampleID', and 'Class' column.
  - [PeakSheet] : The name of the peak sheet in the file e.g. 'Peak'. The peak sheet must contain an 'Idx', 'Name', and 'Label' column.
- Returns
  - [DataTable] : Pandas dataframe
  - [PeakTable] : Pandas dataframe
groups2blocks: Slices the data by group/class name into blocks for later identification of multi-block associations and places the data into a dictionary indexed by group/class name.
- parameters
  - [PeakTable] : Pandas dataframe containing the feature/peak data. Must contain 'Name' and 'Label'.
  - [DataTable] : Pandas dataframe matrix containing values. The data must contain a column separating out the different groups in the data (e.g. Class)
  - [group_column_name] : The group column name used in the datatable (e.g. Class)
- Returns
  - [DataBlocks] : A dictionary containing DataTables indexed by group names
  - [PeakBlocks] : A dictionary containing PeakTables indexed by group names
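The slicing into a dictionary indexed by group name can be pictured with plain Python rows. This sketch covers only the Data Table half of the idea and is not the multivis code: split_rows_by_group is a hypothetical helper, rows are modelled as dictionaries rather than a Pandas dataframe, and how the matching Peak Table blocks are derived is not shown.

```python
from collections import defaultdict


def split_rows_by_group(rows, group_column_name):
    """Group a list of row dictionaries by the value in one column."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[row[group_column_name]].append(row)
    return dict(blocks)


# Toy Data Table with a 'Class' column separating the groups.
data_table = [
    {"SampleID": "S1", "Class": "Control", "M1": 1.2},
    {"SampleID": "S2", "Class": "Treated", "M1": 2.4},
    {"SampleID": "S3", "Class": "Control", "M1": 1.1},
]

data_blocks = split_rows_by_group(data_table, "Class")
for group, rows in data_blocks.items():
    print(group, [row["SampleID"] for row in rows])
```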
mergeBlocks: Merges multiple different Data Tables and Peak Tables from dictionaries into a single Peak Table and Data Table (used for multi-block/multi-omics data preparation). The 'Name' column needs to be unique across all blocks. Automatically annotates the merged Peak Table with a 'Block' column and consolidates any statistical results generated from the multivis.utils.statistics package in relation to each block.
- parameters
  - [peak_blocks] : A dictionary of Pandas Peak Table dataframes from different datasets indexed by dataset type.
  - [data_blocks] : A dictionary of Pandas Data Table dataframes from different datasets indexed by dataset type.
  - [mergeType] : The type of merging to perform. Either by 'SampleID' or 'Index'.
- Returns
  - [DataTable] : Merged Pandas dataFrame
  - [PeakTable] : Merged Pandas dataFrame (with any statistical results generated by multivis.utils.statistics consolidated into each block)
transform: Scales and transforms data in forward or reverse order based on different transform options.
- parameters
  - [data] : A 1D numpy array of values
  - [transform_type] : The transform type to apply to the data ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal")
  - [min] : The minimum value for scaling
  - [max] : The maximum value for scaling
- Returns
  - [transformed_data] : A scaled and transformed 1D numpy array
scaler: Scales a series of values in a 1D numpy array or pandas dataframe matrix based on different scaling functions
- parameters
  - [data] : A pandas dataframe matrix or 1D numpy array of numerical values
  - [type] : The scaler type to apply based on sklearn preprocessing functions (default: "standard")
  - [stdScaler_with_mean] : Using "standard" scaler, center the data to the mean before scaling (default: True)
  - [stdScaler_with_std] : Using "standard" scaler, scale the data to unit variance (default: True)
  - [robust_with_centering] : Using "robust" scaler, center the data to the median before scaling (default: True)
  - [robust_with_scaling] : Using "robust" scaler, scale the data to within the quantile range (default: True)
  - [robust_unit_variance] : Using "robust" scaler, scale the data so that normally distributed features have a variance of 1 (default: False)
  - [minimum] : Using "minmax" scaler, set the minimum value for scaling (default: 0)
  - [maximum] : Using "minmax" scaler, set the maximum value for scaling (default: 1)
  - [lower_iqr] : Using "robust" scaler, set the lower quantile range (default: 25.0)
  - [upper_iqr] : Using "robust" scaler, set the upper quantile range (default: 75.0)
- Returns
  - [scaled_data] : A scaled pandas dataframe matrix or 1D numpy array of numerical values
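The four scaler types named in this README ('standard', 'minmax', 'maxabs' and 'robust') follow well-known formulas, which the sketch below spells out for a 1D list using only the standard library. It is an illustration of those formulas rather than the multivis function: scale and percentile are hypothetical helpers, the argument is called kind instead of type to avoid shadowing the Python builtin, and the with_mean/with_std style switches are left out.

```python
import statistics


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in 0..100) of pre-sorted values."""
    position = (len(sorted_values) - 1) * q / 100
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = position - low
    return sorted_values[low] + (sorted_values[high] - sorted_values[low]) * fraction


def scale(values, kind="standard", minimum=0, maximum=1, lower_iqr=25.0, upper_iqr=75.0):
    """Scale a 1D list of numbers with one of four common schemes."""
    if kind == "minmax":
        low, high = min(values), max(values)
        span = (high - low) or 1.0  # avoid dividing by zero on constant data
        return [minimum + (v - low) / span * (maximum - minimum) for v in values]
    if kind == "standard":
        # Center on the mean, divide by the population standard deviation.
        center, spread = statistics.fmean(values), statistics.pstdev(values)
    elif kind == "maxabs":
        # No centering, divide by the largest absolute value.
        center, spread = 0.0, max(abs(v) for v in values)
    elif kind == "robust":
        # Center on the median, divide by the chosen quantile range.
        ordered = sorted(values)
        center = statistics.median(ordered)
        spread = percentile(ordered, upper_iqr) - percentile(ordered, lower_iqr)
    else:
        raise ValueError(f"unknown scaler type: {kind}")
    spread = spread or 1.0  # avoid dividing by zero on constant data
    return [(v - center) / spread for v in values]


values = [1.0, 2.0, 3.0, 4.0, 5.0]
for kind in ("standard", "minmax", "maxabs", "robust"):
    print(kind, [round(v, 3) for v in scale(values, kind)])
```

The robust variant is less sensitive to outliers because the median and the quantile range move little when a single extreme value is added.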
imputeData: Imputes data given a pandas dataframe of values
- parameters
  - [data] : A pandas dataframe of values
  - [k] : The number of nearest neighbours
- Returns
  - [data_filled] : Imputed data
statistics: Generate a table of parametric or non-parametric statistics and merges them with the Peak Table (node table).
- init_parameters
  - [peaktable] : Pandas dataframe containing peak data. Must contain 'Name' and 'Label'.
  - [datatable] : Pandas dataframe matrix containing values for statistical analysis
- methods
  - [set_params] : Set parameters
    - [parametric] : Perform parametric statistical analysis, assuming the data is normally distributed (default: True)
    - [log_data] : Perform a log ('natural', base 2 or base 10) on all data prior to statistical analysis (default: (False, 2))
    - [scale_data] : Scale the data ('standard' (centers to the mean and scales to unit variance), 'minmax' (scales between 0 and 1), 'maxabs' (scales to the absolute maximum value), 'robust' (centers to the median and scales to between 25th and 75th quantile range)) (default: (True, 'standard'))
    - [impute_data] : Impute any missing values using KNN impute with a set number of nearest neighbours (default: (False, 3))
    - [group_column_name] : The group column name used in the datatable (default: None)
    - [control_group_name] : The control group name in the datatable, if available (default: None)
    - [group_alpha_CI] : The alpha value for group confidence intervals (default: 0.05)
    - [fold_change_alpha_CI] : The alpha value for mean/median fold change confidence intervals (default: 0.05)
    - [pca_alpha_CI] : The alpha value for the PCA confidence intervals (default: 0.05)
    - [total_missing] : Calculate the total missing values per feature (default: False)
    - [group_missing] : Calculate the missing values per feature per group (if group_column_name not None) (default: False)
    - [pca_loadings] : Calculate PC1 and PC2 loadings for each feature (default: True)
    - [normality_test] : Determine normal distribution across whole dataset using Shapiro-Wilk test (pvalues < 0.05 ~ non-normal distribution) (default: True)
    - [group_normality_test] : Determine normal distribution across each group (if group_column_name not None) using Shapiro-Wilk test (pvalues < 0.05 ~ non-normal distribution) (default: True)
    - [group_mean_CI] : Determine the mean with bootstrapped CI across each group (if parametric = True and group_column_name not None) (default: True)
    - [group_median_CI] : Determine the median with bootstrapped CI across each group (if parametric = False and group_column_name not None) (default: True)
    - [mean_fold_change] : Calculate the mean fold change with bootstrapped confidence intervals (if parametric = True, group_column_name not None and control_group_name not None) (default: False)
    - [median_fold_change] : Calculate the median fold change with bootstrapped confidence intervals (if parametric = False, group_column_name not None and control_group_name not None) (default: False)
    - [levene_twoGroup] : Test null hypothesis that control group and each of the other groups come from populations with equal variances (if group_column_name not None and control_group_name not None) (default: False)
    - [levene_allGroup] : Test null hypothesis that all groups come from populations with equal variances (if group_column_name not None) (default: False)
    - [oneway_Anova_test] : Test null hypothesis that all groups have the same population mean, with included Benjamini-Hochberg FDR (if parametric = True and group_column_name not None) (default: False)
    - [kruskal_wallis_test] : Test null hypothesis that population median of all groups are equal, with included Benjamini-Hochberg FDR (if parametric = False and group_column_name not None) (default: False)
    - [ttest_oneGroup] : Calculate the T-test for the mean across all the data (one group), with included Benjamini-Hochberg FDR (if parametric = True, group_column_name is None or there is only 1 group in the data) (default: False)
    - [ttest_twoGroup] : Calculate the T-test for the mean of two groups, with one group being the control group, with included Benjamini-Hochberg FDR (if parametric = True, group_column_name not None and control_group_name not None) (default: False)
    - [mann_whitney_u_test] : Compute the Mann-Whitney rank test on two groups, with one being the control group, with included Benjamini-Hochberg FDR (if parametric = False, group_column_name not None and control_group_name not None) (default: False)
  - [help] : Print this help text
  - [calculate] : Performs the statistical calculations and outputs the Peak Table (node table) with the results appended.
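Several of the tests above report a Benjamini-Hochberg FDR alongside the raw p-values. The sketch below shows the standard Benjamini-Hochberg adjustment in plain Python so the meaning of those columns is concrete. The function name benjamini_hochberg is hypothetical, and whether multivis computes the adjustment this way or delegates to one of its dependencies is not stated in this README.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is capped by
    # the adjusted value of the next-larger p-value (keeps them monotone).
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # made-up p-values for four features
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"p={p:.3f} -> adjusted={q:.3f}")
```

A feature is then typically called significant when its adjusted value falls below the chosen false discovery rate, for example 0.05.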
corrAnalysis: Correlation analysis on a matrix of values with Pearson, Spearman or Kendall's Tau.
- parameters
  - [df_data] : A Pandas dataframe matrix of values
  - [correlationType] : The correlation type to apply. Either 'Pearson', 'Spearman' or 'KendallTau'
- Returns
  - [df_corr] : Pandas dataframe matrix of all correlation coefficients
  - [df_pval] : Pandas dataframe matrix of all correlation pvalues
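For readers unfamiliar with the correlation types, the sketch below computes Pearson and Spearman coefficients for every pair of features using only the standard library. It is an illustration and not the multivis implementation: the helper names are hypothetical, columns are plain lists rather than a Pandas dataframe, Kendall's Tau is omitted, and no p-values are computed.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_matrix(columns, method=pearson):
    """columns maps a feature name to its values; returns a nested dict."""
    names = list(columns)
    return {a: {b: method(columns[a], columns[b]) for b in names} for a in names}


columns = {"M1": [1, 2, 3, 4], "M2": [1, 4, 9, 16], "M3": [4, 3, 2, 1]}
for name, row in correlation_matrix(columns, method=spearman).items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

Spearman works on ranks, so the monotone but non-linear relationship between M1 and M2 scores as a perfect correlation, while Pearson would score it slightly below 1.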
cluster: Clusters data using a linkage cluster method. If the data is correlated the correlations are first preprocessed, then clustered, otherwise a distance metric is applied to non-correlated data before clustering.
- parameters
  - [matrix] : A Pandas dataframe matrix of scores
  - [transpose_non_correlated] : Setting to 'True' will transpose the matrix if it is not correlated data
  - [is_correlated] : Setting to 'True' will treat the matrix as if it contains correlation coefficients
  - [distance_metric] : Set the distance metric. Used if the matrix does not contain correlation coefficients.
  - [linkage_method] : Set the linkage method for the clustering.
- Returns
  - [matrix] : The original matrix, transposed if transpose_non_correlated is 'True' and is_correlated is 'False'.
  - [row_linkage] : linkage matrix for the rows from a linkage clustered distance/similarities matrix
  - [col_linkage] : linkage matrix for the columns from a linkage clustered distance/similarities matrix

License

Multivis is licensed under the MIT license.

Authors

Brett Chapman
https://scholar.google.com.au/citations?user=A_wYNAQAAAAJ&hl=en

Correspondence

Dr. Brett Chapman, Post-doctoral Research Fellow at the Western Crop Genetics Alliance, Murdoch University. E-mail: brett.chapman@murdoch.edu.au, brett.chapman78@gmail.com

Citation

If you would like to cite MultiVis in a scientific publication, please cite this GitHub page until a citation to a publication becomes available.

multivis
Release 0.5.12
