pyMolNetEnhancer

A Python implementation of MolNetEnhancer


License: MIT

Install: pip install pyMolNetEnhancer==0.2.0

pyMolNetEnhancer is a Python module that integrates chemical class and substructure information into mass spectral molecular networks created through the Global Natural Products Social Molecular Networking (GNPS) platform. An analogous R package is available at https://github.com/madeleineernst/RMolNetEnhancer.

Installation

Install pyMolNetEnhancer with:

pip install pyMolNetEnhancer

Map MS2LDA substructural information to mass spectral molecular networks (classical)

In order to map substructural information to a mass spectral molecular network you need to:

Then execute the code in Example_notebooks/Mass2Motifs_2_Network_Classical.ipynb line by line. The only things you need to specify are:

  1. Your GNPS job ID
  2. Your MS2LDA job ID
    Note: Depending on the size of the MS2LDA results file, a server connection timeout may occur. Alternatively, you may download the file manually at http://ms2lda.org/.
  3. User-defined parameters for mapping the Mass2Motifs onto the network:
    prob: minimal probability score for a Mass2Motif to be included. Default is 0.01.
    overlap: minimal overlap score for a Mass2Motif to be included. Default is 0.3.
    Important: The probability and overlap thresholds can also be set within the ms2lda.org app, under the Experimental Options tab. We recommend doing so when inspecting results in the web app. Note that the summary table contains only the motif-document relations that pass the thresholds set in the web app.
    top: number of most shared motifs to show per molecular family (network component index). Default is 5.
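
The effect of the prob and overlap thresholds can be illustrated with a minimal sketch. It is not the pyMolNetEnhancer implementation: the record layout, the field names and the helper filter_motif_relations are hypothetical, the scores are made up, and treating the thresholds as inclusive is an assumption.

```python
# Hypothetical motif-document relations; all values are made up for illustration.
relations = [
    {"scan": 1, "motif": "motif_1", "probability": 0.20, "overlap": 0.50},
    {"scan": 1, "motif": "motif_2", "probability": 0.005, "overlap": 0.90},
    {"scan": 2, "motif": "motif_1", "probability": 0.10, "overlap": 0.10},
    {"scan": 3, "motif": "motif_3", "probability": 0.01, "overlap": 0.30},
]


def filter_motif_relations(rows, prob=0.01, overlap=0.3):
    """Keep relations whose scores reach both thresholds (assumed inclusive)."""
    return [
        row for row in rows
        if row["probability"] >= prob and row["overlap"] >= overlap
    ]


kept = filter_motif_relations(relations)
print([(row["scan"], row["motif"]) for row in kept])
```

In this sketch, raising either threshold can only shrink the set of Mass2Motifs that end up on the network, which is why the values used here should match those used when inspecting results in the web app.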

To visualize results, import the .graphml output file into Cytoscape. To color edges based on the Mass2Motifs shared between nodes, select 'Stroke Color' in the 'Edge' tab to the left and choose 'interaction' as Column and 'Discrete Mapping' as Mapping Type.

To color nodes by the most shared Mass2Motifs per molecular family (network component index), select 'Image/Chart' in the 'Node' tab to the left and select the Mass2Motifs shown in 'TopSharedMotifs' in the Edge Table.

Alternatively, the edges and nodes output files can be loaded separately into Cytoscape. To do so, import the 'Mass2Motifs_Edges_Classical.tsv' output file as a network into Cytoscape. Select column 'CLUSTERID1' as Source Node, column 'interact' as Interaction Type and column 'CLUSTERID2' as Target Node.

Then import the 'Mass2Motifs_Nodes_Classical.tsv' output file as a table.

Map MS2LDA substructural information to mass spectral molecular networks (feature based)

In order to map substructural information to a mass spectral molecular network created through the feature based workflow you need to:

Then execute the code in Example_notebooks/Mass2Motifs_2_Network_FeatureBased.ipynb line by line. The only things you need to specify are:

  1. Your GNPS job ID
  2. Your MS2LDA job ID
    Note: Depending on the size of the MS2LDA results file, a server connection timeout may occur. Alternatively, you may download the file manually at http://ms2lda.org/.
  3. User-defined parameters for mapping the Mass2Motifs onto the network:
    prob: minimal probability score for a Mass2Motif to be included. Default is 0.01.
    overlap: minimal overlap score for a Mass2Motif to be included. Default is 0.3.
    Important: The probability and overlap thresholds can also be set within the ms2lda.org app, under the Experimental Options tab. We recommend doing so when inspecting results in the web app. Note that the summary table contains only the motif-document relations that pass the thresholds set in the web app.
    top: number of most shared motifs to show per molecular family (network component index). Default is 5.
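
The top parameter can be pictured with a small sketch that ranks Mass2Motifs by how many nodes of a molecular family carry them. This is one plausible reading of "most shared", not the library's actual code: the input layout and the helper top_shared_motifs are hypothetical, and the data are made up.

```python
from collections import Counter, defaultdict

# Hypothetical input: node id -> (component index, Mass2Motifs found on that node).
nodes = {
    1: (1, ["motif_1", "motif_2"]),
    2: (1, ["motif_1"]),
    3: (1, ["motif_1", "motif_2", "motif_3"]),
    4: (2, ["motif_4"]),
}


def top_shared_motifs(node_table, top=5):
    """Return, per component, the `top` motifs carried by the most nodes."""
    counts = defaultdict(Counter)
    for component, motifs in node_table.values():
        # Count each motif at most once per node.
        counts[component].update(set(motifs))
    return {
        component: [motif for motif, _ in counter.most_common(top)]
        for component, counter in counts.items()
    }


result = top_shared_motifs(nodes, top=2)
print(result)
```

With top=2, only the two most frequent motifs of each component are reported; a component with fewer motifs than top simply lists all of them.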

To visualize results, import the .graphml output file into Cytoscape. To color edges based on the Mass2Motifs shared between nodes, select 'Stroke Color' in the 'Edge' tab to the left and choose 'interaction' as Column and 'Discrete Mapping' as Mapping Type.

To color nodes by the most shared Mass2Motifs per molecular family (network component index), select 'Image/Chart' in the 'Node' tab to the left and select the Mass2Motifs shown in 'TopSharedMotifs' in the Edge Table.

Alternatively, the edges and nodes output files can be loaded separately into Cytoscape. To do so, import the 'Mass2Motifs_Edges_Classical.tsv' output file as a network into Cytoscape. Select column 'CLUSTERID1' as Source Node, column 'interact' as Interaction Type and column 'CLUSTERID2' as Target Node.

Then import the 'Mass2Motifs_Nodes_Classical.tsv' output file as a table.

Map chemical class information to mass spectral molecular networks

In order to map chemical class information to a mass spectral molecular network you need to:

Then execute the code in Example_notebooks/ChemicalClasses_2_Network_Classical.ipynb or Example_notebooks/ChemicalClasses_2_Network_FeatureBased.ipynb line by line. The only things you need to specify are:

  1. Your GNPS job ID
  2. Your DEREPLICATOR job ID(s)
  3. Your NAP job ID(s)

You can specify as many in silico annotation outputs as you wish. If you import results from applications other than NAP or DEREPLICATOR, make sure that your input file is tab separated and includes a column named 'Scan', containing numeric identifiers that match the numeric node identifiers in the GNPS network, and a column named 'SMILES', containing SMILES structures. Include all results as dataframe list items in the 'matches' object. The object 'gnpslib' contains all GNPS library hits:

matches = [gnpslib, nap, derep]
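
For results from other tools, the input requirements described above can be checked before a table is added to 'matches'. The sketch below uses only the standard library; the helper check_annotation_table and the example rows are hypothetical, and treating "numeric" as "parses as an integer" is an assumption.

```python
import csv
import io

REQUIRED_COLUMNS = {"Scan", "SMILES"}


def check_annotation_table(handle):
    """Read a tab-separated annotation table and run basic sanity checks."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required column(s): {sorted(missing)}")
    rows = list(reader)
    for row in rows:
        # Assumption: scan identifiers must parse as integers.
        int(row["Scan"])
    return rows


# Made-up example content standing in for a real annotation file.
example = io.StringIO("Scan\tSMILES\n12\tCCO\n47\tc1ccccc1\n")
rows = check_annotation_table(example)
print(len(rows), "annotations passed the checks")
```

For a file on disk, the same function can be called on an open file handle, for example one created with open(path, newline="").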

In this notebook we use ChemAxon's molconvert to convert SMILES to InChIKeys. A platform-independent version of ChemAxon's Marvin can be downloaded from ChemAxon. Make sure that molconvert is installed and add its path to the environment:

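import os
path = '/Applications/MarvinSuite/bin/'  # adjust to your Marvin installation
# os.pathsep is ':' on macOS/Linux and ';' on Windows
os.environ['PATH'] += os.pathsep + path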
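
Before running the conversion, it can help to confirm that molconvert is actually reachable. The following is a minimal sketch using the standard library's shutil.which; the helper find_molconvert is hypothetical and the directory shown is only the example path from above.

```python
import os
import shutil


def find_molconvert(extra_dir=None):
    """Return the full path of molconvert, or None if it cannot be found."""
    search_path = os.environ.get("PATH", "")
    if extra_dir:
        # Search the extra directory in addition to the current PATH.
        search_path = search_path + os.pathsep + extra_dir
    return shutil.which("molconvert", path=search_path)


location = find_molconvert("/Applications/MarvinSuite/bin/")
if location is None:
    print("molconvert not found; check the Marvin installation path")
else:
    print("molconvert found at", location)
```

If the function returns None, the path passed to it (or added to the environment) most likely does not point to the directory that contains the molconvert executable.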

To visualize results, import the .graphml output file into Cytoscape. To color nodes based on the chemical subclass, select 'Fill Color' in the 'Node' tab to the left and choose 'CF_subclass' as Column and 'Discrete Mapping' as Mapping Type.

To color nodes based on the chemical subclass score, select 'Fill Color' in the 'Node' tab to the left and choose 'CF_subclass_score' as Column and 'Continuous Mapping' as Mapping Type.

All columns related to chemical class information are prefixed with 'CF_', and chemical class information at other hierarchical levels of the chemical taxonomy can be mapped analogously (e.g. CF_superclass, CF_superclass_score, CF_class, etc.). The .txt output file can also be imported as a table into an existing network in Cytoscape.

Map chemical class and MS2LDA substructural information to mass spectral molecular networks

To map both chemical class and MS2LDA substructural information to a mass spectral molecular network, first follow the steps in the corresponding sections. For classical molecular networking, these are Map MS2LDA substructural information to mass spectral molecular networks (classical) and Map chemical class information to mass spectral molecular networks. For feature based molecular networking, these are Map MS2LDA substructural information to mass spectral molecular networks (feature based) and Map chemical class information to mass spectral molecular networks. Then create a graphml file containing both Mass2Motif and chemical class information with:

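import networkx as nx
# make_classyfire_graphml is provided by pyMolNetEnhancer; MG and final are described below
graphML_classy = make_classyfire_graphml(MG, final)
nx.write_graphml(graphML_classy, "Motif_ChemicalClass_Network_Classical.graphml", infer_numeric_types=True)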

where 'MG' corresponds to the network with mapped Mass2Motifs and 'final' to the dataframe output created when mapping chemical class information. An example is shown in Example_notebooks/Mass2Motifs_2_Network_Classical.ipynb and Example_notebooks/Mass2Motifs_2_Network_FeatureBased.ipynb. To visualize the network in Cytoscape, proceed as described in the same sections: Map MS2LDA substructural information to mass spectral molecular networks (classical) and Map chemical class information to mass spectral molecular networks for classical molecular networking, or Map MS2LDA substructural information to mass spectral molecular networks (feature based) and Map chemical class information to mass spectral molecular networks for feature based molecular networking.

Dependencies

python 3.6.5, collections 0.6.1, csv 1.0, functools, joblib 0.13.0, json 2.0.9, multiprocessing, networkx 2.1, operator, os, pandas 0.22.0, rdkit, re 2.2.1, requests 2.18.4, sys, time

Main citation

https://www.biorxiv.org/content/10.1101/654459v1
https://github.com/madeleineernst/pyMolNetEnhancer

Other citations

MolNetEnhancer uses molecular networking through GNPS:
Wang, M.; Carver, J. J.; Phelan, V. V.; Sanchez, L. M.; Garg, N.; Peng, Y.; Nguyen, D. D.; Watrous, J.; Kapono, C. A.; Luzzatto-Knaan, T.; et al. Sharing and Community Curation of Mass Spectrometry Data with Global Natural Products Social Molecular Networking. Nat. Biotechnol. 2016, 34 (8), 828–837. https://www.nature.com/articles/nbt.3597

MolNetEnhancer uses untargeted substructure exploration through MS2LDA:
van der Hooft, J.J.J.; Wandy, J.; Barrett, M.P.; Burgess, K.E.V.; Rogers, S. Topic modeling for untargeted substructure exploration in metabolomics. PNAS 2016, 113 (48), 13738-13743. https://www.pnas.org/content/113/48/13738

MolNetEnhancer uses Network Annotation Propagation (NAP):
da Silva, R. R.; Wang, M.; Nothias, L.-F.; van der Hooft, J. J. J.; Caraballo-Rodríguez, A. M.; Fox, E.; Balunas, M. J.; Klassen, J. L.; Lopes, N. P.; Dorrestein, P. C. Propagating Annotations of Molecular Networks Using in Silico Fragmentation. PLoS Comput. Biol. 2018, 14 (4), e1006089. http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006089

MolNetEnhancer uses DEREPLICATOR:
Mohimani, H.; Gurevich, A.; Mikheenko, A.; Garg, N.; Nothias, L.-F.; Ninomiya, A.; Takada, K.; Dorrestein, P.C.; Pevzner, P.A. Dereplication of peptidic natural products through database search of mass spectra. Nat. Chem. Biol. 2017, 13, 30-37. https://www.nature.com/articles/nchembio.2219

MolNetEnhancer uses automated chemical classification through ClassyFire:
Feunang, Y. D.; Eisner, R.; Knox, C.; Chepelev, L.; Hastings, J.; Owen, G.; Fahy, E.; Steinbeck, C.; Subramanian, S.; Bolton, E.; Greiner, R.; Wishart, D.S. ClassyFire: automated chemical classification with a comprehensive, computable taxonomy. J. Cheminform. 2016, 8, 61. https://jcheminf.biomedcentral.com/articles/10.1186/s13321-016-0174-y

License

This repository is available under the following license: https://github.com/madeleineernst/pyMolNetEnhancer/blob/master/LICENSE