SpectralEntropy

The similarity score for spectral comparison


License: Apache-2.0
Install: pip install SpectralEntropy==1.0.2


If you use this package, please cite this manuscript:

Li, Y., Kind, T., Folz, J. et al. Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification. Nat Methods 18, 1524–1531 (2021). https://doi.org/10.1038/s41592-021-01331-z

Search spectra with entropy similarity

To search spectral files with entropy similarity, you can download the pre-compiled program from https://github.com/YuanyueLi/EntropySearch/releases.

Advanced users who want to calculate spectral entropy, entropy similarity, or other spectral similarity scores themselves can use the Python code below.

A Jupyter notebook example is provided here: https://github.com/YuanyueLi/SpectralEntropy/blob/master/example.ipynb

A detailed reference for the 43 different algorithms for calculating spectral similarity can be found here: https://SpectralEntropy.readthedocs.io/en/master/

If your own implementation gives an entropy similarity score higher than 1, this is caused by a mistake in merging peaks within the MS2 tolerance. You can use the code implemented here to avoid this problem. We are working on an R implementation of entropy similarity, which will be released soon.
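For readers checking their own code, the reason the score should never exceed 1 follows from the definition of entropy similarity in the manuscript cited above. Here I_A and I_B are the peak intensities of spectra A and B, each normalized to sum to 1, and I_AB is their peak-wise average after peaks within the MS2 tolerance have been matched:

```latex
S_X = -\sum_i I_{X,i} \ln I_{X,i}, \qquad
I_{AB,i} = \tfrac{1}{2}\left(I_{A,i} + I_{B,i}\right)

\text{Entropy similarity} = 1 - \frac{2 S_{AB} - S_A - S_B}{\ln 4}
```

Because 2S_AB - S_A - S_B is twice the Jensen-Shannon divergence between the two normalized spectra, it lies between 0 and ln 4, so the score stays within [0, 1]. If peak merging double-counts or drops intensity, I_AB is no longer a properly normalized average and this guarantee is lost, which is one plausible way a score above 1 can appear.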

Requirements

Python 3.7, numpy>=1.17.4, scipy>=1.3.2

cython>=0.29.13 (Not required but highly recommended)

# The command below is not required but strongly recommended, as it will compile the cython code to run faster
python setup.py build_ext --inplace

Spectral entropy

To calculate spectral entropy, the spectrum needs to be centroided first. When you are focusing on fragment-ion information, the precursor ion may need to be removed from the spectrum before calculating spectral entropy. If isotope peaks exist in the MS/MS spectrum, they should be removed first, as isotope peaks do not contain useful information for identifying molecules.

Calculating spectral entropy for a centroided spectrum in Python is very simple (just one line with the scipy package).

import numpy as np
import scipy.stats

spectrum = np.array([[41.04, 37.16], [69.07, 66.83], [86.1, 999.0]], dtype=np.float32)

entropy = scipy.stats.entropy(spectrum[:, 1])
print("Spectral entropy is {}.".format(entropy))
# The output should be: Spectral entropy is 0.3737888038158417.
print('-' * 30)
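To show what that one line computes, scipy.stats.entropy normalizes the intensities to sum to 1 and then takes the Shannon entropy with the natural logarithm. The sketch below repeats the calculation with numpy only and compares it with scipy; it assumes all intensities are positive (a zero intensity would produce nan in p * log(p)).

```python
import numpy as np
import scipy.stats

spectrum = np.array([[41.04, 37.16], [69.07, 66.83], [86.1, 999.0]], dtype=np.float32)

# Normalize intensities so they sum to 1, then apply S = -sum(p * ln p).
# Assumes every intensity is > 0.
p = spectrum[:, 1].astype(np.float64)
p = p / p.sum()
manual_entropy = float(-np.sum(p * np.log(p)))

print("Manual spectral entropy is {}.".format(manual_entropy))
print("scipy spectral entropy is {}.".format(scipy.stats.entropy(spectrum[:, 1])))
```

Both lines should print the same value up to float32 rounding.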

For a profile spectrum that has not been centroided, you can use the clean_spectrum function to centroid it first, for example:

import numpy as np
import scipy.stats
import spectral_entropy

spectrum = np.array([[69.071, 7.917962], [86.066, 1.021589], [86.0969, 100.0]], dtype=np.float32)

spectrum = spectral_entropy.clean_spectrum(spectrum)
entropy = scipy.stats.entropy(spectrum[:, 1])
print("Spectral entropy is {}.".format(entropy))
# The output should be: Spectral entropy is 0.2605222463607788.
print('-' * 30)
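To make the centroiding step more concrete, here is a simplified numpy sketch of merging peaks that lie within ms2_da of each other. It is an illustration under our own assumptions, not the clean_spectrum implementation: the helper name merge_peaks is hypothetical, and peaks are processed from most to least intense, with each one absorbing the remaining peaks within ms2_da, keeping its own m/z, and summing their intensities. On the spectrum above, the two peaks near m/z 86.08 are merged, and the entropy matches the value printed by the clean_spectrum example.

```python
import numpy as np
import scipy.stats


def merge_peaks(spectrum, ms2_da=0.05):
    """Hypothetical helper: merge peaks closer than ms2_da into the most intense one."""
    spectrum = spectrum[np.argsort(-spectrum[:, 1])]  # most intense first
    used = np.zeros(len(spectrum), dtype=bool)
    merged = []
    for i in range(len(spectrum)):
        if used[i]:
            continue
        # Unused peaks within ms2_da of peak i (including peak i itself).
        close = ~used & (np.abs(spectrum[:, 0] - spectrum[i, 0]) <= ms2_da)
        merged.append([spectrum[i, 0], spectrum[close, 1].sum()])
        used |= close
    merged = np.array(merged, dtype=np.float64)
    return merged[np.argsort(merged[:, 0])]  # sort back by m/z


spectrum = np.array([[69.071, 7.917962], [86.066, 1.021589], [86.0969, 100.0]], dtype=np.float32)
centroided = merge_peaks(spectrum, ms2_da=0.05)
entropy = scipy.stats.entropy(centroided[:, 1])
print(centroided)
print("Spectral entropy is {}.".format(entropy))
```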

We provide a function, clean_spectrum, to help you remove the precursor ion (peaks above max_mz are dropped), centroid the spectrum, and remove noise ions. Please note that this function will not remove isotope peaks; you need to remove them yourself. For example:

import numpy as np
import spectral_entropy

spectrum = np.array([[41.04, 0.3716], [69.071, 7.917962], [69.071, 100.], [86.0969, 66.83]], dtype=np.float32)
clean_spectrum = spectral_entropy.clean_spectrum(spectrum,
                                                 max_mz=85,
                                                 noise_removal=0.01,
                                                 ms2_da=0.05)
print("Clean spectrum will be:{}".format(clean_spectrum))
# The output should be: Clean spectrum will be:[[69.071  1.   ]]
print('-' * 30)
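Because clean_spectrum leaves isotope peaks in place, you may want a pre-filter of your own. The sketch below is one simple heuristic and is not part of this package: the function name remove_isotope_peaks, the 0.01 Da tolerance, and the toy spectrum are all hypothetical. It assumes singly charged fragments, for which the 13C isotope peak sits about 1.00336 Da above its monoisotopic peak and is less intense, and it drops any peak matching that pattern.

```python
import numpy as np

C13_SHIFT = 1.003355  # mass difference between 13C and 12C, in Da


def remove_isotope_peaks(spectrum, tol_da=0.01):
    """Hypothetical heuristic: drop peaks that look like M+1 isotopes of a more intense peak.

    Assumes singly charged ions, so the M+1 peak is ~1.00336 Da above its parent.
    """
    keep = np.ones(len(spectrum), dtype=bool)
    for i, (mz, intensity) in enumerate(spectrum):
        # Candidate parents: peaks ~1.00336 Da lower that are more intense.
        parent = (np.abs(mz - spectrum[:, 0] - C13_SHIFT) <= tol_da) & (spectrum[:, 1] > intensity)
        if parent.any():
            keep[i] = False
    return spectrum[keep]


# Toy spectrum (hypothetical values): 70.074 looks like the M+1 peak of 69.071.
spectrum = np.array([[69.071, 100.0], [70.074, 5.0], [86.0969, 66.83]], dtype=np.float32)
print(remove_isotope_peaks(spectrum))
```

Real spectra may need charge-state handling or intensity-ratio checks that this sketch ignores.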

Entropy similarity

Before calculating entropy similarity, the spectra need to be centroided. Removing noise ions is highly recommended. Also, based on our tests on the NIST20 and Massbank.us databases, removing ions with m/z higher than the precursor ion's m/z - 1.6 greatly improves spectral identification performance.
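As an illustration of the m/z - 1.6 rule, the sketch below keeps only peaks whose m/z is below the precursor m/z minus 1.6, using plain numpy. The precursor m/z value is hypothetical; passing the same cutoff as max_mz to clean_spectrum should have a similar effect.

```python
import numpy as np

# Hypothetical precursor m/z, for illustration only.
precursor_mz = 87.1

spectrum = np.array([[41.04, 37.16], [69.07, 66.83], [86.1, 999.0]], dtype=np.float32)

# Keep only fragment ions below precursor m/z - 1.6 (drops the precursor and nearby peaks).
filtered = spectrum[spectrum[:, 0] < precursor_mz - 1.6]
print(filtered)
```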

We provide the calculate_entropy_similarity function to calculate the entropy similarity between two spectra.

import numpy as np
import spectral_entropy

spec_query = np.array([[69.071, 7.917962], [86.066, 1.021589], [86.0969, 100.0]], dtype=np.float32)
spec_reference = np.array([[41.04, 37.16], [69.07, 66.83], [86.1, 999.0]], dtype=np.float32)

# Calculate entropy similarity.
similarity = spectral_entropy.calculate_entropy_similarity(spec_query, spec_reference, ms2_da=0.05)
print("Entropy similarity:{}.".format(similarity))
# The output should be: Entropy similarity:0.8984397722577456.
print('-' * 30)
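For readers who want to see the steps behind calculate_entropy_similarity, the sketch below reimplements weighted and unweighted entropy similarity with numpy only. It is an approximation built on our own assumptions, not the package's code: the helper names are hypothetical, peaks are merged and matched greedily within ms2_da, noise below 1% of the base peak is dropped, and for spectra with entropy below 3 the intensities are raised to the power 0.25 + 0.25 * S before renormalizing, following the weighting described in the manuscript cited above.

```python
import numpy as np


def clean(spectrum, ms2_da=0.05, noise_removal=0.01):
    """Simplified cleaning (assumption): merge close peaks, drop noise, normalize to sum 1."""
    spectrum = spectrum[np.argsort(-spectrum[:, 1])].astype(np.float64)
    used = np.zeros(len(spectrum), dtype=bool)
    peaks = []
    for i in range(len(spectrum)):
        if not used[i]:
            close = ~used & (np.abs(spectrum[:, 0] - spectrum[i, 0]) <= ms2_da)
            peaks.append([spectrum[i, 0], spectrum[close, 1].sum()])
            used |= close
    peaks = np.array(peaks)
    peaks = peaks[peaks[:, 1] >= noise_removal * peaks[:, 1].max()]
    peaks[:, 1] /= peaks[:, 1].sum()
    return peaks


def shannon_entropy(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def weight(p):
    """Entropy-based weighting for spectra with entropy < 3, then renormalize."""
    s = shannon_entropy(p)
    if s < 3:
        p = np.power(p, 0.25 + 0.25 * s)
    return p / p.sum()


def align(a, b, ms2_da):
    """Greedy one-to-one peak matching within ms2_da; unmatched peaks pair with 0."""
    used_b = np.zeros(len(b), dtype=bool)
    rows = []
    for mz, inten in a:
        diff = np.abs(b[:, 0] - mz)
        diff[used_b] = np.inf
        j = int(np.argmin(diff))
        if diff[j] <= ms2_da:
            rows.append((inten, b[j, 1]))
            used_b[j] = True
        else:
            rows.append((inten, 0.0))
    rows += [(0.0, inten) for inten in b[~used_b, 1]]
    rows = np.array(rows)
    return rows[:, 0], rows[:, 1]


def entropy_similarity(query, reference, ms2_da=0.05, weighted=True):
    a, b = clean(query, ms2_da), clean(reference, ms2_da)
    if weighted:
        a[:, 1], b[:, 1] = weight(a[:, 1]), weight(b[:, 1])
    p_a, p_b = align(a, b, ms2_da)
    s_a, s_b = shannon_entropy(p_a), shannon_entropy(p_b)
    s_ab = shannon_entropy((p_a + p_b) / 2)  # merged spectrum = average of the two
    return 1 - (2 * s_ab - s_a - s_b) / np.log(4)


spec_query = np.array([[69.071, 7.917962], [86.066, 1.021589], [86.0969, 100.0]], dtype=np.float32)
spec_reference = np.array([[41.04, 37.16], [69.07, 66.83], [86.1, 999.0]], dtype=np.float32)
print("Entropy similarity:{}.".format(entropy_similarity(spec_query, spec_reference)))
print("Unweighted entropy similarity:{}.".format(
    entropy_similarity(spec_query, spec_reference, weighted=False)))
```

On these example spectra the sketch should land close to the documented scores (about 0.898 weighted and 0.983 unweighted); its simpler peak matching may diverge from the package on harder spectra.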

Spectral similarity

We also provide 43 different spectral similarity algorithms for MS/MS spectral comparison.

You can find the detailed reference here: https://SpectralEntropy.readthedocs.io/en/master/

Example code

Before calculating spectral similarity, it is highly recommended to remove spectral noise. For example, peaks with intensity less than 1% of the maximum intensity can be removed to improve identification performance.
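The 1% rule can be sketched in a few lines of numpy. The function name remove_noise is hypothetical; its noise_removal parameter mirrors the noise_removal=0.01 argument used with clean_spectrum earlier, but this is not that function's implementation and it does not merge peaks.

```python
import numpy as np


def remove_noise(spectrum, noise_removal=0.01):
    """Hypothetical helper: drop peaks below noise_removal * maximum intensity."""
    threshold = noise_removal * spectrum[:, 1].max()
    return spectrum[spectrum[:, 1] >= threshold]


spectrum = np.array([[41.04, 0.3716], [69.071, 7.917962], [69.071, 100.], [86.0969, 66.83]],
                    dtype=np.float32)
denoised = remove_noise(spectrum, noise_removal=0.01)
print(denoised)  # the 41.04 peak falls below 1% of the base peak and is removed
```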

import numpy as np
import spectral_entropy

spec_query = np.array([[69.071, 7.917962], [86.066, 1.021589], [86.0969, 100.0]], dtype=np.float32)
spec_reference = np.array([[41.04, 37.16], [69.07, 66.83], [86.1, 999.0]], dtype=np.float32)

# Calculate entropy similarity.
similarity = spectral_entropy.similarity(spec_query, spec_reference, method="entropy",
                                         ms2_da=0.05)
print("Entropy similarity:{}.".format(similarity))
# The output should be: Entropy similarity:0.8984397722577456.
print('-' * 30)

# Calculate unweighted entropy similarity.
similarity = spectral_entropy.similarity(spec_query, spec_reference, method="unweighted_entropy",
                                         ms2_da=0.05)
print("Unweighted entropy similarity:{}.".format(similarity))
# The output should be: Unweighted entropy similarity:0.9826668790176113.
print('-' * 30)

# Calculate all similarity scores.
all_dist = spectral_entropy.all_similarity(spec_query, spec_reference, ms2_da=0.05)
for dist_name in all_dist:
    method_name = spectral_entropy.methods_name[dist_name]
    print("Method name: {}, similarity score:{}.".format(method_name, all_dist[dist_name]))

# A list of scores from the different similarity methods will be printed.
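To show what one of these alternative scores measures, here is a minimal numpy sketch of a cosine (dot product) similarity after matching peaks within ms2_da. It is a simplified illustration under our own assumptions, not the package's "dot_product" method: the helper names are hypothetical, each query peak is added to the nearest reference peak within ms2_da, and unmatched query peaks are kept with a zero partner. Its score may therefore differ from what all_similarity reports.

```python
import numpy as np


def align_to_reference(query, reference, ms2_da=0.05):
    """Hypothetical helper: build matched intensity vectors for two spectra.

    Each query peak is added to the nearest reference peak within ms2_da;
    query peaks with no partner get a zero reference intensity.
    """
    ref_vec = reference[:, 1].astype(np.float64)
    qry_vec = np.zeros(len(reference))
    unmatched = []
    for mz, intensity in query:
        diff = np.abs(reference[:, 0] - mz)
        j = int(np.argmin(diff))
        if diff[j] <= ms2_da:
            qry_vec[j] += intensity
        else:
            unmatched.append(float(intensity))
    qry = np.concatenate([qry_vec, np.array(unmatched)])
    ref = np.concatenate([ref_vec, np.zeros(len(unmatched))])
    return qry, ref


def cosine_similarity(query, reference, ms2_da=0.05):
    q, r = align_to_reference(query, reference, ms2_da)
    return float(np.dot(q, r) / (np.linalg.norm(q) * np.linalg.norm(r)))


spec_query = np.array([[69.071, 7.917962], [86.066, 1.021589], [86.0969, 100.0]], dtype=np.float32)
spec_reference = np.array([[41.04, 37.16], [69.07, 66.83], [86.1, 999.0]], dtype=np.float32)
score = cosine_similarity(spec_query, spec_reference, ms2_da=0.05)
print("Cosine similarity (sketch):{}.".format(score))
```

Because the reference's base peak dominates both vectors, the cosine score here is very close to 1, which illustrates why a single intense peak can make dot-product scores look high.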

Supported similarity algorithms:

"entropy": Entropy distance
"unweighted_entropy": Unweighted entropy distance
"euclidean": Euclidean distance
"manhattan": Manhattan distance
"chebyshev": Chebyshev distance
"squared_euclidean": Squared Euclidean distance
"fidelity": Fidelity distance
"matusita": Matusita distance
"squared_chord": Squared-chord distance
"bhattacharya_1": Bhattacharya 1 distance
"bhattacharya_2": Bhattacharya 2 distance
"harmonic_mean": Harmonic mean distance
"probabilistic_symmetric_chi_squared": Probabilistic symmetric χ2 distance
"ruzicka": Ruzicka distance
"roberts": Roberts distance
"intersection": Intersection distance
"motyka": Motyka distance
"canberra": Canberra distance
"baroni_urbani_buser": Baroni-Urbani-Buser distance
"penrose_size": Penrose size distance
"mean_character": Mean character distance
"lorentzian": Lorentzian distance
"penrose_shape": Penrose shape distance
"clark": Clark distance
"hellinger": Hellinger distance
"whittaker_index_of_association": Whittaker index of association distance
"symmetric_chi_squared": Symmetric χ2 distance
"pearson_correlation": Pearson/Spearman Correlation Coefficient
"improved_similarity": Improved Similarity
"absolute_value": Absolute Value Distance
"dot_product": Dot-Product (cosine)
"dot_product_reverse": Reverse dot-Product (cosine)
"spectral_contrast_angle": Spectral Contrast Angle
"wave_hedges": Wave Hedges distance
"cosine": Cosine distance
"jaccard": Jaccard distance
"dice": Dice distance
"inner_product": Inner Product distance
"divergence": Divergence distance
"avg_l": Avg (L1, L∞) distance
"vicis_symmetric_chi_squared_3": Vicis-Symmetric χ2 3 distance
"ms_for_id_v1": MSforID distance version 1
"ms_for_id": MSforID distance
"weighted_dot_product": Weighted dot product distance