exopropred

A tool to predict the subcellular localisation of exosomal proteins


License
GPL-3.0
Install
pip install exopropred==1.1

Documentation

ExoProPred

A computational approach to predict the subcellular localisation of exosomal proteins using the sequence information of the proteins.

Introduction

ExoProPred is a webserver to predict exosomal proteins based on hybrid model that combines machine learning model with motif-search approach. The models are trained on a dataset comprising of 2831 exosomal proteins and 2831 non-exosomal proteins. The performance of the models were evaluated using 5-fold cross-validation. The models were trained on top 70 best features comprising of composition-based and evolutionary information based features as well as on hybrid features(Top 70 features + Motif-search) by implementing random-forest classifier from the scikit library of python. In the standalone version, random-forerst classifier based model is implemented alongwith the motif-search usinf MERCI tool, named it as hybrid approach. ExoProPred is also available as web-server at https://webs.iiitd.edu.in/raghava/exopropred. Please read/cite the content about the ExoProPred for complete information including algorithm behind the approach.

Reference

Arora A, Patiyal S, Sharma N, Devi NL, Kaur D, Raghava GPS. A random forest model for predicting exosomal proteins using evolutionary information and motifs. Proteomics. 2023 Jul 31:e2300231. doi: 10.1002/pmic.202300231. Epub ahead of print. PMID: 37525341.

Standalone

The Standalone version of transfacpred is written in python3 and following libraries are necessary for the successful run:

  • scikit-learn
  • Pandas
  • Numpy

Minimum USAGE

To know about the available option for the stanadlone, type the following command:

python3 exopropred.py -h

To run the example, type the following command:

python3 exopropred.py -i example_input.fa

This will predict if the submitted sequences are exososomal proteins or non-exososomal proteins. It will use other parameters by default. It will save the output in "outfile.csv" in CSV (comma seperated variables).

Full Usage

usage: exopropred.py [-h] -i INPUT [-o OUTPUT] [-m {1,2}] [-t THRESHOLD]
                     [-d {1,2}]
Please provide following arguments

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input: protein or peptide sequence(s) in FASTA format
                        or single sequence per line in single letter code
  -o OUTPUT, --output OUTPUT
                        Output: File for saving results by default outfile.csv
  -m {1,2}, --model {1,2}
                        Model Type: 1: Composition based model, 2: Hybrid
                        Model, by default 1
  -t THRESHOLD, --threshold THRESHOLD
                        Threshold: Value between 0 to 1 by default 0.51
  -d {1,2}, --display {1,2}
                        Display: 1:Exosomal Proteins only, 2: All Proteins, by
                        default 1

Input File: It allow users to provide input in the FASTA format.

Output File: Program will save the results in the CSV format, in case user do not provide output file name, it will be stored in "outfile.csv".

Threshold: User should provide threshold between 0 and 1, by default its 0.51.

Model: User is allowed to choose between two different models, such as, 1 for composition-based model, 2 for hybrid model, by default its 1.

Display type: This option allow users to fetch either only exososomal proteins by choosing option 1 or prediction against all proteins by choosing option 2.

ExoProPred Package Files

It contantain following files, brief descript of these files given below

INSTALLATION : Installations instructions

LICENSE : License information

README.md : This file provide information about this package

model.zip : This zipped file contains the compressed version of model

envfile : This file compeises of paths for the PSI-BLAST, MERCI_motif_locator.pl, Motifs, and Swiss-Prot database.

exopropred.py : Main python program

MERCI_motif_locator.pl : Perl script for locating motifs using MERCI

swissprot : Swiss-Prot database for calculating PSSM profile

motifs : Folder containing the motif files

src : Folder containing the python scripts for PSSM based composition features

Data : Folder containing the files to calculate the features using Pfeature

example_input.fa : Example file contain peptide sequenaces in FASTA format

example_composition_model_output.csv : Example output file for composition-based model

example_hybrid_model_output.csv : Example output file for hybrid model