raatk

Reduced amino acid toolkit


License: BSD-2-Clause
Install: pip install raatk==1.2.7


RAATK: A Python-based reduce amino acid toolkit of machine learning for protein sequence level inference

Installation

$ pip install raatk

or

$ pip install git+https://github.com/huang-sh/raatk.git@master -U

All commands used in the paper can be tested by running demo.sh in the demo directory after installing RAATK:

$ ./demo.sh

Function


Command

view

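View the built-in reduction alphabets: -t selects the alphabet type and -s the alphabet sizes (number of clusters) to display.

$ raatk view -t 9 -s 2 4 6 10 12 14 16 --visual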

Output:

type9  2  IMVLFWY-GPCASTNHQEDRK                   BLOSUM50 matrix
type9  4  IMVLFWY-G-PCAST-NHQEDRK                 BLOSUM50 matrix
type9  6  IMVL-FWY-G-P-CAST-NHQEDRK               BLOSUM50 matrix
type9  10 IMV-L-FWY-G-P-C-A-STNH-QERK-D           BLOSUM50 matrix
type9  12 IMV-L-FWY-G-P-C-A-ST-N-HQRK-E-D         BLOSUM50 matrix
type9  14 IMV-L-F-WY-G-P-C-A-S-T-N-HQRK-E-D       BLOSUM50 matrix
type9  16 IMV-L-F-W-Y-G-P-C-A-S-T-N-H-QRK-E-D     BLOSUM50 matrix
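Each alphabet in this output is a cluster string: "-" separates groups of amino acids that are treated as equivalent, so the number of groups equals the size column. As a sketch of that format (parse_clusters is a hypothetical helper for illustration, not part of RAATK's API), the following splits a cluster string and checks that it partitions the 20 standard amino acids:

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_clusters(cluster_str):
    """Split a cluster string such as 'IMV-L-FWY' into its groups.

    Hypothetical helper: raises ValueError if a residue appears twice
    or if the groups do not cover the 20 standard amino acids.
    """
    groups = [g for g in cluster_str.strip().split("-") if g]
    letters = "".join(groups)
    if len(letters) != len(set(letters)):
        raise ValueError("a residue appears in more than one cluster")
    if set(letters) != STANDARD_AA:
        raise ValueError("clusters do not cover the 20 standard amino acids")
    return groups


groups = parse_clusters("IMV-L-FWY-G-P-C-A-STNH-QERK-D")
print(len(groups))  # 10, the size column of the matching type9 row
```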


reduce

Reduce sequences according to the built-in reduction alphabets; the output is stored in directories.

$ raatk reduce positive.txt negative.txt -t 1-8 -s 2-19 -o pos neg

Reduce sequences according to a user-specified amino acid clustering; the output is written to a single file.

$ raatk reduce positive.txt -c IMV-L-FWY-G-P-C-A-STNH-QERK-D -o reduce_positive.txt
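Judging from file names such as 4-IGPN.txt for the alphabet IMVLFWY-G-PCAST-NHQEDRK, each cluster appears to be represented by its first letter. Under that assumption, reducing a sequence is a per-residue lookup. A minimal sketch (build_mapping and reduce_sequence are hypothetical names, and dropping residues outside the clusters is an assumption, not documented RAATK behaviour):

```python
def build_mapping(cluster_str):
    """Map every residue to the first letter of its cluster.

    Assumption: the first letter represents the cluster, as the file
    name 4-IGPN.txt suggests for IMVLFWY-G-PCAST-NHQEDRK.
    """
    mapping = {}
    for group in cluster_str.split("-"):
        if not group:
            continue
        for aa in group:
            mapping[aa] = group[0]
    return mapping


def reduce_sequence(seq, cluster_str):
    """Rewrite a protein sequence in the reduced alphabet."""
    mapping = build_mapping(cluster_str)
    # Residues not covered by any cluster (e.g. X) are skipped here;
    # this is an assumption made for the sketch.
    return "".join(mapping[aa] for aa in seq.upper() if aa in mapping)


print(reduce_sequence("MKTAYIAKQR", "IMVLFWY-G-PCAST-NHQEDRK"))  # INPPIIPNNN
```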

extract

Extract sequence features from directories; the output is also stored in directories.

$ raatk extract pos neg -k 3 -d -o k3 -m

Extract sequence features from files; the output is also stored in a file.

$ raatk extract pos/type9/4-IGPN.txt neg/type9/4-IGPN.txt -k 1 -o t9s4-k1.csv -m -raa IGPN

Output:

label,I,G,P,N
0.000000,0.125000,0.062500,0.562500,0.250000
0.000000,0.291667,0.166667,0.416667,0.125000
0.000000,0.277778,0.083333,0.416667,0.222222
                  ......
1.000000,0.177778,0.133333,0.377778,0.311111
1.000000,0.166667,0.000000,0.583333,0.250000
1.000000,0.387097,0.161290,0.322581,0.129032
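With -k 1 each feature is the frequency of one reduced letter, and the columns follow the -raa order (I,G,P,N). The sketch below extracts k-mer features over a reduced alphabet, assuming overlapping k-mers and frequencies normalised by the total number of counted k-mers; the exact normalisation RAATK uses for k > 1 is not stated here, and kmer_features is a hypothetical helper:

```python
from itertools import product


def kmer_features(reduced_seq, alphabet, k=1, count=False):
    """Return (feature_names, values) for all k-mers over `alphabet`.

    alphabet: representative letters, e.g. "IGPN" (the -raa order).
    count=True returns raw counts; otherwise counts are divided by the
    number of k-mers counted (assumed normalisation).
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    # Slide a window of length k; k-mers with letters outside the
    # alphabet are ignored.
    for i in range(len(reduced_seq) - k + 1):
        kmer = reduced_seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    values = [counts[km] for km in kmers]
    if count:
        return kmers, values
    total = sum(values)
    return kmers, [v / total if total else 0.0 for v in values]


print(kmer_features("INPPIIPNNN", "IGPN", k=1))
# (['I', 'G', 'P', 'N'], [0.3, 0.0, 0.3, 0.4])
```

In this sketch an alphabet of n letters yields n**k features, so -k 3 on a 10-letter alphabet would give 1000 columns.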

To generate a feature file without the label column, using raw counts instead of frequencies as feature values, add --count and --label-f:

$ raatk extract pos/type9/4-IGPN.txt -k 1 -o t9s4-k1p.csv -raa IGPN --count --label-f

Output:

I,G,P,N
2.000000,1.000000,9.000000,4.000000
7.000000,4.000000,10.000000,3.000000
10.000000,3.000000,15.000000,8.000000
                  ......
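The count and frequency files agree: dividing each count row by its total reproduces the first three rows of the labelled frequency output shown earlier. A quick check (to_frequencies is a hypothetical helper for illustration):

```python
def to_frequencies(counts):
    """Normalise one row of k-mer counts so that it sums to 1."""
    total = sum(counts)
    return [c / total for c in counts]


# Count rows copied from the --count output above.
count_rows = [[2, 1, 9, 4], [7, 4, 10, 3], [10, 3, 15, 8]]
for row in count_rows:
    # Six decimal places, comma-separated, like the CSV output.
    print(",".join(f"{f:.6f}" for f in to_frequencies(row)))
# 0.125000,0.062500,0.562500,0.250000
# 0.291667,0.166667,0.416667,0.125000
# 0.277778,0.083333,0.416667,0.222222
```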

eval

Evaluate the performance of the different reduced alphabets with machine learning; the output is a JSON file.

$ raatk eval k3 -d -o k3-eval -clf svm -c 2 -g 0.5 -p 3
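The -c and -g values accompany -clf svm. Assuming they are the SVM penalty C and the RBF kernel coefficient gamma (the conventional meaning of these names; this README does not define them), gamma sets how quickly the kernel similarity K(x, y) = exp(-gamma * ||x - y||^2) decays with distance. A stdlib sketch of that kernel, not RAATK's own code:

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Two k=1 frequency rows taken from the extract output above.
u = [0.125000, 0.062500, 0.562500, 0.250000]
v = [0.387097, 0.161290, 0.322581, 0.129032]
print(rbf_kernel(u, u))  # 1.0 for identical vectors
print(rbf_kernel(u, v))  # below 1.0; closer to 1 for more similar rows
```

Larger gamma values make the similarity fall off faster with distance, so the SVM decision boundary becomes more local.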

Evaluate a single feature file:

$ raatk eval k3/type2/10-ARNCQHIFPW.csv -cv -1 -c 2 -g 0.5 -o k3-t2s10.txt

Output:

     0   1
0   38   7
1    7  36

      tp   fn   fp   tn   recall  precision  f1-score  
  0   38    7    7   36    0.84     0.84       0.84    
  1   36    7    7   38    0.84     0.84       0.84    
acc                                            0.84
mcc                                            0.68
-------------------------------------------------------
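The report metrics follow from the confusion-matrix counts. A minimal sketch using the standard definitions (binary_metrics is a hypothetical helper, and zero denominators are not handled) reproduces the values reported for class 0:

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Standard binary classification metrics from confusion counts.

    Assumes every denominator is non-zero.
    """
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    acc = (tp + tn) / (tp + fn + fp + tn)
    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"recall": recall, "precision": precision, "f1": f1,
            "acc": acc, "mcc": mcc}


m = binary_metrics(tp=38, fn=7, fp=7, tn=36)
print({k: round(v, 2) for k, v in m.items()})
# {'recall': 0.84, 'precision': 0.84, 'f1': 0.84, 'acc': 0.84, 'mcc': 0.68}
```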

plot

Visualize the JSON result produced by eval:

$ raatk plot k3-eval.json -o k3p

output: plot

roc

ROC evaluation

$ raatk roc k3/type2/10-ARNCQHIFPW.csv -clf svm -cv 5 -c 2 -g 0.5 -o roc
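An ROC curve plots the true-positive rate against the false-positive rate as the decision threshold moves over the classifier scores. The sketch below computes those points and the trapezoidal AUC from labels and scores; it illustrates the metric only and says nothing about how RAATK collects cross-validated scores internally. The labels and scores are made up for illustration:

```python
from itertools import groupby


def roc_points(labels, scores):
    """Return (fpr, tpr) points, one per distinct score threshold.

    labels: 1 for positive, 0 for negative. Tied scores are grouped so
    they move the curve diagonally instead of optimistically.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, group in groupby(ranked, key=lambda p: p[0]):
        for _, label in group:
            if label == 1:
                tp += 1
            else:
                fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
print(round(auc(roc_points(labels, scores)), 4))  # 0.8889
```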

output: roc

ifs

incremental feature selection

$ raatk ifs k3/type2/10-ARNCQHIFPW.csv -s 2 -clf svm -cv 5 -c 2 -g 0.5 -o ifs
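Incremental feature selection evaluates growing prefixes of a ranked feature list and keeps the best-scoring one. In the sketch below, the feature ranking, the evaluate callback and the reading of -s as the step size are all assumptions; RAATK's own ranking criterion is not described here, and the toy scorer is hypothetical:

```python
def incremental_feature_selection(ranked_features, evaluate, step=1):
    """Evaluate growing prefixes of a ranked feature list.

    ranked_features: features ordered from most to least important.
    evaluate: callable scoring a list of features (e.g. a CV accuracy).
    step: features added per round (assumed meaning of -s).
    Returns (best_subset, history) with history as (n_features, score).
    """
    history = []
    best_subset, best_score = [], float("-inf")
    # The last prefix may be shorter than a full step; slicing past the
    # end simply returns the whole list.
    for n in range(step, len(ranked_features) + step, step):
        subset = ranked_features[:n]
        score = evaluate(subset)
        history.append((len(subset), score))
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, history


# Hypothetical scorer: rewards informative features, penalises size.
useful = {"a", "c"}
toy_score = lambda feats: sum(f in useful for f in feats) - 0.1 * len(feats)
best, hist = incremental_feature_selection(list("abcde"), toy_score, step=2)
print(best)                    # ['a', 'b', 'c', 'd']
print([n for n, _ in hist])    # [2, 4, 5]
```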

output: roc