pysuper

Search for oligopeptide fragments


Keywords
c, efficient, fast, pattern matching, protein, superposition
Licenses
GPL-3.0/GPL-3.0+
Install
pip install pysuper==0.2.91

Super -- Information for users and developers

Super rapidly searches 3D structural databases. Given a query fragment, Super searches a database of 3D structures for occurrences of the query that match within a tolerance threshold (measured by RMSD).
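
To make the threshold concrete, here is a minimal Python sketch of the RMSD between two equally sized sets of coordinates that are assumed to be already superposed. It is an illustration only: Super also has to find the best superposition (its --help text mentions a full Kearsley RMSD calculation), and this sketch skips that step. The names rmsd and within_threshold are hypothetical and are not part of Super.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) points.

    Assumes the two sets are already superposed; no rotation or
    translation is fitted here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally sized")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))

def within_threshold(coords_a, coords_b, threshold):
    """True if the two fragments match within `threshold` Angstrom."""
    return rmsd(coords_a, coords_b) <= threshold

# Made-up coordinates: the candidate is the query shifted 1 Angstrom along z.
query = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
candidate = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(rmsd(query, candidate))
print(within_threshold(query, candidate, 1.0))
```

Every atom in the made-up candidate is displaced by exactly 1 Angstrom, so it sits right at a threshold of 1.0 and would be rejected at 0.5.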

Dependencies

There are a few dependencies required to build and run Super:

  1. A python2 (https://www.python.org/) script is used to pre-process PDB text data files into an efficient binary database format.
  2. prody (http://prody.csb.pitt.edu/) is used to parse PDB files.
  3. check (https://libcheck.github.io/check/) is used for unit testing and can be disabled by passing --disable-check to the configure script.
  4. lcov is used for code coverage. It is not required by default; passing --enable-code-coverage to the configure script makes it search for the lcov program.

Compile

./configure --prefix=${HOME}/some/install/path

I use a prefix under ${HOME} so that I don't have to install as superuser. I often use --prefix=${HOME}/install.

make && make install

Running Super

In the ${prefix}/bin directory:

LD_LIBRARY_PATH=../lib ./super [OPTIONS...] ${pdb_path}/pdb.db

Options listing from ./super --help:
Usage: super [OPTION...] DATABASE
super -- A 3D protein pattern search program.

  -c, --thread-count=COUNT   Concurrently process the database with COUNT
                             threads of control
  -d, --defaults             Keep default arguments.
  -g, --disable-gershgorin   Disable use of gershgorin circles for Jacobi
                             diagonalisation
  -l, --lowerbounds=BOUNDS   Comma separated list of lower bound checks to use
  -n, --disable-rmsd         Disable the full (Kearsley) RMSD calculation, just
                             use the LB
  -o, --output=OUTPUT        Output to OUTPUT instead of stdout
      --quiet                Produce no output
  -q, --query=QUERY          Query database
  -r, --disable-mmap         Disable use of memory mapped databases to speed up
                             calculations
  -t, --threshold=THRESHOLD  Pattern matching threshold measured in Angstrom
  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version

Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.

I usually run with:

LD_LIBRARY_PATH=${HOME}/install/lib ./super --lowerbounds=arithmetic --threshold=1.0 --query=qry.db pdb.db

Or, on Mac OS X:

DYLD_LIBRARY_PATH=${HOME}/install/lib ./super -t 1.0 -q qry.db pdb.db
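
The same invocation can be driven from a script. The following is a rough sketch, assuming Python 3 and using only the flags shown in the --help listing above. The helper names build_super_command, library_env and run_super are hypothetical; they are not shipped with Super or pysuper.

```python
import os
import subprocess
import sys

def build_super_command(super_path, database, query, threshold, lowerbounds=None):
    """Build the argument list for a Super run (flags taken from `super --help`)."""
    cmd = [super_path, "--threshold=%s" % threshold, "--query=%s" % query]
    if lowerbounds:
        cmd.append("--lowerbounds=%s" % lowerbounds)
    cmd.append(database)  # DATABASE is the single positional argument
    return cmd

def library_env(lib_dir):
    """Copy the current environment and point the dynamic loader at lib_dir."""
    env = dict(os.environ)
    # Mac OS X reads DYLD_LIBRARY_PATH; other Unix systems read LD_LIBRARY_PATH.
    var = "DYLD_LIBRARY_PATH" if sys.platform == "darwin" else "LD_LIBRARY_PATH"
    env[var] = lib_dir
    return env

def run_super(super_path, lib_dir, database, query, threshold, lowerbounds=None):
    """Run Super and return whatever it writes to stdout as text."""
    cmd = build_super_command(super_path, database, query, threshold, lowerbounds)
    result = subprocess.run(cmd, env=library_env(lib_dir), check=True,
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout

# Show the command that would be run, without running it.
print(build_super_command("./super", "pdb.db", "qry.db", 1.0, "arithmetic"))
```

Passing the arguments as a list avoids shell quoting problems with paths, and check=True raises an exception if Super exits with a non-zero status.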

Databases

An up-to-date pre-processed version of the entire PDB is available for download from http://lcb.infotech.monash.edu.au/super/pdb.db

To generate a query:

python pdb_pp.py --query [YOUR PDB FRAGMENT FILE].pdb --output myquery.qry

To generate a searchable database from a directory containing PDB files (pdb/):

python pdb_pp.py -o pdb.db pdb/
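
If both steps are run repeatedly, they can be wrapped in a small script. This is only a sketch: it assumes pdb_pp.py is in the current directory and that the python on the PATH is the interpreter the script needs. The function names query_command, database_command and prepare are hypothetical.

```python
import subprocess

def query_command(fragment_pdb, output_qry, script="pdb_pp.py"):
    """Arguments for turning a PDB fragment file into a query file."""
    return ["python", script, "--query", fragment_pdb, "--output", output_qry]

def database_command(pdb_dir, output_db, script="pdb_pp.py"):
    """Arguments for turning a directory of PDB files into a database."""
    return ["python", script, "-o", output_db, pdb_dir]

def prepare(fragment_pdb, pdb_dir, output_qry="myquery.qry", output_db="pdb.db"):
    """Build the query first, then the database; stop at the first failure."""
    subprocess.check_call(query_command(fragment_pdb, output_qry))
    subprocess.check_call(database_command(pdb_dir, output_db))

# Show the two commands without running them; "fragment.pdb" is a placeholder name.
print(query_command("fragment.pdb", "myquery.qry"))
print(database_command("pdb/", "pdb.db"))
```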