A toolbox for protein folding with Python.


Keywords
prospr, protein, structure, prediction, toolbox, python, c++, extension, pypi, package, bioinformatics, computational-science, cpp, datasets, high-performance-computing, protein-folding, protein-structure-prediction, structure-prediction
License
Other
Install
pip install prospr==1.2.1

Prospr: The Protein Structure Prediction Toolbox

Creator: Okke van Eck

Prospr is a toolbox for protein structure prediction within the HP-model (the hydrophobic-polar model). At its core, Prospr offers an easy-to-use Protein data structure, which can be used to simulate protein folding. It also offers algorithms, datasets and visualization functions. The Protein data structure tracks many properties while the protein is being folded. One of these is the number of conformation changes, which makes it possible to determine the relative hardness of a protein for a specific algorithm and so allows for a fair comparison between different algorithms.
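The idea of counting work to gauge how hard a protein is for an algorithm can be sketched in plain Python. The example below does not use Prospr's API: the function name exhaustive_fold and its return values are hypothetical, and the "work" counted here is simply the number of amino acid placements made by a naive depth-first search over all self-avoiding folds on a square lattice.

```python
def exhaustive_fold(sequence, dim=2):
    """Brute-force the best HP score; return (best_score, placements).

    Illustrative only: this is not Prospr's implementation or API.
    `placements` counts every time an amino acid is put on the lattice.
    """
    if not sequence:
        return 0, 0

    # All unit steps along each axis of a dim-dimensional square lattice.
    steps = [
        tuple(sign if axis == a else 0 for a in range(dim))
        for axis in range(dim)
        for sign in (1, -1)
    ]
    best = 0
    placements = 0

    def neighbours(pos):
        return [tuple(p + s for p, s in zip(pos, step)) for step in steps]

    def place(index, pos, occupied, score):
        nonlocal best, placements
        placements += 1
        if sequence[index] == "H":
            # Each H neighbour that is not the chain predecessor is a contact.
            for nb in neighbours(pos):
                j = occupied.get(nb)
                if j is not None and j != index - 1 and sequence[j] == "H":
                    score -= 1
        occupied[pos] = index
        if index == len(sequence) - 1:
            best = min(best, score)
        else:
            for nb in neighbours(pos):
                if nb not in occupied:
                    place(index + 1, nb, occupied, score)
        del occupied[pos]  # backtrack

    place(0, (0,) * dim, {}, 0)
    return best, placements


best, work = exhaustive_fold("HPPH")
print(best, work)
```

Running two search strategies on the same sequence and comparing their counters gives a machine-independent measure of effort, which is the kind of comparison the tracked properties are meant to enable.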

So far, only square lattices are supported, in any number of dimensions. Amino acids can only be placed on the corners of the squares, and each one has to be one unit distance away from the previously placed amino acid.
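The lattice constraints above can be illustrated with a short, self-contained sketch. It is not Prospr's code: the function names fold_positions and hp_score, and the move encoding (a non-zero integer m meaning one unit step along axis |m| - 1, in the direction of its sign), are assumptions made for this example. The score uses the common HP-model convention of -1 for every pair of H amino acids that are lattice neighbours but not adjacent in the chain.

```python
from itertools import combinations


def fold_positions(moves, dim=2):
    """Return the lattice coordinates visited by a chain folded with `moves`.

    Hypothetical encoding: move m steps one unit along axis |m| - 1,
    in the positive direction if m > 0 and the negative one otherwise.
    """
    pos = (0,) * dim
    positions = [pos]
    for m in moves:
        if m == 0 or abs(m) > dim:
            raise ValueError(f"invalid move {m} for {dim} dimensions")
        axis = abs(m) - 1
        step = 1 if m > 0 else -1
        pos = pos[:axis] + (pos[axis] + step,) + pos[axis + 1:]
        if pos in positions:
            raise ValueError("chain overlaps itself")
        positions.append(pos)
    return positions


def hp_score(sequence, moves, dim=2):
    """Score a fold: -1 per non-consecutive H-H pair at unit distance."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")
    positions = fold_positions(moves, dim)
    score = 0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i < 2:
            continue  # chain neighbours do not count as contacts
        if sequence[i] == "H" and sequence[j] == "H":
            distance = sum(abs(a - b) for a, b in zip(positions[i], positions[j]))
            if distance == 1:
                score -= 1
    return score


# "HPPH" folded into a unit square: the two H's end up next to each other.
print(hp_score("HPPH", [1, 2, -1]))
```

Because every move changes exactly one coordinate by one, the unit-distance rule holds by construction, and only the overlap check is needed to keep the fold valid.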

The Python package is based on a C++ core, which gives Prospr its high performance. The C++ core is made available as a separate zip file to facilitate high-performance computing applications. See the C++ core section below for direct links to the core.

Installation and documentation

This package can simply be installed via pip by running:

pip install prospr

A quickstart and reference documentation can be found at prospr.readthedocs.io. The PDF version of the complete documentation can be found here.

Archives

All the C++ core files and datasets are also available as compressed archives. See the subsections below for direct links.

C++ core

All the core code that Prospr runs on is available as a compressed archive. The archives folder contains both a .zip and a .tar.gz archive.

Datasets

The complete collection of datasets is available as a compressed archive in the archives folder, as both a .zip and a .tar.gz archive.

Future work

This toolbox could be used for other protein folding problems within discrete models. A valuable extension would be to support different models by making the amino acid representation modular.

License

Prospr is licensed under the GNU Lesser General Public License. A copy can be found in the LICENSE file on GitHub.