scikit-fingerprints is a Python library for efficient computation of molecular fingerprints.
- Description
- Supported platforms
- Installation
- Basic Usage
- General Project Vision
- Contributing
- License
Molecular fingerprints are crucial in various scientific fields, including drug discovery, materials science, and chemical analysis. However, existing Python libraries for computing molecular fingerprints often lack performance, user-friendliness, and support for modern programming standards. This project aims to address these shortcomings by creating an efficient and accessible Python library for molecular fingerprint computation.
You can find the documentation HERE
- The library offers fingerprint transformers that accept molecular representations (e.g., SMILES strings) and fingerprint parameters, and return the requested fingerprints.
- It's open-source and available for installation via pip.
- The library has been designed for ease of use, so that users can get started without extensive training.
- Compatibility with the standard Python ML stack, based on scikit-learn interfaces, has been a top priority.
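To make the scikit-learn-style interface concrete, here is a minimal sketch of the `fit` / `transform` / `fit_transform` convention applied to SMILES strings. It uses only the standard library. `ToyHashedFingerprint`, its `n_bits` and `ngram` parameters, and the character n-gram hashing are all hypothetical and exist only for illustration; they are not part of scikit-fingerprints and are not a chemically meaningful fingerprint.

```python
from hashlib import blake2b


class ToyHashedFingerprint:
    """Hypothetical stand-in that mimics the scikit-learn transformer convention.

    NOT part of scikit-fingerprints: it hashes character n-grams of the SMILES
    string, which has no chemical meaning and only illustrates the interface.
    """

    def __init__(self, n_bits: int = 64, ngram: int = 3):
        self.n_bits = n_bits
        self.ngram = ngram

    def fit(self, X, y=None):
        # Stateless transformer: nothing is learned, so just return self.
        return self

    def transform(self, X):
        # One fixed-length bit vector per input SMILES string.
        return [self._fingerprint(smiles) for smiles in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

    def _fingerprint(self, smiles: str) -> list[int]:
        bits = [0] * self.n_bits
        # Slide a window over the string and set one bit per hashed fragment.
        for i in range(max(len(smiles) - self.ngram + 1, 1)):
            fragment = smiles[i : i + self.ngram]
            digest = blake2b(fragment.encode(), digest_size=8).digest()
            bits[int.from_bytes(digest, "big") % self.n_bits] = 1
        return bits


smiles_list = ["O=S(=O)(O)CCS(=O)(=O)O", "O=C(O)c1ccccc1O"]
X_toy = ToyHashedFingerprint(n_bits=64).fit_transform(smiles_list)
print(len(X_toy), len(X_toy[0]))  # prints: 2 64
```

Because the object exposes the same three methods that scikit-learn transformers do, code written against that convention can call it without knowing how the vectors are produced.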
| | Python 3.9 | Python 3.10 | Python 3.11 | Python 3.12 |
|---|---|---|---|---|
| Ubuntu - latest | ✅ | ✅ | ✅ | ✅ |
| Windows - latest | ✅ | ✅ | ✅ | ✅ |
| macOS - latest | only macOS 13 | ✅ | ✅ | ✅ |
You can install the library using pip:
pip install scikit-fingerprints
from skfp.fingerprints import AtomPairFingerprint
smiles_list = ['O=S(=O)(O)CCS(=O)(=O)O', 'O=C(O)c1ccccc1O']
atom_pair_fingerprint = AtomPairFingerprint()
X_skfp = atom_pair_fingerprint.transform(smiles_list)
print(X_skfp)
The primary goal of this project was to develop a Python library that simplifies the computation of widely used molecular fingerprints, such as the Morgan fingerprint, the MACCS fingerprint, and others. The library has the following key features:
- User-Friendly Interface: The library was designed to provide an intuitive interface that is easy to integrate into machine learning workflows.
- Performance Optimization: We implemented molecular fingerprint computation algorithms using concurrent programming techniques, so that large datasets of molecules are processed in parallel.
- Compatibility: The library's interface was modeled on popular data science libraries such as scikit-learn, so it should feel familiar to users of those tools.
- Extensibility: Users should be able to customize and extend the library to suit their specific needs.
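The parallel processing mentioned in the list above can be pictured as a split, compute, and merge pattern. The sketch below is an assumption about the general idea, not the library's actual implementation: `toy_fingerprint`, `compute_chunk`, and `parallel_transform` are hypothetical names, and the "fingerprint" is just a vector of character counts. It uses threads from the standard library to keep the example short; CPU-bound work in pure Python would normally use processes instead.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_fingerprint(smiles: str) -> list[int]:
    # Hypothetical placeholder: counts of a few characters in the SMILES string.
    return [smiles.count(ch) for ch in "CNOS="]


def compute_chunk(chunk: list[str]) -> list[list[int]]:
    # Each worker handles one contiguous slice of the dataset.
    return [toy_fingerprint(smiles) for smiles in chunk]


def parallel_transform(smiles_list: list[str], n_jobs: int = 2) -> list[list[int]]:
    if not smiles_list:
        return []
    # Never start more workers than there are molecules.
    n_jobs = max(1, min(n_jobs, len(smiles_list)))
    chunk_size = -(-len(smiles_list) // n_jobs)  # ceiling division
    chunks = [
        smiles_list[i : i + chunk_size]
        for i in range(0, len(smiles_list), chunk_size)
    ]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Executor.map yields results in input order, so row order is preserved.
        results = pool.map(compute_chunk, chunks)
        return [fp for chunk_result in results for fp in chunk_result]


smiles_list = ["O=S(=O)(O)CCS(=O)(=O)O", "O=C(O)c1ccccc1O", "CCO"]
X_parallel = parallel_transform(smiles_list, n_jobs=2)
print(X_parallel)
```

Splitting into contiguous chunks and merging in order means the result is identical to a sequential loop, so the degree of parallelism changes only how the work is scheduled, not the output.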
Please read CONTRIBUTING.md and CODE_OF_CONDUCT.md for details on our code of conduct and the process for submitting pull requests.
This project is licensed under the MIT License - see the LICENSE.md file for details.