palindrome-tree

Gradient-boosted decision tree palindrome predictor, used to locate regions for further investigation through http://palindromes.ibp.cz/


Keywords
DNA, palindrome
License
MIT
Install
pip install palindrome-tree==1.3.1

Documentation

Palindrome tree

Palindrome tree analyzes inverted repeats in DNA sequences using decision trees. The tool takes the provided sequences and uses a decision tree to find regions with a high probability of containing a palindrome. This step filters out a large portion of the data. The remaining regions of interest can then be analyzed using the API of Palindrome Analyzer. DNA Analyser is a web-based server for nucleotide sequence analysis. It has been developed in cooperation between the Department of Informatics, Mendel’s University in Brno, and the Institute of Biophysics, Academy of Sciences of the Czech Republic.
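To make the term concrete: in DNA, a palindrome (inverted repeat) is a stretch whose second arm is the reverse complement of its first arm, possibly with a few mismatches. The following is a minimal sketch of that definition only; it is not the package's classifier, and the helper names reverse_complement and arm_mismatches are hypothetical. The arm pairs are taken from the validated sample output in this README.

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def arm_mismatches(left_arm: str, right_arm: str) -> int:
    """Count positions where right_arm differs from the reverse
    complement of left_arm (0 means a perfect inverted repeat)."""
    if len(left_arm) != len(right_arm):
        raise ValueError("arms must have equal length")
    expected = reverse_complement(left_arm)
    return sum(a != b for a, b in zip(expected, right_arm))


print(reverse_complement("GAATTC"))          # GAATTC (a perfect palindrome)
print(arm_mismatches("AGAGAC", "GTCTCT"))    # 0
print(arm_mismatches("AGAGAC", "GGGTCT"))    # 2
```

The last call uses the sequence and opposite values of one validated sample row, and the count agrees with that row's mismatches value of 2.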

Requirements

Palindrome tree requires Python 3.7 or newer.

Installation

Install palindrome tree from the PyPI repository.

pip install palindrome-tree

Usage

First, initialize a PalindromeTree analyzer instance, which is imported from the main package palindrome_tree.

from palindrome_tree import PalindromeTree

tree = PalindromeTree()

Predict regions (without API validation)

To predict regions with possible palindromes, run analyse without setting the validate_with_api parameter.

from palindrome_tree import PalindromeTree

# Read the whole sequence; the with block closes the file afterwards.
with open("/path/to/sequence/name.txt", "r") as sequence_file:
    sequence = sequence_file.read()

tree = PalindromeTree()

tree.analyse(
    sequence=sequence,
)

tree.results

The results are then stored in the results attribute as a pd.DataFrame.

   position                        sequence
0         8  TTTGTAGAGACAGGGTCTTGCTGTGTTTCC
1        10  TGTAGAGACAGGGTCTTGCTGTGTTTCCCA
2        49  CGAACTCCTGGCCTCTAGGCAATCCTCCCA
3       102  ATCCCACTCTTTTTTGAAAAATAAAATCTA
4       105  CCACTCTTTTTTGAAAAATAAAATCTACCA
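Several predicted windows in the sample output above overlap (for example positions 8 and 10). As an optional post-processing step, overlapping windows can be merged into contiguous regions. This is a minimal sketch, not part of the package: merge_windows is a hypothetical helper, and the window length of 30 is an assumption read off the sample sequences.

```python
from typing import Iterable, List, Tuple

WINDOW = 30  # assumed window length, taken from the sample sequences


def merge_windows(positions: Iterable[int], window: int = WINDOW) -> List[Tuple[int, int]]:
    """Merge overlapping or touching windows into half-open (start, end) regions."""
    regions: List[Tuple[int, int]] = []
    for start in sorted(positions):
        end = start + window
        if regions and start <= regions[-1][1]:
            # Window overlaps the previous region: extend that region.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


# Start positions copied from the sample output.
print(merge_windows([8, 10, 49, 102, 105]))
# [(8, 40), (49, 79), (102, 135)]
```

With a real run, the start positions could be taken from the position column, for example tree.results["position"].tolist().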

Predict regions (with API validation)

To predict regions with possible palindromes and then validate them through the API, run analyse with the validate_with_api parameter set to True.

from palindrome_tree import PalindromeTree

# Read the whole sequence; the with block closes the file afterwards.
with open("/path/to/sequence/name.txt", "r") as sequence_file:
    sequence = sequence_file.read()

tree = PalindromeTree()

tree.analyse(
    sequence=sequence,
    validate_with_api=True,
)

tree.validated_results

The validated results are stored in the validated_results attribute, also as a pd.DataFrame.

   original_index  after  before  mismatches  opposite  position  sequence  signature   spacer  stability_NNModel
0               0     CC   TTTGT           2  CTGTGTTT         5  AGAGACAG      8-7-2  GGTCTTG  {'cruciform': -5.74, 'linear': -27.590000000000003, 'delta': 21.85}
1               0  TGCTG   TTTGT           2    GGGTCT         5    AGAGAC      6-1-2        A  {'cruciform': -2.54, 'linear': -13.84, 'delta': 11.3}
2               0  GTGTT   TGTAG           2    CTTGCT         7    AGACAG      6-3-2      GGT  {'cruciform': -1.94, 'linear': -17.509999999999998, 'delta': 15.569999999999999}
3               0   TTCC   TAGAG           2    CTGTGT         9    ACAGGG      6-5-2    TCTTG  {'cruciform': -3.7399999999999998, 'linear': -20.99, 'delta': 17.25}
4               1   CCCA     TGT           2  CTGTGTTT         3  AGAGACAG      8-7-2  GGTCTTG  {'cruciform': -5.74, 'linear': -27.590000000000003, 'delta': 21.85}
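The columns of a validated row can be read against the predicted window it came from. The sketch below checks this for the first sample row: the parts before, sequence, spacer, opposite and after concatenate back to the window at original_index 0, and the signature reads as arm length, spacer length and mismatch count. This interpretation is inferred from the sample rows only, not from the API documentation; the helper names are hypothetical, and the flanking before/after values look shortened in other rows, so the reassembly is shown for this row only.

```python
from typing import Any, Dict, Tuple

# First predicted window and first validated row, copied from the sample output.
window = "TTTGTAGAGACAGGGTCTTGCTGTGTTTCC"
row: Dict[str, Any] = {
    "before": "TTTGT",
    "sequence": "AGAGACAG",
    "spacer": "GGTCTTG",
    "opposite": "CTGTGTTT",
    "after": "CC",
    "position": 5,
    "mismatches": 2,
    "signature": "8-7-2",
}


def rebuild_window(r: Dict[str, Any]) -> str:
    """Concatenate the row's parts in left-to-right order."""
    return "".join(r[key] for key in ("before", "sequence", "spacer", "opposite", "after"))


def parse_signature(signature: str) -> Tuple[int, int, int]:
    """Split a signature such as '8-7-2' into three integers."""
    arm, spacer, mismatches = (int(part) for part in signature.split("-"))
    return arm, spacer, mismatches


def signature_matches(r: Dict[str, Any]) -> bool:
    """Check the assumed reading: arm length, spacer length, mismatches."""
    expected = (len(r["sequence"]), len(r["spacer"]), r["mismatches"])
    return parse_signature(r["signature"]) == expected


print(rebuild_window(row) == window)   # True
print(signature_matches(row))          # True
# In this row, position is the 0-based offset of the first arm within the window.
start = row["position"]
print(window[start:start + len(row["sequence"])] == row["sequence"])  # True
```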

Dependencies

  • xgboost = "^1.5.1"
  • pandas = "^1.3.5"
  • scikit-learn = "^1.0.2"
  • requests = "^2.26.0"

Authors

License

This project is licensed under the MIT License - see the LICENSE file for details.