kcounter
A simple package for counting DNA k-mers in Python. Written in Rust.
Installation
There are two ways to install kcounter:
- Using pip:
pip install kcounter
- Using conda:
conda install -c conda-forge -c bioconda kcounter
Usage
Currently, kcounter provides a single function, count_kmers, that returns a dictionary mapping each k-mer of the chosen size to its count.
>>> import kcounter
>>> kcounter.count_kmers('AAACTTTTTT', 3)
{'AAA': 1.0, 'ACT': 1.0, 'AAC': 1.0, 'CTT': 1.0, 'TTT': 4.0}
>>> kcounter.count_kmers('AAACTTTTTT', 4)
{'AACT': 1.0, 'CTTT': 1.0, 'ACTT': 1.0, 'AAAC': 1.0, 'TTTT': 3.0}
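For intuition, the counting shown above can be sketched in pure Python. This is only an illustration and not kcounter's Rust implementation; the name count_kmers_reference is hypothetical and is not part of the package.

```python
from collections import Counter


def count_kmers_reference(sequence, k):
    """Hypothetical pure-Python reference; not part of kcounter."""
    # Slide a window of width k across the sequence, one base at a time.
    windows = (sequence[i:i + k] for i in range(len(sequence) - k + 1))
    counts = Counter(windows)
    # The kcounter examples show float values, so mirror that here.
    return {kmer: float(n) for kmer, n in counts.items()}


result = count_kmers_reference('AAACTTTTTT', 3)
print(result)
```

A sequence of length n contains n - k + 1 overlapping k-mers, which is why the 3-mer counts for the 10-base example sum to 8 and the 4-mer counts sum to 7.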
The relative_frequencies parameter can be used to obtain relative k-mer frequencies:
>>> kcounter.count_kmers('AAACTTTTTT', 3, relative_frequencies=True)
{'AAC': 0.125, 'TTT': 0.5, 'CTT': 0.125, 'ACT': 0.125, 'AAA': 0.125}
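The values above are consistent with dividing each count by the total number of k-mers in the sequence (8 for this example). A minimal sketch of that normalization, assuming this is how the frequencies are defined; the function name relative_kmer_frequencies is hypothetical and not part of kcounter:

```python
from collections import Counter


def relative_kmer_frequencies(sequence, k):
    """Hypothetical sketch: each k-mer count divided by the total count."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    # An empty Counter (sequence shorter than k) yields an empty dictionary,
    # so no division by zero occurs.
    return {kmer: n / total for kmer, n in counts.items()}


frequencies = relative_kmer_frequencies('AAACTTTTTT', 3)
print(frequencies['TTT'])  # 4 of the 8 3-mers are TTT, so this prints 0.5
```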
The canonical_kmers parameter aggregates the counts of reverse-complement k-mers (e.g., AGC/GCT):
>>> kcounter.count_kmers('AAACTTTTTT', 3, canonical_kmers=True)
{'ACT': 1.0, 'AAA': 5.0, 'AAC': 1.0, 'AAG': 1.0}
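In the output above, TTT is counted under AAA and CTT under AAG. This matches the common convention of representing each k-mer by the lexicographically smaller of itself and its reverse complement. The sketch below assumes that convention; the helper names canonical and count_canonical_kmers are hypothetical, and kcounter's internal approach may differ.

```python
from collections import Counter

# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans('ACGT', 'TGCA')


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_canonical_kmers(sequence, k):
    """Hypothetical sketch: count k-mers after mapping each to its canonical form."""
    windows = (sequence[i:i + k] for i in range(len(sequence) - k + 1))
    counts = Counter(canonical(kmer) for kmer in windows)
    return {kmer: float(n) for kmer, n in counts.items()}


print(canonical('GCT'))  # reverse complement is AGC, which sorts first
print(count_canonical_kmers('AAACTTTTTT', 3))
```

Here the four TTT occurrences are added to the single AAA occurrence, giving the count of 5.0 for AAA.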
Plans for future versions:
- Performance improvements.
- Add a parameter that makes the function return sparse k-mer counts.
- Implement a function that returns a numpy array.