kcounter

A simple package for counting DNA k-mers in Python. Written in Rust.


Keywords
kmer, bioinformatics
License
GPL-3.0
Install
pip install kcounter==0.1.1

Documentation


Installation

There are two ways to install kcounter:

  • Using pip:
pip install kcounter
  • Using conda:
conda install -c conda-forge -c bioconda kcounter

Usage

Currently, kcounter provides a single function, count_kmers, that returns a dictionary mapping each k-mer of the chosen size found in the sequence to its count.

>>> import kcounter
>>> kcounter.count_kmers('AAACTTTTTT', 3)
{'AAA': 1.0, 'ACT': 1.0, 'AAC': 1.0, 'CTT': 1.0, 'TTT': 4.0}
>>> kcounter.count_kmers('AAACTTTTTT', 4)
{'AACT': 1.0, 'CTTT': 1.0, 'ACTT': 1.0, 'AAAC': 1.0, 'TTTT': 3.0}
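
To make the counting semantics concrete, here is a minimal pure-Python sketch of sliding-window k-mer counting. It is not kcounter's Rust implementation; the name count_kmers_reference is hypothetical, and the float values simply mirror the output shown above.

```python
from collections import Counter


def count_kmers_reference(sequence, k):
    """Hypothetical reference counter; not part of kcounter."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Slide a window of width k over the sequence, one base at a time.
    # If the sequence is shorter than k, the range is empty.
    windows = (sequence[i:i + k] for i in range(len(sequence) - k + 1))
    counts = Counter(windows)
    # Use floats to mirror the values in the example output above.
    return {kmer: float(n) for kmer, n in counts.items()}


result = count_kmers_reference('AAACTTTTTT', 3)
```

A sequence of length n yields n - k + 1 windows, so the 10-base example contains eight 3-mers, four of which are TTT.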

The relative_frequencies parameter can be used to obtain relative k-mer frequencies, that is, each count divided by the total number of k-mers in the sequence:

>>> kcounter.count_kmers('AAACTTTTTT', 3, relative_frequencies=True)
{'AAC': 0.125, 'TTT': 0.5, 'CTT': 0.125, 'ACT': 0.125, 'AAA': 0.125}
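
The relationship between the raw counts and the relative frequencies can be sketched as a simple normalization. The helper to_relative_frequencies is a hypothetical name used for illustration; kcounter performs this step internally when relative_frequencies=True is passed.

```python
def to_relative_frequencies(counts):
    """Hypothetical helper: divide each count by the total number of k-mers."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


# Raw 3-mer counts for 'AAACTTTTTT', taken from the example above.
counts = {'AAA': 1.0, 'ACT': 1.0, 'AAC': 1.0, 'CTT': 1.0, 'TTT': 4.0}
freqs = to_relative_frequencies(counts)
```

With eight 3-mers in total, TTT (four occurrences) maps to 0.5 and every other k-mer maps to 0.125, matching the output above.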

The canonical_kmers parameter aggregates the counts of reverse-complement k-mers (e.g., AGC/GCT):

>>> kcounter.count_kmers('AAACTTTTTT', 3, canonical_kmers=True)
{'ACT': 1.0, 'AAA': 5.0, 'AAC': 1.0, 'AAG': 1.0}
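
The aggregation can be sketched in pure Python as follows. This sketch assumes the canonical form of a k-mer is the lexicographically smaller of the k-mer and its reverse complement, which is consistent with the output above but is not confirmed as kcounter's internal rule. All function names here are hypothetical.

```python
from collections import Counter

# Translation table for complementing uppercase DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Complement each base, then reverse the string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Assumption: the canonical k-mer is the lexicographically smaller strand."""
    return min(kmer, reverse_complement(kmer))


def count_canonical_kmers(sequence, k):
    """Hypothetical reference counter; not part of kcounter."""
    windows = (sequence[i:i + k] for i in range(len(sequence) - k + 1))
    counts = Counter(canonical(w) for w in windows)
    return {kmer: float(n) for kmer, n in counts.items()}


result = count_canonical_kmers('AAACTTTTTT', 3)
```

Under this rule, the four TTT occurrences are folded into AAA (giving 5.0) and CTT is reported as its reverse complement AAG, as in the output above.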

Plans for future versions:

  • Performance improvements.
  • Add a parameter that makes the function return sparse k-mer counts.
  • Implement a function that returns a NumPy array.