RKP

Relative K-mer Project


License
GPL-3.0
Install
pip install RKP==0.1.0

Documentation

Abstract

WGS analysis reveals extended natural transformation in Campylobacter impacting diagnostics and the pathogen's adaptive potential.

Running title: WGS analysis of Campylobacter hybrid strains

Julia C. Golz 1a, Lennard Epping 2#, Marie-Theres Knüver 1a, Maria Borowiak 1b, Felix Hartkopf 2, Carlus Deneke 1b, Burkhard Malorny 1b, Torsten Semmler 2, Kerstin Stingl 1a*

1 German Federal Institute for Risk Assessment, Department of Biological Safety, a National Reference Laboratory for Campylobacter, b Study Centre for Genome Sequencing and Analysis, Berlin, Germany
2 Robert Koch Institute, Microbial Genomics, Berlin, Germany

# shared first author
* corresponding author

Over the past decade, Campylobacter infections have become more common worldwide. These infections can cause diarrhea, abdominal pain, fever, headache, nausea, and vomiting, and they pose a serious danger to public health. This has sparked efforts to improve prevention and treatment and to reduce transmission. According to Kaakoush et al. [1], the main risk factors are the consumption of animal products and water, contact with animals, and international travel.

Because the threat to public health differs among Campylobacter species, it is important to identify dangerous species and to characterize their genotype and phenotype. In this work, a k-mer mapping approach is used to identify recombination events and the genes involved, in order to describe hybrid species. Hybrids of Campylobacter jejuni and Campylobacter coli are analyzed to validate the approach and to develop a workflow that can be applied to emerging hybrids in general. Such a workflow would allow fast and reliable classification of hybrids.

KMC 3 [2] and BEDTools [5] are used to extract k-mers from Campylobacter genomes and to calculate the k-mers shared between two species and their hybrids. These k-mers are then used in combination with BLAST [3] and Bowtie 2 [4] to select genes that are shared with the hybrid genomes. The selected genes can be grouped into batches, each corresponding to a single recombination event. A visualization of the gene coverage, generated with R, provides further information about the selected genes.
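The shared k-mer step can be illustrated with plain Python sets. This is only a minimal in-memory sketch, not the package's implementation: the real workflow delegates k-mer counting to KMC 3, and the function names `kmers` and `shared_kmers`, the toy sequences, and the choice of k = 4 are all assumptions made for illustration. Reverse complements (canonical k-mers) are deliberately ignored here.

```python
from __future__ import annotations


def kmers(sequence: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of a DNA sequence."""
    sequence = sequence.upper()
    # A sequence shorter than k yields an empty range and thus an empty set.
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_kmers(genome_a: str, genome_b: str, genome_c: str, k: int) -> set[str]:
    """k-mers present in both A and B but absent from C (hypothetical helper)."""
    return (kmers(genome_a, k) & kmers(genome_b, k)) - kmers(genome_c, k)


if __name__ == "__main__":
    # Toy sequences, not real Campylobacter data.
    a = "ACGTACGTGG"
    b = "TTACGTGGAA"
    c = "ACGTACCC"
    print(sorted(shared_kmers(a, b, c, k=4)))  # ['CGTG', 'GTGG', 'TACG']
```

For whole genomes, in-memory sets of this kind become large, which is one reason to use a dedicated disk-based counter such as KMC 3 instead.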

This work will provide a new generic tool for hybrid analysis that could be expanded to other bacteria and enable researchers to classify new species and recombination events in a fast and reliable manner.

[1] Nadeem O. Kaakoush, Natalia Castaño-Rodríguez, Hazel M. Mitchell, Si Ming Man, Global Epidemiology of Campylobacter Infection, Clinical Microbiology Reviews, Volume 28, Issue 3, June 2015, Pages 687–720, https://doi.org/10.1128/CMR.00006-15
[2] Marek Kokot, Maciej Długosz, Sebastian Deorowicz, KMC 3: counting and manipulating k-mer statistics, Bioinformatics, Volume 33, Issue 17, 01 September 2017, Pages 2759–2761, https://doi.org/10.1093/bioinformatics/btx304
[3] Stephen F. Altschul, Warren Gish, Webb Miller, Eugene W. Myers, David J. Lipman, Basic local alignment search tool, Journal of Molecular Biology, Volume 215, Issue 3, 1990, Pages 403-410, ISSN 0022-2836, https://doi.org/10.1016/S0022-2836(05)80360-2.
[4] Ben Langmead, Steven L. Salzberg, Fast gapped-read alignment with Bowtie 2, Nature Methods, Volume 9, 2012, Pages 357–359
[5] Aaron R. Quinlan, Ira M. Hall, BEDTools: a flexible suite of utilities for comparing genomic features, Bioinformatics, Volume 26, Issue 6, 15 March 2010, Pages 841–842, https://doi.org/10.1093/bioinformatics/btq033

Workflow

graph TD;
  Species_A-->KMC3;
  Species_B-->KMC3;
  Species_C-->KMC3;
  KMC3-->Intersection_A_and_B;
  Intersection_A_and_B-->Subtract_C;
  Subtract_C-->Bowtie_2;
  Species_B-->aligned_to_B;
  Bowtie_2-->aligned_to_B;
  aligned_to_B-->extract_genes;
  extract_genes-->calculate_coverage;
  calculate_coverage-->blast_sequences;
  blast_sequences-->plot_heatmap;
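
The calculate_coverage step can be pictured as measuring how much of each extracted gene is covered by aligned k-mers. The sketch below is an assumption about what such a calculation might look like, not the package's actual code: the function name `gene_coverage`, the 0-based half-open coordinates (as in BED files), and the example numbers are all hypothetical.

```python
from __future__ import annotations


def gene_coverage(gene_start: int, gene_end: int, hit_starts: list[int], k: int) -> float:
    """Fraction of bases in [gene_start, gene_end) covered by at least one k-mer hit.

    Coordinates are assumed to be 0-based and half-open; each hit spans
    [start, start + k) on the reference genome.
    """
    covered: set[int] = set()
    for start in hit_starts:
        # Clip each k-mer hit to the gene boundaries before recording it.
        lo = max(start, gene_start)
        hi = min(start + k, gene_end)
        covered.update(range(lo, hi))
    length = gene_end - gene_start
    return len(covered) / length if length > 0 else 0.0


if __name__ == "__main__":
    # A 10 bp gene with three 4-mer hits, one overlapping each gene boundary.
    print(gene_coverage(10, 20, [8, 12, 18], k=4))  # 0.8
```

Per-gene values of this kind are what a coverage heatmap would display, one cell per gene and genome.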