mirror_seq

The bioinformatics tool for Mirror-seq.


Keywords
mirror, sequencing, next-gen, hydroxymethylation, bisulfite, bioinformatics
License
Apache-2.0
Install
pip install mirror_seq==0.2.6

Documentation

What is it

Mirror-seq is an assay invented by Zymo Research that measures hydroxymethylation (hmc) in genomes using bisulfite sequencing. This analysis tool helps biologists analyze the resulting sequencing data. It takes FASTQ files from sequencers and generates hydroxymethylation ratios for CpGs.

Where Should I Start

Scientists can use our tool at three levels:

  • Newbies We provide a tutorial for you to get familiar with it.
  • Experienced Follow the Quick Start to try it with your own data.
  • Expert You have your own homebrew bioinformatics software. No problem! Just follow the instructions below to install and run the parts specific to Mirror-seq.

Installation

To run the full workflow, install the bioinformatics software listed under Dependencies and make sure it is on your PATH. If you already have your own trimming and alignment software, you can skip this step.

pip install mirror_seq

  • Note: pip can install all the dependencies for you.

Dependencies

Python (2.7)

Bioinformatics software

Usage

We provide three commands. For more details on each command, use --help.

Trimming

mirror-trim trims off the Mirror-seq specific filled-in nucleotides and also performs adapter trimming and quality trimming.

Output file

  • <PREFIX>_trimmed.fastq The trimmed fastq file.

Hydroxymethylation Calling

mirror-call calls hydroxymethylation ratios for CpGs from alignment files.

Output files

  • <PREFIX>_CpG.csv.gz Each row represents a CpG. The columns are:
    • chrom The chromosome name of this CpG.
    • pos The chromosomal position of this CpG.
    • strand Either forward strand or reverse strand.
    • meth_count Number of reads aligned at the CpG which are hydroxymethylated.
    • total_count The total number of reads aligned at the CpG.
  • <PREFIX>_CpG.bed.gz Browser track that can be loaded in the UCSC Genome Browser or IGV to visualize hydroxymethylation data. This is the standard BED format with 8 fields. The name and score fields are described below.
    • name is formatted as <HYDROXYMETHYLATED READ COUNT>/<TOTAL READ COUNT>(<HYDROXYMETHYLATION RATIO>). For example, 0/3(0%) means none of the three reads at the CpG position is hydroxymethylated, so the hydroxymethylation ratio is 0%.
    • score hydroxymethylation percentage times 1000.
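
To make the relationship between the columns and the name field concrete, here is a minimal Python sketch (not part of mirror_seq itself) that reads CpG rows and derives the ratio and a name-style label. It assumes the CSV has a header row using the column names listed above, and that the percentage in the label is rounded to a whole number. Neither assumption is confirmed by this document, and the helper names hmc_ratio, bed_name, read_cpg_rows and read_cpg_file are hypothetical.

```python
import csv
import gzip


def hmc_ratio(meth_count, total_count):
    """Return meth_count / total_count, or None when no reads cover the CpG."""
    if total_count == 0:
        return None
    return meth_count / total_count


def bed_name(meth_count, total_count):
    """Build a label shaped like the BED name field, e.g. '0/3(0%)'.

    Rounding the percentage to a whole number is an assumption.
    """
    ratio = hmc_ratio(meth_count, total_count)
    percent = 0 if ratio is None else round(ratio * 100)
    return "{}/{}({}%)".format(meth_count, total_count, percent)


def read_cpg_rows(handle):
    """Yield (chrom, pos, strand, meth_count, total_count) from a text handle.

    Assumes a header row naming the five columns (an assumption).
    """
    for row in csv.DictReader(handle):
        yield (
            row["chrom"],
            int(row["pos"]),
            row["strand"],
            int(row["meth_count"]),
            int(row["total_count"]),
        )


def read_cpg_file(path):
    """Open a gzip-compressed CpG CSV in text mode and yield its rows."""
    with gzip.open(path, "rt") as handle:
        yield from read_cpg_rows(handle)
```

For the example given above, bed_name(0, 3) returns "0/3(0%)". Returning None for CpGs with no coverage keeps a missing measurement distinct from a true ratio of zero.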

Entire Workflow

The mirror-seq command takes FASTQ files from the sequencer and outputs the hydroxymethylation calling files.

Output files

The combined output files of the two commands above.