dexseq_prepare_annotation2

Subread_to_DEXSeq python package


Install
pip install dexseq_prepare_annotation2==1.0.0


Subread_to_DEXSeq

Vivek Bhardwaj

These tools provide a way to use featureCounts output with DEXSeq:

  1. dexseq_prepare_annotation2 : The same as the "dexseq_prepare_annotation.py" script that comes with DEXSeq, but with an added option to also write a featureCounts-readable GTF file.

  2. loadSubread : An R library that provides the function "DEXSeqDataSetFromFeatureCounts", which loads the output of featureCounts as a DEXSeqDataSet (dxd) object.

Install

Install the Python tool:

pip install -e git+https://github.com/jvrakor/Subread_to_DEXSeq.git#egg=Subread_to_DEXSeq

Install the R library (in R):

> devtools::install_github("jvrakor/Subread_to_DEXSeq", subdir = "loadSubread")

Dependencies

dexseq_prepare_annotation2 requires HTSeq

loadSubread requires devtools, dplyr, DEXSeq, GenomicRanges, and IRanges

Usage example

1) Prepare annotation

Syntax :

dexseq_prepare_annotation2 -f <featurecounts.gtf> <input.gtf> <dexseq_counts.gff>

Example :

dexseq_prepare_annotation2 -f dm6_ens76_flat.gtf dm6_ens76.gtf dm6_ens76_flat.gff

You will get the file "dm6_ens76_flat.gff" and another file, "dm6_ens76_flat.gtf", for featureCounts.

2) Count using Subread (command line)

We use the -f option to count reads at the feature (exon) level rather than the gene level.

The -O option additionally counts reads that overlap multiple exons (similar to DEXSeq_count).

/path/to/subread/bin/featureCounts -f -O -s 2 -p -T 40 \
-F GTF -a dm6_ens76_flat.gtf \
-o dm6_fCount.out Cont_1.bam Cont_2.bam Test_1.bam Test_2.bam
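
If the counting step is scripted, the same call can be assembled from Python. The sketch below only builds the argument list from the flags shown above; the helper name `build_featurecounts_cmd` and its parameters are made up for illustration, and it assumes a `featureCounts` executable is on the PATH.

```python
import shlex


def build_featurecounts_cmd(annotation, output, bams, threads=40,
                            exe="featureCounts"):
    """Assemble the featureCounts call shown above (hypothetical helper)."""
    cmd = [
        exe,
        "-f",               # count at feature (exon part) level, not gene level
        "-O",               # also count reads overlapping more than one feature
        "-s", "2",          # reverse-stranded library
        "-p",               # paired-end input
        "-T", str(threads),  # number of threads
        "-F", "GTF",        # annotation file format
        "-a", annotation,   # flattened GTF from dexseq_prepare_annotation2
        "-o", output,       # count table to write
    ]
    return cmd + list(bams)


cmd = build_featurecounts_cmd(
    "dm6_ens76_flat.gtf",
    "dm6_fCount.out",
    ["Cont_1.bam", "Cont_2.bam", "Test_1.bam", "Test_2.bam"],
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell quoting problems when BAM paths contain spaces.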

3) Load into DEXSeq

In R, prepare a sampleData data.frame that has the sample names used for featureCounts as row names, plus the condition and any other variables you want to use in the DEXSeq design matrix.

Example :

library("loadSubread")
samp <- data.frame(row.names = c("cont_1","cont_2","test_1","test_2"), 
                        condition = rep(c("control","trt"),each=2))
dxd.fc <- DEXSeqDataSetFromFeatureCounts("dm6_fCount.out",
                                         flattenedfile = "dm6_ens76_flat.gtf",sampleData = samp)

This will create a dxd object that you can use for DEXSeq analysis.
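
Before building sampleData, it can help to check which sample columns the count table actually contains. The following is a minimal Python sketch, assuming the usual featureCounts layout (a leading `#` comment line, then a header with six annotation columns followed by one column per BAM file). The function name `sample_columns` and the tiny example table are made up for illustration.

```python
import csv
import io

# Annotation columns that precede the per-sample counts in featureCounts output.
ANNOTATION_COLUMNS = ["Geneid", "Chr", "Start", "End", "Strand", "Length"]


def sample_columns(handle):
    """Return the per-sample column names of a featureCounts table."""
    # Skip the comment line(s) that start with '#'.
    data_lines = (line for line in handle if not line.startswith("#"))
    header = next(csv.reader(data_lines, delimiter="\t"))
    if header[:6] != ANNOTATION_COLUMNS:
        raise ValueError("unexpected featureCounts header: %r" % header[:6])
    return header[6:]


# Made-up two-sample table, only to show the expected layout.
example = (
    "# Program:featureCounts (rest of comment line omitted)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tCont_1.bam\tTest_1.bam\n"
    "geneA\t3R\t100\t200\t+\t101\t12\t7\n"
)
print(sample_columns(io.StringIO(example)))
```

For a real file, pass an open handle instead, for example `with open("dm6_fCount.out") as fh: sample_columns(fh)`.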

Results

On a real Drosophila dataset (mapped to dm6), I compared the output from featureCounts (two modes) with that of DEXSeq_Counts.

In unique mode, fragments overlapping multiple features are not counted; in multi mode (the -O option), they are counted.
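
The difference between the two modes can be illustrated with a toy Python sketch. This is a simplified illustration on made-up closed intervals, not how featureCounts is implemented: it ignores strand, read pairing and minimum-overlap settings, and the function name `count_fragments` is hypothetical.

```python
def count_fragments(features, fragments, allow_multi_overlap):
    """Count fragments per feature; intervals are closed (start, end) pairs."""
    counts = {name: 0 for name in features}
    for frag_start, frag_end in fragments:
        # Features whose interval overlaps the fragment.
        hits = [
            name
            for name, (start, end) in features.items()
            if frag_start <= end and start <= frag_end
        ]
        # Unique mode: count only fragments hitting exactly one feature.
        # Multi mode: count the fragment once for every feature it hits.
        if len(hits) == 1 or (allow_multi_overlap and hits):
            for name in hits:
                counts[name] += 1
    return counts


# Two adjacent exon parts and three fragments; the middle one spans both.
features = {"E001": (100, 200), "E002": (201, 300)}
fragments = [(120, 180), (190, 210), (250, 290)]

unique = count_fragments(features, fragments, allow_multi_overlap=False)
multi = count_fragments(features, fragments, allow_multi_overlap=True)
print(unique)  # {'E001': 1, 'E002': 1}
print(multi)   # {'E001': 2, 'E002': 2}
```

The fragment spanning the exon boundary is dropped in unique mode and counted for both exon parts in multi mode, which is why the two modes can give different exon-level counts.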

Dispersion Estimates

Results

Number of differentially expressed exons at 10% FDR. The output from featureCounts is highly similar to that of DEXSeq_Count when multi-feature overlapping reads are counted (-O option).