toil_battenberg
A toil implementation of cgpBattenberg 1.4.0 for WGS data. To learn more, see this presentation and this paper.
toil_battenberg passes all tests with over 90% coverage. Unit tests are included for all modules, including jobs.
Usage
The toil_battenberg CLI is divided into 3 steps: subclones, refitcn and finalise.
- 🔥 subclones runs all Battenberg steps besides finalise.
- 🔧 refitcn is used when Battenberg's ploidy and purity solutions are not correct. This step can be run as many times as desired. Intermediate subclones results for every refitcn run are stored in separate directories named outdir/subclones_chr{chromosome}_pos{start-position}_maj{major-allele}_min{minor-allele}, where chromosome, start-position, major-allele and minor-allele correspond to the parameters used to refit the copy number result.
- 💾 finalise takes the --subclones-dir parameter with the selected subclones solution and completes the pipeline. Intermediate results are kept in a compressed file, intermediates.tar.gz. These include:
  - *BAF.tab
  - *LogR.tab
  - *BAFsegmented.txt
  - *logRsegmented.txt
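As a quick sanity check after finalise, the archive can be scanned for the four kinds of intermediate files listed above. This is a minimal sketch, not part of toil_battenberg: the helper names missing_intermediates and check_archive are hypothetical, and it assumes the archive members can be matched against the listed names as case-sensitive glob patterns.

```python
import fnmatch
import tarfile

# File-name patterns listed in this README as intermediate results.
INTERMEDIATE_PATTERNS = (
    "*BAF.tab",
    "*LogR.tab",
    "*BAFsegmented.txt",
    "*logRsegmented.txt",
)


def missing_intermediates(member_names, patterns=INTERMEDIATE_PATTERNS):
    """Return the patterns that match none of the given member names."""
    return [
        pattern
        for pattern in patterns
        if not any(fnmatch.fnmatchcase(name, pattern) for name in member_names)
    ]


def check_archive(path):
    """Open a gzip-compressed tar archive and report unmatched patterns."""
    with tarfile.open(path, "r:gz") as archive:
        return missing_intermediates(archive.getnames())
```

Calling check_archive on the path to intermediates.tar.gz returns an empty list when every pattern is matched by at least one member.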
Note that you must use a different jobstore for each sub-command. For the available options, see:
toil_battenberg --help
Run up to Subclones
toil_battenberg subclones runs the following processes:
allelecount
baflog
imputefromaf
impute
combineimpute
haplotypebafs
cleanuppostbaf
plothaplotypes
combinebafs
segmentphased
fitcn
subclones
See this example:
toil_battenberg subclones \
{outdir}/jobstore_subclones \
--stats \
--writeLogs {outdir}/toil_logs \
--logFile {outdir}/toil_logs.txt \
--batchSystem LSF \
--outdir {outdir} \
--tumor-bam /ifs/res/leukgen/local/opt/leukdc/data/tests/107/E-H-116873-T1-1-D1-2/E-H-116873-T1-1-D1-2.bam \
--normal-bam /ifs/res/leukgen/local/opt/leukdc/data/tests/107/E-H-116873-N1-1-D1-2/E-H-116873-N1-1-D1-2.bam \
--reference /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/genome/gr37.fasta \
--ignore-contigs-file /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/ignored-contigs.txt \
--prob-loci /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/probloci.txt.gz \
--thousand-genomes-loc /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/1000genomesloci \
--impute-info /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/impute/impute_info.txt \
--impute-dir /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/impute \
--is-male
Refit Copy Number
toil_battenberg refitcn re-calculates rho and psi and re-fits them to get a new copy number profile.
Every time this command is run, a new subclones directory is created in {outdir}, called subclones_chr{value}_pos{value}_maj{value}_min{value}. Make sure you pass the selected directory to finalise.
Select a reliable aberrant copy number segment for which the CN state is 'known'. Then run refitcn using the segment position, chromosome, major_allele copy number estimate and minor_allele copy number estimate.
Following the previous example:
toil_battenberg refitcn \
--disableCaching \
--writeLogs {outdir}/toil_logs \
--realTimeLogging \
--logFile {outdir}/toil_logs_subclones_chr1_pos765595_maj2_min3.txt \
--stats \
{outdir}/jobstore_subclones_chr1_pos765595_maj2_min3 \
--outdir {outdir} \
--tumor-bam /ifs/res/leukgen/local/opt/leukdc/data/tests/107/E-H-116873-T1-1-D1-2/E-H-116873-T1-1-D1-2.bam \
--normal-bam /ifs/res/leukgen/local/opt/leukdc/data/tests/107/E-H-116873-N1-1-D1-2/E-H-116873-N1-1-D1-2.bam \
--reference /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/genome/gr37.fasta \
--ignore-contigs-file /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/ignored-contigs.txt \
--prob-loci /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/probloci.txt.gz \
--thousand-genomes-loc /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/1000genomesloci \
--impute-info /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/impute/impute_info.txt \
--impute-dir /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/impute \
--is-male \
--chromosome 1 \
--start-position 765595 \
--major-allele 2 \
--minor-allele 3 \
--path-battenberg /ifs/work/leukgen/opt/cgp/5.18.4/cgpBattenberg/1.4.0
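The names in this refitcn example follow one pattern: the subclones directory, the job store and the log file all carry the same chr/pos/maj/min suffix. The sketch below derives them from the refit parameters. The helpers refit_label and refit_paths are hypothetical and not part of toil_battenberg. Only the subclones directory name is documented; the job store and log file names are inferred from the example command.

```python
import posixpath


def refit_label(chromosome, start_position, major_allele, minor_allele):
    """Build the directory label documented for a refitcn run."""
    return (
        f"subclones_chr{chromosome}_pos{start_position}"
        f"_maj{major_allele}_min{minor_allele}"
    )


def refit_paths(outdir, chromosome, start_position, major_allele, minor_allele):
    """Derive per-run paths that share the same refit label."""
    label = refit_label(chromosome, start_position, major_allele, minor_allele)
    return {
        # Documented location of the refitted subclones results.
        "subclones_dir": posixpath.join(outdir, label),
        # Assumption: naming copied from the example command, not enforced by the CLI.
        "jobstore": posixpath.join(outdir, f"jobstore_{label}"),
        "log_file": posixpath.join(outdir, f"toil_logs_{label}.txt"),
    }
```

Because every refitcn run gets its own label, each run also gets its own job store, and the subclones_dir value is the directory to pass to finalise through --subclones-dir once a solution has been selected.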
Finalise
toil_battenberg finalise completes the pipeline. This step requires 3 additional parameters, --platform, --assembly and --species, as well as the path to the battenberg installation, --path-battenberg.
Do not run the finalise step before checking the results: the final subclones images must be inspected manually. Some examples are available in the presentation.
To conclude the example you would need to run:
toil_battenberg finalise \
--disableCaching \
--writeLogs {outdir}/toil_logs \
--realTimeLogging \
--logFile {outdir}/toil_logs_finalise.txt \
--stats \
{outdir}/jobstore_finalise \
--outdir {outdir} \
--tumor-bam /ifs/res/leukgen/local/opt/leukdc/data/tests/107/E-H-116873-T1-1-D1-2/E-H-116873-T1-1-D1-2.bam \
--normal-bam /ifs/res/leukgen/local/opt/leukdc/data/tests/107/E-H-116873-N1-1-D1-2/E-H-116873-N1-1-D1-2.bam \
--reference /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/genome/gr37.fasta \
--ignore-contigs-file /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/ignored-contigs.txt \
--prob-loci /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/probloci.txt.gz \
--thousand-genomes-loc /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/1000genomesloci \
--impute-info /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/impute/impute_info.txt \
--impute-dir /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/impute \
--is-male \
--assembly GRCH37D5 \
--species HUMAN \
--path-battenberg /ifs/work/leukgen/opt/cgp/5.18.4/cgpBattenberg/1.4.0
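The three example commands repeat the same input options (BAM files, reference and Battenberg resources) and differ only in the sub-command, the job store and a few step-specific flags. A minimal sketch of assembling such a command line from one shared set of inputs is shown below. The function build_command and the SHARED_OPTIONS tuple are hypothetical conveniences, not part of toil_battenberg. The option names are copied from the examples in this README, and the argument order is only an assumption based on them.

```python
import shlex

# Input options that appear in all three example commands of this README.
SHARED_OPTIONS = (
    "tumor-bam",
    "normal-bam",
    "reference",
    "ignore-contigs-file",
    "prob-loci",
    "thousand-genomes-loc",
    "impute-info",
    "impute-dir",
)


def build_command(subcommand, jobstore, outdir, inputs, extra=(), is_male=False):
    """Assemble a toil_battenberg command line as a list of arguments.

    `inputs` maps each name in SHARED_OPTIONS to its path; a missing
    entry raises KeyError so an incomplete configuration fails early.
    `extra` carries step-specific or Toil flags, passed through as given.
    """
    command = ["toil_battenberg", subcommand, jobstore, "--outdir", outdir]
    for option in SHARED_OPTIONS:
        command += [f"--{option}", inputs[option]]
    if is_male:
        command.append("--is-male")
    command += list(extra)
    return command


def as_shell(command):
    """Render the argument list as a single, safely quoted shell string."""
    return shlex.join(command)
```

Keeping the inputs in one mapping makes it harder for the three steps to drift apart, for example by pointing refitcn at a different reference than subclones.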
Contributing
Contributions are welcome and greatly appreciated. Check our contributing guidelines!