toil-battenberg

A toil wrapper for cgpBattenberg.


License
BSD-3-Clause
Install
pip install toil-battenberg==1.0.3

Documentation

toil_battenberg

A toil implementation of cgpBattenberg 1.4.0 for WGS data. To learn more, see this presentation and this paper. toil_battenberg passes all of its tests with over 90% coverage; unit tests are included for all modules, including jobs.

Usage

The toil_battenberg CLI is divided into three steps: subclones, refitcn and finalise.

  • 🔥 subclones runs all Battenberg steps besides finalise.

  • 🔧 refitcn is used when Battenberg's ploidy and purity solutions are not correct. This step can be run as many times as needed. Intermediate subclones results for every refitcn run are stored in separate directories named outdir/subclones_chr{chromosome}_pos{start-position}_maj{major-allele}_min{minor-allele}, where chromosome, start-position, major-allele and minor-allele are the parameters used to refit the copy number result.

  • 💾 finalise takes the --subclones-dir parameter, which points to the selected subclones solution, and completes the pipeline. Intermediate results are kept in a compressed file, intermediates.tar.gz. These include:

    • *BAF.tab
    • *LogR.tab
    • *BAFsegmented.txt
    • *logRsegmented.txt
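
As a quick check after finalise, the archived intermediates can be listed without unpacking them. The following is a minimal sketch, not part of toil_battenberg: `list_intermediates` is a hypothetical helper, and the internal layout of intermediates.tar.gz is assumed to be unknown, so the patterns from the list above are matched against each full member name.

```python
import fnmatch
import tarfile

# File patterns copied from the list of intermediate results above.
INTERMEDIATE_PATTERNS = (
    "*BAF.tab",
    "*LogR.tab",
    "*BAFsegmented.txt",
    "*logRsegmented.txt",
)


def list_intermediates(archive_path):
    """Return the sorted archive members that match the documented patterns.

    Hypothetical helper: the archive layout is an assumption, so every
    member name (including any directory prefix) is tested as a whole.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
    return sorted(
        name
        for name in names
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in INTERMEDIATE_PATTERNS)
    )
```

Calling `list_intermediates("{outdir}/intermediates.tar.gz")` with your own output directory substituted should return only the four kinds of file listed above; an empty list suggests the archive is not the one produced by finalise.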

Note that each sub-command requires its own jobstore. For the full list of options, see:

toil_battenberg --help
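
One way to keep jobstores apart is to derive each path from the sub-command it belongs to. The sketch below follows the `jobstore_<name>` naming used in the examples of this README; `new_jobstore` is a hypothetical helper, and refusing to reuse an existing path is an assumption about wanting a fresh run each time, not documented toil_battenberg behaviour.

```python
import os


def new_jobstore(outdir, name):
    """Return a jobstore path unique to one sub-command run.

    Hypothetical helper. `name` is a label such as "subclones" or
    "finalise"; the "jobstore_" prefix mirrors the README examples.
    """
    path = os.path.join(outdir, "jobstore_" + name)
    # Assumption: a fresh run should never point at an existing jobstore.
    if os.path.exists(path):
        raise FileExistsError("jobstore already exists: " + path)
    return path
```

The returned path is passed as the positional jobstore argument of the matching sub-command.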

Run up to Subclones

toil_battenberg subclones runs the following processes:

  • allelecount
  • baflog
  • imputefromaf
  • impute
  • combineimpute
  • haplotypebafs
  • cleanuppostbaf
  • plothaplotypes
  • combinebafs
  • segmentphased
  • fitcn
  • subclones

See this example:

toil_battenberg subclones \
    {outdir}/jobstore_subclones \
    --stats \
    --writeLogs {outdir}/toil_logs \
    --logFile {outdir}/toil_logs.txt \
    --batchSystem LSF \
    --outdir {outdir} \
    --tumor-bam /ifs/res/leukgen/local/opt/leukdc/data/tests/107/E-H-116873-T1-1-D1-2/E-H-116873-T1-1-D1-2.bam \
    --normal-bam /ifs/res/leukgen/local/opt/leukdc/data/tests/107/E-H-116873-N1-1-D1-2/E-H-116873-N1-1-D1-2.bam \
    --reference /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/genome/gr37.fasta \
    --ignore-contigs-file /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/ignored-contigs.txt \
    --prob-loci /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/probloci.txt.gz \
    --thousand-genomes-loc /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/1000genomesloci \
    --impute-info /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/impute/impute_info.txt \
    --impute-dir /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/impute \
    --is-male

Refit Copy Number

toil_battenberg refitcn recalculates rho (purity) and psi (ploidy) and refits them to get a new copy number profile.

Select a reliable aberrant copy number segment whose copy number (CN) state is known, then run refitcn with that segment's chromosome, start position, and major and minor allele copy number estimates (--chromosome, --start-position, --major-allele and --minor-allele). Every time this command is run, a new subclones directory is created in {outdir} called subclones_chr{value}_pos{value}_maj{value}_min{value}. Make sure you pass the selected directory to finalise.
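
Because the directory name is built from the four refit parameters, it can be reconstructed when preparing the finalise call. This is a small sketch assuming the naming scheme is exactly as described in this README; `refit_subclones_dir` is a hypothetical helper, not a function exported by toil_battenberg.

```python
import os


def refit_subclones_dir(outdir, chromosome, start_position, major_allele, minor_allele):
    """Build the subclones directory path for one refitcn run.

    Hypothetical helper that follows the documented pattern
    subclones_chr{chromosome}_pos{start-position}_maj{major-allele}_min{minor-allele}.
    """
    name = "subclones_chr{}_pos{}_maj{}_min{}".format(
        chromosome, start_position, major_allele, minor_allele
    )
    return os.path.join(outdir, name)
```

With the parameters of the example that follows (chromosome 1, start position 765595, major allele 2, minor allele 3), the directory name is subclones_chr1_pos765595_maj2_min3.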

Following the previous example:

toil_battenberg refitcn \
    --disableCaching \
    --writeLogs {outdir}/toil_logs \
    --realTimeLogging \
    --logFile {outdir}/toil_logs_subclones_chr1_pos765595_maj2_min3.txt \
    --stats \
    {outdir}/jobstore_subclones_chr1_pos765595_maj2_min3 \
    --outdir {outdir} \
    --tumor-bam /ifs/res/leukgen/local/opt/leukdc/data/tests/107/E-H-116873-T1-1-D1-2/E-H-116873-T1-1-D1-2.bam \
    --normal-bam /ifs/res/leukgen/local/opt/leukdc/data/tests/107/E-H-116873-N1-1-D1-2/E-H-116873-N1-1-D1-2.bam \
    --reference /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/genome/gr37.fasta \
    --ignore-contigs-file /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/ignored-contigs.txt \
    --prob-loci /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/probloci.txt.gz \
    --thousand-genomes-loc /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/1000genomesloci \
    --impute-info /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/impute/impute_info.txt \
    --impute-dir /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/impute \
    --is-male \
    --chromosome 1 \
    --start-position 765595 \
    --major-allele 2 \
    --minor-allele 3 \
    --path-battenberg /ifs/work/leukgen/opt/cgp/5.18.4/cgpBattenberg/1.4.0

Finalise

toil_battenberg finalise completes the pipeline. This step requires three additional parameters (--platform, --assembly and --species) as well as the path to the Battenberg installation (--path-battenberg). Do not run the finalise step before checking the results: the final subclones images must be inspected manually. Some examples are available in the presentation.

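To conclude the example, you would run a command like the one below. It does not show --platform or --subclones-dir, both described above; add them with values appropriate for your data: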

toil_battenberg finalise \
    --disableCaching \
    --writeLogs {outdir}/toil_logs \
    --realTimeLogging \
    --logFile {outdir}/toil_logs_finalise.txt \
    --stats \
    {outdir}/jobstore_finalise \
    --outdir {outdir} \
    --tumor-bam /ifs/res/leukgen/local/opt/leukdc/data/tests/107/E-H-116873-T1-1-D1-2/E-H-116873-T1-1-D1-2.bam \
    --normal-bam /ifs/res/leukgen/local/opt/leukdc/data/tests/107/E-H-116873-N1-1-D1-2/E-H-116873-N1-1-D1-2.bam \
    --reference /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/genome/gr37.fasta \
    --ignore-contigs-file /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/ignored-contigs.txt \
    --prob-loci /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/probloci.txt.gz \
    --thousand-genomes-loc /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/1000genomesloci \
    --impute-info /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/impute/impute_info.txt \
    --impute-dir /ifs/work/leukgen/ref/homo_sapiens/GRCh37d5/battenberg/impute \
    --is-male \
    --assembly GRCH37D5 \
    --species HUMAN \
    --path-battenberg /ifs/work/leukgen/opt/cgp/5.18.4/cgpBattenberg/1.4.0
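
The three example commands share most of their options, so a script that drives them can keep the shared part in one place. The sketch below is only an illustration: `build_command` is a hypothetical helper, the paths are placeholders, and only flags that appear in this README are used.

```python
import shlex


def build_command(step, jobstore, shared, extra=None):
    """Assemble the argument list for one toil_battenberg sub-command.

    Hypothetical helper. `shared` and `extra` map flags to values; a value
    of None marks a bare switch such as --is-male.
    """
    argv = ["toil_battenberg", step, jobstore]
    options = dict(shared)
    options.update(extra or {})
    for flag, value in options.items():
        argv.append(flag)
        if value is not None:
            argv.append(str(value))
    return argv


# Placeholder paths; substitute your own files and reference data.
SHARED = {
    "--outdir": "out",
    "--tumor-bam": "tumor.bam",
    "--normal-bam": "normal.bam",
    "--is-male": None,
}

command = build_command(
    "finalise",
    "out/jobstore_finalise",
    SHARED,
    {"--assembly": "GRCH37D5", "--species": "HUMAN"},
)
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)`; the printed string is a shell-quoted form for logging.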

Contributing

Contributions are welcome and greatly appreciated. Check our contributing guidelines!