A collection of usefull operations on VCF files containing structural variants calls.
# Stable version pip install -U pySVtools # Bleeding edge version: pip install -U git+https://github.com/wyleung/pySVtools.git#egg=pysvtools
Installation of the dependencies is done if the installation is done using
pip. However, when used directly from the source, you should install the following external libraries:
( or use
pip install -r requirements.txt)
Intersecting IN/DEL events
INS events, you can intersect 2 or more
VCF-files using the following command:
mergevcf -f 100 -i sample1.vcf sample2.vcf \ -o intersected.tsv -b intersected.bed -v intersected.vcf
tsv file is a matrix listing the:
- Intersected hits, with both breakpoints (5' and 3'), coverage (DP) and size.
locationin each sample, and size of the event
Intersecting TRA/ CTX events
In order to merge translocations, one should set the flanking margin to a higher number.
Recommended setting is to try out with
-t -f 2000 first, this will give some confident calls.
One can allow more flanking by increasing the
-f value. F.e.g.:
-t -f 5000 to allow 5kb difference in the centerpoint.
usage: merge.py [-h] [-c EXCLUSION_REGIONS] [-f FLANKING] [-t] [-i VCF [VCF ...]] [-o OUTPUT] [-b BEDOUTPUT] [-v VCFOUTPUT] [-r REGIONS_OUT] optional arguments: -h, --help show this help message and exit -c EXCLUSION_REGIONS, --exclusion_regions EXCLUSION_REGIONS Exclusion regions file in BED format -f FLANKING, --flanking FLANKING Centerpoint flanking  -t, --translocation_only Do translocations only -i VCF [VCF ...], --vcf VCF [VCF ...] The VCF(s) to compare, can be supplied multiple times -o OUTPUT, --output OUTPUT Output summary to [sample.tsv] -b BEDOUTPUT, --bedoutput BEDOUTPUT Output bed file to [sample.bed] -v VCFOUTPUT, --vcfoutput VCFOUTPUT Output summary to [sample.vcf] -r REGIONS_OUT, --regions_out REGIONS_OUT Output all regions to [regions_out.bed]
- Intersecting SV events, using multiple VCF. Usefull for finding recuring event accros multiple 'samples'
- Filtering events using known regions using
bedfiles describing f.e.g. GC-rich regions and/or known CNV regions
- Summarizing SV events in a nice LaTeX table
Please file an issue report on Github if there are any questions using this library/tool