ervin

ERVin is a collection of tools developed to assist in discovering ERV sequences within genomic data.


License
CNRI-Python-GPL-Compatible
Install
pip install ervin==0.0.6

Documentation

ERVin

ERVin is a tool for detecting endogenous retroviruses (ERVs) in genome segments.

ERVin was designed primarily for use on OSX. It may work on other UNIX-based systems, but it almost certainly will not run on Microsoft Windows.

Installation

pip install ervin

Requirements

  • Python 3.6+
  • NCBI BLAST suite, installed locally
  • Local genome db to be queried
    • The database can be located in a directory of your choosing, but that directory must be named in a config.json file
      • A config.json.templ file is included; if you do not provide your own config.json, one is created from the template's defaults at first run
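
The first-run behaviour described above can be sketched roughly as follows. This is only an illustration, not ERVin's actual code: the function name ensure_config and its parameters are hypothetical, and the keys inside config.json are deliberately not shown because they are not documented here.

```python
import shutil
from pathlib import Path


def ensure_config(config_path, template_path):
    """Create config_path from template_path if it does not exist yet.

    Hypothetical helper: returns True if a new config file was created,
    False if a config file was already present.
    """
    config_path = Path(config_path)
    if config_path.exists():
        # A user-provided config always wins; it is never overwritten.
        return False
    # No config yet: start from the defaults shipped in the template.
    shutil.copyfile(template_path, config_path)
    return True
```

Calling ensure_config("config.json", "config.json.templ") more than once is harmless, since the copy only happens when config.json is missing.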

Current functionality

ERVin currently:

  • When provided with a .fasta file of probe sequences
    • Runs local tblastn against the specified genome database, filtering the results by alignment length and e-value (both thresholds are optional arguments; when omitted they default to >400 and <0.009 respectively)
    • Parses and merges filtered results where appropriate
    • Runs the resulting fasta records against a local Viruses RefSeq database using tblastn (a copy is downloaded if the user does not provide one, and is kept up to date), then groups the records into a final set of output files based on their top hit
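
To make the filter and merge steps concrete, here is a minimal sketch in Python. It is not ERVin's implementation. It assumes the BLAST results are available as tabular text in the default BLAST+ -outfmt 6 column order, and it illustrates "merging where appropriate" as plain merging of overlapping subject coordinates, which may differ from ERVin's real criteria. The function names are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(tabular_text, min_length=400, max_evalue=0.009):
    """Keep hits with alignment length > min_length and e-value < max_evalue."""
    reader = csv.DictReader(
        io.StringIO(tabular_text), fieldnames=OUTFMT6_FIELDS, delimiter="\t"
    )
    return [
        row for row in reader
        if int(row["length"]) > min_length and float(row["evalue"]) < max_evalue
    ]


def merge_overlapping(hits):
    """Merge hits on the same subject sequence whose coordinates overlap.

    Assumption: simple interval merging stands in for ERVin's own merge rules.
    Returns a list of (sseqid, start, end) tuples.
    """
    # Normalise coordinates so start <= end, since minus-strand hits
    # are reported with sstart > send.
    spans = sorted(
        (
            h["sseqid"],
            min(int(h["sstart"]), int(h["send"])),
            max(int(h["sstart"]), int(h["send"])),
        )
        for h in hits
    )
    merged = []
    for sseqid, start, end in spans:
        if merged and merged[-1][0] == sseqid and start <= merged[-1][2]:
            # Overlaps the previous span on the same sequence: extend it.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([sseqid, start, end])
    return [tuple(span) for span in merged]
```

The defaults in filter_hits mirror the documented thresholds (>400 and <0.009), so merge_overlapping(filter_hits(text)) applies the default filtering before merging.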

Usage

Arguments

| Argument | Verbose | Description | Type | Required | Default |
|---|---|---|---|---|---|
| -f | --file | Source fasta file containing the sample probe records to run through tblastn | Filepath | True | |
| -gdb | --genome_database | Name of the genome database against which the probe records are to be BLASTed (located in the genome db store specified in the config file) | str | True | |
| -o | --output_dir | Location to which to write the result files | str | False | <current_working_directory>/OUTPUT |
| -a | --alignment_len_threshold | Minimum length threshold that BLAST result alignment sequence lengths must exceed | int | False | 400 |
| -e | --e_value | Maximum e-value threshold that BLAST result e-values must fall below | float | False | 0.009 |
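
For readers who want to see how these arguments fit together, the following is a hedged sketch of an equivalent parser built with Python's standard argparse module. It mirrors the flags, types, and defaults listed above but is not taken from ERVin's source; build_parser is a hypothetical name.

```python
import argparse
import os


def build_parser():
    """Build a parser mirroring the documented ERVin arguments (illustrative only)."""
    parser = argparse.ArgumentParser(prog="ervin")
    parser.add_argument("-f", "--file", required=True,
                        help="Source fasta file containing the probe records")
    parser.add_argument("-gdb", "--genome_database", required=True,
                        help="Name of the genome database to BLAST against")
    parser.add_argument("-o", "--output_dir",
                        default=os.path.join(os.getcwd(), "OUTPUT"),
                        help="Directory to write the result files to")
    parser.add_argument("-a", "--alignment_len_threshold", type=int, default=400,
                        help="Alignment lengths must exceed this value")
    parser.add_argument("-e", "--e_value", type=float, default=0.009,
                        help="E-values must fall below this value")
    return parser
```

Parsing the list ["-f", "data/fasta_file.fasta", "-gdb", "genome_db"] with this parser leaves the two thresholds at their documented defaults.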

Examples

ervin -f data/fasta_file.fasta -gdb genome_db

ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output

ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output -a 500

ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output -e 0.0008

ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output -a 800 -e 0.01