CAMSA: a tool for Comparative Analysis and Merging of Scaffold Assemblies


Keywords
comparative, genomics, scaffolding, genome, assembly, asssembly-reconciliation, comparative-genomics, genome-assembly, genome-scaffolding, python, scaffold-assembly
License
MIT
Install
pip install CAMSA==1.1.0b10

Documentation

CAMSA

MIT licensed Build Status Build status

CAMSA - is a tool for Comparative Analysis and Merging of Scaffold Assemblies, distributed both as a standalone software package and as Python library under the MIT license.

Main CAMSA features:

  1. works with any number of scaffold assemblies in de-novo non-progressive fashion
  2. allows to simultaniously work with scaffold assemblies obtained from any in silico and in vitro techniques, supporting multiple existing formats via built-in converters
  3. creates an extensive report with several comparative quality metrics (both on assembly level and on the level of individual assembly points)
  4. constructs a merged combined scaffold assembly
  5. provides an interactive framework for a visual comparative analysis of the given assemblies

CAMSA is developed using Python programming language and is compatible with both Python2 (2.7+) and Python3 (3.5+). This means CAMSA can be run on any modern operating system properly.

Note: only Linux and MacOS were tested extensively with respect to CAMSA installation and usage. If you want to use CAMSA on windows, please try to do so on your own (we have continuous integration setup on Windows platform as well, but no as detailed as on MacOS/Linux), and if you run into any problems - contact us.

Installation:

The simplest way to install CAMSA is by using pip. Open your terminal and execute the following command:

pip install camsa

For more details and other ways to install CAMSA please refer to the installation wiki page.

Input

CAMSA expect as an input a set of different assemblies on the same set of scaffolds. Eash assembly must be represented as a set of assemblies points in TSF files, using the following format:

origin    seq1    seq1_or    seq2    seq2_or    gap_size    cw
A1        s1      +          s2      -          ?           ?
A1        s2      -          s3      +          ?           ?
...

CAMSA also provides a set of built-in scripts, that allow one to translate other scaffold assembly formats (i.e., FASTA, AGPv2.0, GRIMM, etc) into CAMSA format, and vice-versa (when possible). For more details please refer to the input wiki page.

Usage

CAMSA is very straightforward to use: installation process adds several executable scripts to your (python)path, which either execute CAMSA itself, or some of its utilities. Basic CAMSA usage is

run_camsa.py f1.camsa.points f2.camsa.points ... -o output_dir

This command would run CAMSA in both comparative and merging modes, producing extensive comparative assembly reports as well as a merged assembly. For more details on running CAMSA please refer to the usage wiki page.

Output

CAMSA comparative analysis results are stored in the automatically generated a set of text-based reports as well as a single comprehensive interactive report, powered by HTML, CSS, and JavaScript. This report format allows one to process the output on any operating system, easily share results with colleges and collaborators, as well as ensure reproducibility of each experiment. The output folder also contains all the required libraries for the interactive report to properly work, so internet connection is required.

For more details regarding both text-based and interactive reports please refer to the output wiki page.

Contributing / Bug reports

Please submit any information about identified bugs to the corresponding GitHub bug tracker system.

Thank you!