SimLoRD is a read simulator for long reads from third generation sequencing and is currently focused on the Pacific Biosciences SMRT error model.
pip install simlord==1.0.4
SimLoRD is a read simulator for third generation sequencing reads and is currently focused on the Pacific Biosciences SMRT error model.
Reads are simulated from both strands of a provided or randomly generated reference sequence.
We recommend using Miniconda and creating a dedicated environment for SimLoRD:
# Create and activate a new environment called simlord
conda create -n simlord python=3 pip numpy scipy cython
source activate simlord
# Install packages that are not available with conda from pip
pip install pysam
pip install dinopy
pip install simlord
# You now have a 'simlord' script; try it:
simlord --help
# In case of a new version update as follows:
pip install simlord --upgrade
# To switch back to your normal environment, use
source deactivate
SimLoRD is a pure Python program. This means that it runs on any operating system (OS) for which Python 3 and the other packages are available.
Example 1: Simulate 10000 reads for the reference ref.fasta, use the default options for simulation, and store the reads in myreads.fastq and the alignment in myreads.sam.
simlord --read-reference ref.fasta -n 10000 myreads
Example 2: Generate a reference with 10 million bases and a GC content of 0.6 (i.e., probability 0.3 for each of C and G, and thus probability 0.2 for each of A and T), store the reference as random.fasta, simulate 10000 reads with default options, and store the reads as myreads.fastq. Do not store alignments.
simlord --generate-reference 0.6 10000000 --save-reference random.fasta \
        -n 10000 --nosam myreads
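The mapping from GC content to per-base probabilities in Example 2 can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not SimLoRD's own code; the function names `base_probabilities` and `random_reference` are hypothetical and do not come from the SimLoRD package.

```python
import random


def base_probabilities(gc_content):
    """Split a GC content into per-base probabilities (hypothetical helper)."""
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must be between 0 and 1")
    gc = gc_content / 2          # C and G share the GC fraction equally
    at = (1.0 - gc_content) / 2  # A and T share the remainder equally
    return {"A": at, "C": gc, "G": gc, "T": at}


def random_reference(length, gc_content, seed=None):
    """Draw a random sequence with the given expected GC content."""
    rng = random.Random(seed)
    probs = base_probabilities(gc_content)
    bases = list(probs)
    weights = [probs[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


probs = base_probabilities(0.6)
reference = random_reference(1000, 0.6, seed=1)
```

For a GC content of 0.6 this yields probability 0.3 for each of C and G and 0.2 for each of A and T, matching the description in Example 2.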
Example 3: Simulate reads from the given reference.fasta, using a fixed read length of 5000 and custom subread error probabilities (12% insertion, 12% deletion, 2% substitution). As before, save reads as myreads.fastq and myreads.sam.
simlord --read-reference reference.fasta -n 10000 -fl 5000 \
        -pi 0.12 -pd 0.12 -ps 0.02 myreads
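To give a feeling for what the three error probabilities in Example 3 mean, here is a deliberately simplified toy model in Python. It is an assumption-laden sketch and not SimLoRD's actual error model: it treats every base independently and applies at most one event per base, and the function name `apply_errors` is hypothetical.

```python
import random

BASES = "ACGT"


def apply_errors(seq, p_ins, p_del, p_sub, rng):
    """Toy per-base error model (NOT SimLoRD's implementation).

    Each base independently suffers at most one event: an insertion
    before it, its deletion, or its substitution by a different base.
    """
    out = []
    for base in seq:
        r = rng.random()  # uniform in [0, 1)
        if r < p_ins:
            # Insertion: emit a random extra base, then the original base
            out.append(rng.choice(BASES))
            out.append(base)
        elif r < p_ins + p_del:
            # Deletion: skip the original base
            continue
        elif r < p_ins + p_del + p_sub:
            # Substitution: replace with one of the three other bases
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            # No error: keep the base unchanged
            out.append(base)
    return "".join(out)


rng = random.Random(42)
noisy_read = apply_errors("ACGT" * 10, p_ins=0.12, p_del=0.12, p_sub=0.02, rng=rng)
```

With equal insertion and deletion probabilities, as in Example 3, the expected length of the noisy read in this toy model stays equal to the length of the input.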
A full list of parameters, as well as their documentation, can be found here.
Version 1.0.2 (2017-03-17)
New Features
Warning: Using --without-ns may lead to biased read coverage, depending on the size of the contigs without Ns and the expected read length.
Bugs fixed
Version 1.0.1 (2017-01-03)
Bugs fixed
Version 1.0.0 (2016-07-13)
API Changes
Example:
    reference       ATCG
    read            CAAT
    true alignment  ||X|
                    ATTG
    Before: SEQ CAAT and CIGAR string 2=1X1=
    Now:    SEQ ATTG and CIGAR string 2=1X1=
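The relationship between the two SEQ values in this example can be checked with a short Python sketch. It assumes that the read CAAT was sampled from the reverse strand, which the example implies but does not state; the helper names `reverse_complement` and `extended_cigar` are hypothetical and not part of SimLoRD.

```python
from itertools import groupby

# Translation table mapping each base to its complement
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extended_cigar(reference, read):
    """Extended CIGAR for a gap-free alignment of equal-length strings.

    Uses only '=' (match) and 'X' (mismatch); insertions and deletions
    are outside the scope of this sketch.
    """
    if len(reference) != len(read):
        raise ValueError("sequences must have equal length")
    ops = ("=" if r == q else "X" for r, q in zip(reference, read))
    # Collapse runs of identical operations into <count><op> pairs
    return "".join(f"{len(list(group))}{op}" for op, group in groupby(ops))


seq = reverse_complement("CAAT")     # sequence in reference orientation
cigar = extended_cigar("ATCG", seq)  # CIGAR relative to the reference
```

Here `seq` is ATTG and `cigar` is 2=1X1=, so the CIGAR string describes the stored SEQ directly only in the "Now" variant.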
SimLoRD is Open Source and licensed under the MIT License.