genoiser

Turn functions into tracks for genomics data files


Keywords
bam, cram, vcf, genomics, bioinformatics, high-throughput-sequencing, nim, nim-lang
License
MIT
Install
nimble install genoiser

Documentation

mosfun: write (nim) functions to get the most from your BAMs/CRAMs

mosdepth uses chromosome-sized arrays of int32 values to track sequencing depth. This is fast and flexible.

Where mosdepth is a special case for depth, mosfun is the general case for user-defined functions. mosdepth could be implemented with mosfun.

This library is in progress. The idea is that mosfun handles all the accounting; the user simply defines a Nim function that takes an alignment and indicates which genomic positions to increment. For example, to calculate depth, the user function would increment from the start to the end of each alignment:

proc depthfun*(aln: Record, posns: var seq[mrange]) =
  ## depthfun is an example of a `fun` that can be sent to `mosfun`.
  ## It increments from aln.start to aln.stop of passing reads.
  let f = aln.flag
  # skip reads that should not contribute to depth
  if f.unmapped or f.secondary or f.qcfail or f.dup: return
  posns.add((aln.start, aln.stop, 1))

mosfun passes the posns value to the user-defined function, which can add as many elements to it as desired. In this case it adds a single element that increments from aln.start to aln.stop by 1. The increment can be any integer value.
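
As a rough illustration of the accounting described above, here is a minimal, self-contained sketch of how (start, stop, value) elements could be folded into a chromosome-sized int32 array. It is not mosfun's actual code: the `MRange` type and the `accumulate` proc are hypothetical names, and treating each range as half-open is an assumption; the library may handle the end point differently.

```nim
# Hypothetical stand-in for the library's `mrange`; the real definition may differ.
type MRange = tuple[start: int, stop: int, value: int32]

proc accumulate(chromLen: int, posns: openArray[MRange]): seq[int32] =
  ## Apply each range to a chromosome-sized array.
  ## Assumption: ranges are half-open, [start, stop).
  ## Only the two end points of a range are touched; one cumulative sum
  ## at the end turns those deltas into per-base values.
  result = newSeq[int32](chromLen + 1)
  for r in posns:
    let start = clamp(r.start, 0, chromLen)
    let stop = clamp(r.stop, 0, chromLen)
    if stop <= start: continue
    result[start] += r.value
    result[stop] -= r.value
  for i in 1 .. chromLen:
    result[i] += result[i - 1]
  # drop the extra slot that only held the final decrement
  result.setLen(chromLen)

when isMainModule:
  # Two overlapping "reads" on a 10-base chromosome.
  let posns = @[(start: 2, stop: 6, value: 1'i32),
                (start: 4, stop: 8, value: 1'i32)]
  echo accumulate(10, posns)
```

Because each element costs two array writes regardless of its length, a user function can add long ranges cheaply under this scheme.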

The user could also choose to increment at each soft- or hard-clip location:

proc softfun*(aln:Record, posns:var seq[mrange]) =
  ## softfun is an example of a `fun` that can be sent to `mosfun`.
  ## It marks positions where there are soft or hard clips.
  let f = aln.flag
  if f.unmapped or f.secondary or f.supplementary or f.qcfail or f.dup: return
  let cig = aln.cigar
  # a single-operation CIGAR cannot contain a clip next to aligned bases
  if cig.len == 1: return
  var pos = aln.start

  for op in cig:
    if op.op == CigarOp.soft_clip or op.op == CigarOp.hard_clip:
      # for this function, we want the exact break-points, not the span of the event,
      # so we increment the position and the one that follows it.
      posns.add((pos, pos+1, 1))
    if op.consumes.reference:
      pos += op.len
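
To make the CIGAR walk in softfun easier to follow, the sketch below reproduces the same position tracking without the alignment library. The `OpKind`, `MockOp`, `consumesReference` and `clipBreakpoints` names are hypothetical stand-ins and not part of hts-nim or mosfun, and the operation set is deliberately reduced to the few kinds needed here.

```nim
# Hypothetical stand-ins for the alignment types; not the hts-nim API.
type
  OpKind = enum
    opMatch, opInsert, opDelete, opSoftClip, opHardClip
  MockOp = tuple[kind: OpKind, length: int]

proc consumesReference(kind: OpKind): bool =
  ## In this reduced set, only matches and deletions advance the
  ## reference coordinate.
  kind in {opMatch, opDelete}

proc clipBreakpoints(start: int, cigar: openArray[MockOp]): seq[int] =
  ## Return the reference position of every soft or hard clip.
  var pos = start
  for op in cigar:
    if op.kind in {opSoftClip, opHardClip}:
      result.add(pos)
    if consumesReference(op.kind):
      pos += op.length

when isMainModule:
  # 5S90M5S for a read whose aligned portion starts at reference position 100.
  let cigar = @[(kind: opSoftClip, length: 5),
                (kind: opMatch, length: 90),
                (kind: opSoftClip, length: 5)]
  echo clipBreakpoints(100, cigar)
```

A leading clip is reported at the alignment start because clips do not consume the reference; a trailing clip is reported at the position reached after all reference-consuming operations.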

Utility

This library provides the machinery. Other command-line tools will use this for more obviously useful things.