BioSAK


Keywords
Bioinformatics
License
GPL-3.0+
Install
pip install BioSAK==1.88.0

Documentation

BioSAK (A Swiss-Army-Knife for Bioinformaticians)

pypi licence pypi version DOI

Contact

Shan Zhang1 and Weizhi Song2

1 Department of Pharmacology and Pharmacy, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong

2 Department of Ocean Science, Hong Kong University of Science and Technology, Hong Kong

Installation

  • BioSAK has been tested on Linux/Mac, but NOT yet on Windows.

  • BioSAK is implemented in python3, you can

    • install it with pip3 install BioSAK

    • upgrade it with pip3 install --upgrade BioSAK

  • If you are an UNSW Katana user, this might be helpful.

Getting help

  • You can get example commands for most of the modules by typing, for example, BioSAK iTOL -h.

  • Please refer to the documentation page for help (in preparation, very messy).

  • A license statement is here.

  • A changelog is here.

BioSAK modules

  • Type BioSAK -h to see a full list of modules

              ...::: BioSAK v1.121.0 :::...
    
    Genome databases
       get_GTDB_taxon_gnm      ->  Get id of genomes from specified GTDB taxons
       get_genome_GTDB         ->  Batch download GTDB genomes
       get_genome_NCBI         ->  Batch download GenBank genomes
       sampling_GTDB_gnms      ->  Select GTDB genomes
       subset_GTDB_meta        ->  Subset metadata of GTDB reference genomes
       metaAssembly            ->  Get metadata of NCBI assembly records
       metaBiosample           ->  Get metadata of NCBI biosample records
       statsTaxa               ->  stats GTDB taxa
       GenBank                 ->  get sequence/organism/voucher info
    
    Metagenomics
       metabat2concoct         ->  convert MetaBAT depth to CONCOCT depth
       metabat2maxbin          ->  convert MetaBAT depth to MaxBin depth
       CheckM                  ->  Parse CheckM outputs
       Plot_MAG                ->  plot MAGs, (GC vs depth)
       magabund                ->  Calculate MAG abundance
       mean_MAG_cov            ->  Get mean MAG depth (by MetaBAT depth)
       RunGraphMB              ->  Prepare input files for GraphMB
       gc                      ->  Get GC content
       get_gnm_size            ->  Get the total length of genome(s)
       get_gene_depth          ->  Get gene depth by contig depth
       MeanMappingDepth        ->  Get mean mapping depth 
       get_MAG_reads_long      ->  Extract MAG-specific long reads for reassembling
       mmseqs                  ->  Classify metagenomic contigs with mmseqs
       parse_mmseqs_tsv        ->  Parse mmseqs tsv
       fastaai                 ->  A wrapper for FastAAI
       abd                     ->  get MAG abundance across metagenomes (Wenxiu Wang et al. 2024)
       abd_mask                ->  prepare masked sequence for abd module
       
    Functional annotation
       KEGG                    ->  KEGG annotation
       koala                   ->  Separate the combined BlastKOALA or GhostKOALA output
       COG2020                 ->  COG annotation (v2020, by blastp/diamond)
       COG2024                 ->  COG annotation (v2024, by blastp/diamond)
       arCOG                   ->  COG annotation for archaea (version ar18)
       dbCAN                   ->  CAZy annotation with dbCAN
       Combine_KEGG_arCOG      ->  Combine KEGG and arCOG annotation results
       Combine_KEGG_COG        ->  Combine KEGG and COG annotation results
       enrich                  ->  Functional enrichment analysis
       gapseq                  ->  Data matrix GapSeq predicted pathways
       stats_ko                ->  get stats for a list of provided KO
       stats_arcog             ->  get stats for a list of provided arCOG 
       stats_cog2024           ->  get stats for a list of provided COG (v2024)
       combine_fun_stats       ->  combine outputs from stats_ko, stats_arcog or stats_cog2024
        
    16S rRNA related
       Usearch16S              ->  Usearch for Novogene 16S amplicon sequencing results
       blca                    ->  Classify 16S with BLCA
       top_16S_hits            ->  Classify 16S by top-blast-hits approach
       SILVA_for_BLCA          ->  Prepare BLCA-compatible SILVA SSU database
       GTDB_for_BLCA           ->  Prepare BLCA-compatible GTDB SSU database
       UNITE_for_BLCA          ->  Prepare BLCA-compatible UNITE SSU database
       BLCA_op_parser          ->  Make the BLCA outputs bit easier to read
       Tax4Fun2IndOTU          ->  Get functional profile for individual OTUs (to be added)
       get_eu_otu              ->  Get eukaryotic OTUs
       rm_low_abd_otu          ->  Remove low abd otu from table
       combine_low_abd_otu     ->  Combine low abundance OTUs
       rm_low_depth_sample     ->  Remove samples from OTU table with small number of sequences
    
    Sequence manipulator
       gbk2fna/gbk2faa/gbk2ffn ->  Format convertors
       ffn2faa/gfa2fa/get_rc   ->  Format convertors
       fq2fa                   ->  Convert fastq to fasta
       fa2id                   ->  Export sequence id
       slice_seq               ->  Get specified region of a sequence
       rename_seq              ->  Rename sequences in a file
       prefix_seq_by_file_name ->  prefix sequences by file name
       select_seq              ->  Select sequences by id
       split_fasta             ->  Split one fasta file into multiple files
       merge_seq               ->  Merge sequence files, remove duplicated ones if any
       cat_fa                  ->  Combine fasta files, prefix sequence id with file name
           
    Sam and Bam
       reads2bam               ->  Mapping and sorting
       sam2bam                 ->  Sam to BAM with samtools
       split_sam               ->  Split SAM/BAM file by reference
       bam2reads               ->  Extract reads (id) from sam file
       plot_sam_depth          ->  Plot SAM depth
    
    Dataframe and Statistics
       subset_df               ->  Subset dataframe
       merge_df                ->  Merge dataframes
       add_desc                ->  Add function description to input of the iTOL module
       transpose               ->  Transpose dataframe
       wilcox                  ->  Wilcoxon signed-rank test (non-parametric paired T-test)
       mannwhitneyu            ->  Mann-Whitney U rank test on two independent samples
    
    Others
       js_cmds                 ->  Put commands in job scripts
       js_hpc3                 ->  Put commands in job scripts (HKUST hpc3)
       hpc3                    ->  Submit jobs on HKUST hpc3
       srun                    ->  srun one-line commands on HKUST hpc3
       exe_cmds                ->  Execute commands with multiprocessing
       split_folder            ->  Split folder
       prefix_file             ->  Prefix file
       BestHit                 ->  Keep best blast hits (outfmt 6)
       VisGeneFlk              ->  Visualize gene flanking regions
       usearch_uc              ->  Parse Usearch uc file
       get_Pfam_hmms           ->  Get Pfam profiles by id
       Reads_simulator         ->  Simulate NGS reads
       SubsampleLongReads      ->  Subsample Long Reads
       rename_reads_Reago      ->  Rename paired reads for Reago
       cross_link_seqs         ->  Cross link matched regions between two sequences
       submitHPC               ->  A wrapper for submitHPC.sh
       KeepRemovingTmp         ->  Keep removing old files in a folder
       ribbon                  ->  Make a ribbon diagram
       compare_sets            ->  compare_sets
       sankey                  ->  get sankey plot
       sra                     ->  Download reads with sratoolkit
       vis_color_scheme        ->  Visualize color scheme
       trim                    ->  a wrapper for trimmomatic
       FasterqDump             ->  a wrapper for fasterq-dump
       rename_df_row           ->  rename row headers in a dataframe
       blast                   ->  Parse batch online blast output
       taxdump                 ->  Parse NCBI Taxonomy database
       get_single_page_web     ->  Get single page website