NCBI-Companion

NCBI_Companion assists you to build a reference database with a fasta and a mapping file through Genbank


License
MIT
Install
pip install NCBI-Companion==2.1.1

Documentation

NCBI_Companion

version: 1.0.8

installation:

pip3 install NCBI-Companion --user

load module:

from NCBI_Companion import NCBI_Companion

Introduction

NCBI_Companion targets to construct databases, generate a fasta file and a taxonomy mappingusing by using Genbank Entrez utilities.

Current Supported Database: nuccore, nucleotide, genome

Please also follow NCBI E-utitlites rules rules

Biopython copyright belongs to Biopython teams E-Utitlites copyright belongs to NCBI teams


Contents

class LoadSpecies

This class is to used fetch species name contains in csv file, text file, excel or fasta file.

Functions

function names parameters Return
LoadSpecies species_input, species_output, ifout initialize
ReadSpeciesFile_text None a list (if specified, a table in local directory)
ReadSpeciesFile_excel sp_col, sheetname, header, fullname a list (or a local file)
ReadSpeciesFile_csv sp_col, header, fullname a list (or a local file )
ExtractSpeciesFromFasta ranges, delimiter a list (or a local file)

all those functions return a list of species names or ID list

Usage

species = LoadSpecies(input_file, output_file, ifout = True)
species.ReadSpeciesFile_excel(sp_col='act_sym_fullname', sheetname = 0, header = 0,  fullname = False)

class NCBI_Tools

This class implements Biopython Entrez to interact with NCBI/Genbank API to either get accession id, taxonomy id, sequences or taxonomy ranking, or converting them. Some of core functions return a tracker flag that can trace which functions have been called. This order will be used by Sqlite_Dumps class to generate fasta file and mapping file

Functions

Functions parameters return database table name Tracker Flag Used For
NCBI_Tools ncbi_key, ncbi_email, sqlite_db, ncbi_db, idtype initialize None None Initialize
Update_API key, email updated key and email None None update your key and email
getTracker None a tracker list None None get all the tracker ID
ncbi_Species2Genome species list accession ID list Sp2Genome P9 convert species to accession IDs
ncbi_Search2Acc search terms, howmany to quire Tracker Search2AccIDs P1 search term to get accession number
ncbi_Species2Acc species list, more terms (optional) Tracker Sp2AccIDs P2 convert species and with extra terms to get accession id
ncbi_GetSeqsFromAcc table_name, column_name Tracker Acc2Seq P3 from accession ID to get sequences for each
ncbi_GetTaxIdFromAcc table_name, column_name Tracker Uid2TaxIDs P4 from accession ID to get tax ID for each
ncbi_eAcc2Seq accession ID Tracker Acc2Seq P5 search accession list to get sequences
ncbi_eAcc2TaxID accession ID Tracker Uid2TaxIDs P6 search accesson list to get tax id
ncbi_Species2Taxa species list, style, levels_n Tracker Sp2Taxa P7 search species to get taxonomy ranking
ncbi_Id2Taxa style, levels_n Tracker TaxId2Ranking P8 from accession and tax id to get taxonomy ranking

Usage

Example 1: Known species list and search for its trnL gene

Tracker Flag: P2---> P3 / P4 ---> P4 / P3 ---> P8

from NCBI_Companion import NCBI_Companion
companion = NCBI_Companion.NCBI_Tools(key, email, 'databasename', 'nuccore', 'acc')
companion.ncbi_Species2Acc(species_list, 'trnL')
companion.ncbi_GetSeqsFromAcc('Sp2AccIDs', 'acc_id')
companion.ncbi_GetTaxIdFromAcc('Sp2AccIDs', 'acc_id')
companion.ncbi_Id2Taxa(style = 'qiime', levels_n = 7) # 7 levels of taxonomy ranking qiime style D_0_..;

class Sqlite_Dumps

Functions

functions parameters return
Sqlite_Dumps sqlite_db, output_prefix, tracker, header_type = 'acc' initialize
sqlite_dump None a fasta and a mapping file in local

Usage

# get tracker
tracker = companion.getTracker()
dump = Sqlite_Dumps('database constructed by NCBI_Tools', 'fasta and mapping file name, not extension', tracker, header_type = 'acc')