ppnet

A package for identifying functional association networks by phylogenetic profiling of prokaryotic genomes.


Keywords
Functional, association, network, inference, Phylogenetic, profiling, Prokaryotic, genome, de-replication
License
GPL-3.0
Install
pip install ppnet==1.0.2

Documentation

PPNet

Introduction

  • What is PPNet? PPNet uses genome information and the analysis of phylogenetic profiles, scored with binary similarity and distance measures, to derive large-scale functional association networks for a single bacterial species.
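To make the idea of a binary similarity measure concrete, here is a minimal sketch that scores two phylogenetic profiles with the Jaccard index. Jaccard is only one widely used binary measure; whether it corresponds to any particular PPNet algorithm number is an assumption, and the function name and the example profiles are hypothetical.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard similarity between two binary presence/absence profiles.

    Each profile has one entry per genome: 1 if the gene is present, 0 if absent.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same set of genomes")
    # Genomes in which both genes are present.
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    # Genomes in which at least one of the two genes is present.
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Hypothetical profiles over six genomes (illustrative data only).
gene_x = [1, 1, 0, 1, 0, 0]
gene_y = [1, 1, 0, 0, 0, 1]
print(jaccard_similarity(gene_x, gene_y))
```

Genes whose profiles score highly across many genomes tend to be gained and lost together, which is the signal a functional association network is built from.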

Installation

PPNet has the following dependencies:

  • prokka
  • roary
  • Python (version >= 3.7)
  • Python modules:
    • biopython
    • pyvis
    • numpy
    • scipy
    • statsmodels
    • kneed
    • pyani
  • Install from source
    • Download the source code:
      git clone https://github.com/liyangjie/PPNet.git
    • Rename the main program and add the path to the environment variable:
      # Rename ppnet.py to ppnet
      mv PPNet/bin/ppnet.py PPNet/bin/ppnet
      # Give the scripts executable permission
      chmod +x PPNet/bin/*
      # Add the path to the environment variable
      echo 'export PATH="/Path/to/PPNet/bin:$PATH"' >> ~/.bashrc
      source ~/.bashrc
    • Install the Python dependencies:
      pip install biopython pyvis numpy scipy statsmodels kneed pyani
    • Install the external dependencies, either from source or from your packaging system:
      prokka roary

Usage

ppnet [Options]
Options:
      [-h] show this help message and exit
      [-i1] [Required] The path of input genomes
      [-i2] [Required] The path of phenotype (e.g., pathogenic or non-pathogenic) of all strains
      [-o] The path of output (Default "./PPNet_output")
      [-x] The suffix of genomes data (Default "fasta")
      [-c] Number of CPUs to use
      [-a] [Required] Select the algorithm for calculating the correlation coefficient [1-81], or set 0 to use all algorithms.
      [-pt] What percentage of interactions will be visualized (Default "1")

Algorithm

See Algorithm.docx

Examples

ppnet -i1 PATH/to/your/genomes/ -i2 group.csv -x fasta -c 4 -a 1
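When PPNet is driven from a script, the same command line can be assembled programmatically. The sketch below builds the argument list from the flags documented under Usage; the helper name `build_ppnet_command` and its defaults are hypothetical, and the snippet only prints the command rather than running it.

```python
import subprocess


def build_ppnet_command(genome_dir, group_file, suffix="fasta", cpus=4,
                        algorithm=1, output=None):
    """Assemble a ppnet command line from the documented options."""
    # -a accepts 1-81 for a single algorithm, or 0 to use all algorithms.
    if not 0 <= algorithm <= 81:
        raise ValueError("algorithm must be between 0 and 81")
    cmd = [
        "ppnet",
        "-i1", str(genome_dir),
        "-i2", str(group_file),
        "-x", suffix,
        "-c", str(cpus),
        "-a", str(algorithm),
    ]
    if output is not None:
        cmd += ["-o", str(output)]
    return cmd


def run_ppnet(cmd):
    """Run the assembled command; raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)


cmd = build_ppnet_command("PATH/to/your/genomes/", "group.csv")
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when the genome directory contains spaces.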

Input

The genome files should be in FASTA format and placed in the same directory. The group.csv file passed to -i2 gives the phenotype (e.g., pathogenic or non-pathogenic) of all strains.
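Because PPNet keeps only genomes with N50 > 10000 (see HQ_data under Output), it can be useful to check input assemblies beforehand. The following is a minimal sketch of the standard N50 calculation; how PPNet itself computes N50 is not shown in this document, and both function names are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50: the length of the contig at which the running total,
    counted from the longest contig down, reaches half the assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def contig_lengths_from_fasta(path):
    """Read contig lengths from a FASTA file without external libraries."""
    lengths = []
    current = 0
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous contig, if there was one.
                if seen_header:
                    lengths.append(current)
                seen_header = True
                current = 0
            elif seen_header:
                current += len(line)
    if seen_header:
        lengths.append(current)
    return lengths


# Illustrative contig lengths, not real data.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

A genome would pass the threshold stated above when `n50(contig_lengths_from_fasta(path)) > 10000`.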

Output

  • PPNet_output/HQ_data/*: High-quality genomes, i.e., those with N50 > 10000;
  • PPNet_output/NR_data/*: Non-redundant genome sets after deduplication;
  • PPNet_output/Prokka_result/*: The result files of Prokka
  • PPNet_output/Gff_file/*: The GFF files extracted from the Prokka_result folder, which serve as the input files for roary
  • PPNet_output/Roary_result/*: Result files generated by roary
  • PPNet_output/Roary_result/Statistical_test_result.csv: The result of Fisher's exact test for the distribution of each gene. By default, PPNet reports all genes with an adjusted p-value < 0.05.
  • PPNet_output/Roary_result/filted_phylogenetic_profile.csv: The phylogenetic profile of orthologs with significantly different distributions.
  • PPNet_output/Roary_result/netwrok_result_method_x.csv: Lists the association coefficient calculated by algorithm x for each pair of genes.
  • PPNet_output/Gene_net_x.html: A network plot inferred by algorithm x that can be opened in a browser (Google Chrome, Microsoft Edge, etc.). By default, only 1 percent of interactions is visualized (see the -pt option).
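The gene filtering step behind Statistical_test_result.csv can be illustrated with a small, dependency-free sketch of a two-sided Fisher's exact test followed by a p-value adjustment. The document does not say which adjustment PPNet applies, so the Benjamini-Hochberg procedure used here is an assumption, as are the function names and the example counts. In practice, `scipy.stats.fisher_exact` and `statsmodels.stats.multitest.multipletests` provide equivalent functionality.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    For example: a, b = strains of group 1 with / without the gene,
                 c, d = strains of group 2 with / without the gene.
    """
    row1, row2 = a + b, c + d
    col1 = a + c

    def weight(x):
        # Unnormalized hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    observed = weight(a)
    # Sum over all tables that are no more likely than the observed one.
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(row1 + row2, col1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed adjustment method)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: a gene present in 8 of 10 pathogenic strains
# and in 1 of 6 non-pathogenic strains.
p = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p, 4))
```

Genes whose adjusted p-value falls below 0.05 would then be the ones retained in the filtered phylogenetic profile.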

License

PPNet is free software under a GPLv3 license.