Package for the MOGONET method (published in Nature Communications)


Keywords
multi-omics, deep, learning, apptainer, deep-learning, docker, machine-learning, multiomics, multiomics-data, nextflow, nextflow-pipeline, python, r, shell, shell-script
License
MIT
Install
pip install Mogonet==2.0.0

Documentation

Multi Omics Pipeline

  1. Overview
  2. Project Structure
  3. Setup the project
  4. Run with Conda
  5. Run with plain Nextflow (In progress)
  6. References

Overview

The Multi Omics Pipeline is a Nextflow pipeline for evaluating multi-omics (genomics, proteomics, metabolomics) data integration methods on tasks such as classification, regression, factor analysis, and clustering.

Project Structure

Some important locations:

  • Shell scripts for setting up the project are located in bin/
  • Pipeline configuration is in nextflow.config and configs/
  • Python and R source code is located in modules/
  • Logs, cluster output, and execution reports are written to results/
For more details on project organization, please see here
.
├── bin
│   ├── 01-get_nxf_conda.sh
│   ├── 02-pull_all_containers.sh
│   ├── helper.sh
│   ├── install.sh
│   └── pbs
│       └── test_job.sh
├── configs
│   ├── base.config
│   ├── local.config
│   └── pbs_remote.config
├── containers
│   ├── dockerfiles
│   │   ├── codia.Dockerfile
│   │   ├── cooperative_learning.Dockerfile
│   │   ├── mixdiablo.Dockerfile
│   │   ├── mogonet.Dockerfile
│   │   ├── R_template.Dockerfile
│   │   ├── rbase.Dockerfile
│   │   └── smgr.Dockerfile
│   ├── names.md
│   ├── README.md
│   └── scripts
│       ├── pull_all.sh
│       └── pull_container.sh
├── data
│   ├── moni_data_reference_data.xlsx
│   ├── multiomics_data.xlsx
│   ├── README.md
│   ├── test1
│   │   └── rnorm_data_*.csv
│   └── test2
│       └── ...
├── docs
│   ├── personal
│   │   ├── links_for_nextflow.md
│   │   └── notes.md
│   ├── README.md
│   ├── sockeye_paths.md
│   └── todo_logging.md
├── LICENSE
├── main.nf
├── Makefile
├── modules
│   ├── Python
│   │   ├── Python_Mogonet.nf
│   │   └── Python_run_mogonet.py
│   ├── R
│   │   ├── cooperative_learning
│   │   │   ├── R_Cooperative_Learning.nf
│   │   │   └── R_run_cooperative_learning.R
│   │   ├── diablo
│   │   │   ├── diablo_helpers.R
│   │   │   ├── R_Diablo.nf
│   │   │   └── R_run_diablo.R
│   │   ├── helpers.R
│   │   ├── test.R
│   │   ├── unused.R
│   │   └── write_data.R
│   └── README.md
├── nextflow.config
├── pbs_job_nxf.sh
├── README.md
├── results
│   ├── nxf_logs
│   │   ├── 2023-06
│   │   │   └── nxf-run_2023-06-30_17-52-54.log
│   │   └── 2023-07
│   │       └── ...
│   ├── pbs_output
│   │   ├── 2023-06
│   │   │   └── job-output_2023-06-30_17-52-34.txt
│   │   └── 2023-07
│   │       └── ...
│   ├── README.md
│   └── reports
│       ├── 2023-06
│       │   └── execution_report_2023-06-30_17-53-06.html
│       └── 2023-07
│           └── ...
├── rstudio.pbs
├── subworkflow
│   ├── helpers.nf
│   ├── Python.nf
│   ├── R.nf
│   └── README.md
└── test_job.sh

Setup

To execute the pipeline, you need to satisfy the following requirements:

Requirements:

  • Nextflow 22.10.7 or above
  • Bash 4.2.46
  • Java 11 (or later, up to 18); OpenJDK 11.0.18 is recommended
  • Docker/Apptainer 1.1.4 (formerly Singularity 3.8.5)
  • Conda (optional)
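
Before going further, it can help to see which of these tools are already on your PATH. The following is a minimal sketch, not part of the repository: the helper names have_cmd and report_tool are hypothetical, and the script only checks that each command exists, not that its version matches the list above.

```shell
# Report which of the required command-line tools are on PATH.
# Presence only; versions are not checked here.

have_cmd() {
  # Succeeds if the given command name resolves on PATH.
  command -v "$1" >/dev/null 2>&1
}

report_tool() {
  # Print "<name>: found" or "<name>: missing" for one tool.
  if have_cmd "$1"; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Docker and Apptainer are alternatives; Conda is optional.
for tool in nextflow bash java docker apptainer conda; do
  report_tool "$tool"
done
```

A tool reported as missing may still be available through the cluster's module system, so treat the output as a hint rather than a verdict.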

Then, execute the following commands and choose one of the two alternatives, running with Conda (RECOMMENDED) or without Conda as plain Nextflow:

Note: For now, this works on ARC Sockeye only.

  1. First, choose where to clone this project, preferably in your scratch space:

    # Replace st-singha53-1 with your own allocation code, if any
    # $USER is defined on Sockeye; it is your CWL
    cd /scratch/st-singha53-1/$USER
  2. Then, load the required modules and clone this GitHub repository into the directory you just changed to:

    # Assuming you're in your scratch space
    # Load required modules
    module load gcc/9.4.0 git/2.31.1
    # Clone repo
    # After a successful clone, a 'multi-omics-pipeline' directory is created in your current working directory
    git clone https://github.com/tonyliang19/multi-omics-pipeline.git
  3. Proceed to one of the options below for instructions on running the project:

Run with Conda (Sockeye only)


NOTE: The first-time setup can take 10-15 minutes; you can leave it running and come back to it later.

Recommended: run make all in your terminal on Sockeye:

# cd to the cloned repo
cd multi-omics-pipeline
# setup the environment and submit a sample batch job
make all # The targets are defined in the Makefile, if you wish to know more about them

Alternative (if make is not available):

  1. Run the install.sh script located in the repository's bin/ directory:

    cd multi-omics-pipeline
    # This is going to take some time
    bash bin/install.sh
  2. After the installation finishes, submit the job with:

    # Assuming you are in the cloned multi-omics-pipeline directory
    # Name the output file with a formatted timestamp
    OUTPUT_NAME="results/pbs_output/job-output_$(date +%Y-%m-%d_%H-%M-%S).txt"
    # Submit job
    qsub -o "${OUTPUT_NAME}" pbs_job_nxf.sh
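
The results tree under Project Structure groups job output by month (for example results/pbs_output/2023-06/). If you want your own submissions to follow that layout, the path can be derived from the timestamp. This is a sketch under that assumption: the function name job_output_path is hypothetical and not part of the repository, and it assumes timestamps in the YYYY-MM-DD_HH-MM-SS form used above.

```shell
# Build a month-bucketed output path from a timestamp of the form
# YYYY-MM-DD_HH-MM-SS.
job_output_path() {
  stamp="$1"
  # The first seven characters of the timestamp are YYYY-MM.
  month=$(printf '%s' "$stamp" | cut -c1-7)
  printf 'results/pbs_output/%s/job-output_%s.txt\n' "$month" "$stamp"
}

# Usage on Sockeye (PBS only), from the repository root:
#   OUTPUT_NAME="$(job_output_path "$(date +%Y-%m-%d_%H-%M-%S)")"
#   mkdir -p "$(dirname "$OUTPUT_NAME")"
#   qsub -o "$OUTPUT_NAME" pbs_job_nxf.sh

job_output_path "2023-06-30_17-52-34"
# prints results/pbs_output/2023-06/job-output_2023-06-30_17-52-34.txt
```

The mkdir -p step in the usage comment is there because the monthly directory may not exist yet when the job is submitted.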

Run with plain Nextflow (In progress)


Install Nextflow by using the following command:

curl -s https://get.nextflow.io | bash
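
After installing, you may want to confirm that the installed version meets the minimum listed under Requirements (Nextflow 22.10.7 or above). The version itself is reported by nextflow -version; the comparison can be sketched as below. The helper name version_ge is hypothetical, the installed value shown is only a placeholder, and the sketch assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
# Succeeds if version $1 is greater than or equal to version $2.
# Uses version sort: if $2 sorts first (or both are equal), $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="22.10.7"
installed="23.04.1"   # placeholder: substitute the version printed by `nextflow -version`

if version_ge "$installed" "$required"; then
  echo "Nextflow $installed satisfies the minimum of $required"
else
  echo "Nextflow $installed is older than the required $required"
fi
```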

Launch the pipeline execution with the following commands:

  1. Local testing environment

    make run_local

  2. Remote cluster environment

    make run_remote

License

This project is licensed under the MIT License.

References

Ding, D. Y., Li, S., Narasimhan, B. & Tibshirani, R. Cooperative learning for multiview analysis. Proc. Natl Acad. Sci. USA 119, e2202113119 (2022).

Rohart, F., Gautier, B., Singh, A. & Le Cao, K. A. mixOmics: An R package for ‘omics feature selection and multiple data integration. PLoS Comput. Biol. 13, e1005752 (2017).

Singh, A., Shannon, C. P., Gautier, B., Rohart, F., Vacher, M., Tebbutt, S. J. et al. DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays. Bioinformatics 35, 3055–3062 (2019).

Wang, T., Shao, W., Huang, Z. et al. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat Commun 12, 3445 (2021).