# Multi Omics Pipeline

- Overview
- Project Structure
- Setup
- Run with Conda
- Run with plain Nextflow (in progress)
- References
## Overview

The Multi Omics Pipeline is a Nextflow pipeline for evaluating multi-omics (genomics, proteomics, metabolomics) data integration methods on tasks such as classification/regression, factor analysis, and clustering.
## Project Structure

Some important locations:

- Shell scripts for setting up the project are located in `bin/`
- Configurations for the pipeline are in `nextflow.config` and `configs/`
- Python and R source code is located in `modules/`
- Logs, cluster output, and execution reports are in `results/`

For more details on project organization, please see here.
```
├── bin
│   ├── 01-get_nxf_conda.sh
│   ├── 02-pull_all_containers.sh
│   ├── helper.sh
│   ├── install.sh
│   └── pbs
│       └── test_job.sh
├── configs
│   ├── base.config
│   ├── local.config
│   └── pbs_remote.config
├── containers
│   ├── dockerfiles
│   │   ├── codia.Dockerfile
│   │   ├── cooperative_learning.Dockerfile
│   │   ├── mixdiablo.Dockerfile
│   │   ├── mogonet.Dockerfile
│   │   ├── R_template.Dockerfile
│   │   ├── rbase.Dockerfile
│   │   └── smgr.Dockerfile
│   ├── names.md
│   ├── README.md
│   └── scripts
│       ├── pull_all.sh
│       └── pull_container.sh
├── data
│   ├── moni_data_reference_data.xlsx
│   ├── multiomics_data.xlsx
│   ├── README.md
│   ├── test1
│   │   └── rnorm_data_*.csv
│   └── test2
│       └── ...
├── docs
│   ├── personal
│   │   ├── links_for_nextflow.md
│   │   └── notes.md
│   ├── README.md
│   ├── sockeye_paths.md
│   └── todo_logging.md
├── LICENSE
├── main.nf
├── Makefile
├── modules
│   ├── Python
│   │   ├── Python_Mogonet.nf
│   │   └── Python_run_mogonet.py
│   ├── R
│   │   ├── cooperative_learning
│   │   │   ├── R_Cooperative_Learning.nf
│   │   │   └── R_run_cooperative_learning.R
│   │   ├── diablo
│   │   │   ├── diablo_helpers.R
│   │   │   ├── R_Diablo.nf
│   │   │   └── R_run_diablo.R
│   │   ├── helpers.R
│   │   ├── test.R
│   │   ├── unused.R
│   │   └── write_data.R
│   └── README.md
├── nextflow.config
├── pbs_job_nxf.sh
├── README.md
├── results
│   ├── nxf_logs
│   │   ├── 2023-06
│   │   │   └── nxf-run_2023-06-30_17-52-54.log
│   │   └── 2023-07
│   │       └── ...
│   ├── pbs_output
│   │   ├── 2023-06
│   │   │   └── job-output_2023-06-30_17-52-34.txt
│   │   └── 2023-07
│   │       └── ...
│   ├── README.md
│   └── reports
│       ├── 2023-06
│       │   └── execution_report_2023-06-30_17-53-06.html
│       └── 2023-07
│           └── ...
├── rstudio.pbs
├── subworkflow
│   ├── helpers.nf
│   ├── Python.nf
│   ├── R.nf
│   └── README.md
└── test_job.sh
```
## Setup

To execute the pipeline, you must satisfy the following requirements:

- Nextflow `22.10.7` or above
- Bash `4.2.46`
- Java 11 (or later, up to 18); OpenJDK `11.0.18` is recommended
- Docker / Apptainer `1.1.4` (formerly Singularity `3.8.5`)
- Conda (optional)
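A quick way to confirm your environment meets these requirements is to query each tool's version. This is a minimal sketch; the fallback messages are just illustrative, and actual version strings will vary by system:

```shell
# Sanity-check the requirements above; each check falls back to a
# message instead of aborting when a tool is missing.
bash --version | head -n 1
command -v java >/dev/null 2>&1 && java -version 2>&1 | head -n 1 || echo "Java not found"
command -v nextflow >/dev/null 2>&1 && nextflow -version | head -n 2 || echo "Nextflow not found"
command -v conda >/dev/null 2>&1 && conda --version || echo "Conda not found (optional)"
```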
Then execute the following commands and pick whichever option you prefer: run with Conda (recommended) or without Conda as plain Nextflow.

Note: for now this applies to ARC Sockeye only.
1. First, choose a place to clone this project, preferably your scratch space:

   ```bash
   # Replace st-singha53-1 with your own allocation code, if any
   # $USER is defined on Sockeye; it is your CWL
   cd /scratch/st-singha53-1/$USER
   ```
2. Then, load the required modules and clone this GitHub repository into the path you `cd`'d into above:

   ```bash
   # Assuming you're in your scratch space
   # Load required modules
   module load gcc/9.4.0 git/2.31.1
   # Clone the repo; after a successful clone, 'multi-omics-pipeline'
   # will be created in your current working directory
   git clone https://github.com/tonyliang19/multi-omics-pipeline.git
   ```
3. Proceed to one of the options below for instructions on running the project:
   - Conda (recommended)
   - Plain Nextflow
## Download from Conda (Sockeye only)

View details here.

Recommended: run `make all` in your terminal on Sockeye.

NOTE: The setup process could take 10-15 minutes the first time; you can come back to it later.
Recommended:

```bash
# cd into the cloned repo
cd multi-omics-pipeline
# Set up the environment and submit a sample batch job
make all  # See the Makefile if you wish to know more about it
```
Alternative (if `make` is not available):

1. Run the script `install.sh` located in the `bin/` directory:

   ```bash
   cd multi-omics-pipeline
   # This is going to take some time
   bash bin/install.sh
   ```
2. After installation finishes, submit the job with:

   ```bash
   # Assuming you are in multi-omics-pipeline
   # Name the output file with a formatted timestamp
   OUTPUT_NAME="results/pbs_output/job-output_$(date +%Y-%m-%d_%H-%M-%S).txt"
   # Submit the job
   qsub -o ${OUTPUT_NAME} pbs_job_nxf.sh
   ```
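The timestamped file name above is built with shell command substitution; here is a small self-contained illustration (the path is just an example):

```shell
# Build a timestamped output name via $(...) command substitution.
STAMP=$(date +%Y-%m-%d_%H-%M-%S)
OUTPUT_NAME="results/pbs_output/job-output_${STAMP}.txt"
echo "${OUTPUT_NAME}"
```

This mirrors the naming convention visible under `results/pbs_output/` in the project tree, e.g. `job-output_2023-06-30_17-52-34.txt`.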
## Download from Nextflow

View details here.
Install Nextflow by using the following command:

```bash
curl -s https://get.nextflow.io | bash
```
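The installer drops a `nextflow` launcher script into the current directory; per the standard Nextflow installation instructions, you then make it executable and move it somewhere on your `PATH`. The `~/.local/bin` location below is only a suggestion:

```shell
# Make the downloaded launcher executable and put it on PATH.
mkdir -p "$HOME/.local/bin"
if [ -f nextflow ]; then
    chmod +x nextflow
    mv nextflow "$HOME/.local/bin/"
fi
export PATH="$HOME/.local/bin:$PATH"
```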
Launch the pipeline execution with one of the following commands:

- Local testing environment:

  ```bash
  make run_local
  ```

- Remote cluster environment:

  ```bash
  make run_remote
  ```
## License

This project is licensed under the MIT License.
## References

- Ding, D. Y., Li, S., Narasimhan, B. & Tibshirani, R. Cooperative learning for multiview analysis. Proc. Natl Acad. Sci. USA 119, e2202113119 (2022).
- Rohart, F., Gautier, B., Singh, A. & Le Cao, K. A. mixOmics: An R package for 'omics feature selection and multiple data integration. PLoS Comput. Biol. 13, e1005752 (2017).
- Singh, A., Shannon, C. P., Gautier, B., Rohart, F., Vacher, M., Tebbutt, S. J., et al. DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays. Bioinformatics 35, 3055-3062 (2019).
- Wang, T., Shao, W., Huang, Z., et al. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat. Commun. 12, 3445 (2021).