# Multi Omics Pipeline

- Overview
- Project Structure
- Setup
- Run with Conda
- Run with plain Nextflow (in progress)
- References
## Overview

The Multi Omics Pipeline is a Nextflow pipeline for evaluating multi-omics (genomics, proteomics, metabolomics) data integration methods on tasks such as classification/regression, factor analysis, and clustering.
## Project Structure

Some important locations:

- Shell scripts for setting up the project are located in `bin/`
- Configurations for the pipeline are in `nextflow.config` and `configs/`
- Python and R source code is located in `modules/`
- Logs, cluster output, and execution reports are in `results/`

For more details on project organization, please see here.
```
├── bin
│   ├── 01-get_nxf_conda.sh
│   ├── 02-pull_all_containers.sh
│   ├── helper.sh
│   ├── install.sh
│   └── pbs
│       └── test_job.sh
├── configs
│   ├── base.config
│   ├── local.config
│   └── pbs_remote.config
├── containers
│   ├── dockerfiles
│   │   ├── codia.Dockerfile
│   │   ├── cooperative_learning.Dockerfile
│   │   ├── mixdiablo.Dockerfile
│   │   ├── mogonet.Dockerfile
│   │   ├── R_template.Dockerfile
│   │   ├── rbase.Dockerfile
│   │   └── smgr.Dockerfile
│   ├── names.md
│   ├── README.md
│   └── scripts
│       ├── pull_all.sh
│       └── pull_container.sh
├── data
│   ├── moni_data_reference_data.xlsx
│   ├── multiomics_data.xlsx
│   ├── README.md
│   ├── test1
│   │   └── rnorm_data_*.csv
│   └── test2
│       └── ...
├── docs
│   ├── personal
│   │   ├── links_for_nextflow.md
│   │   └── notes.md
│   ├── README.md
│   ├── sockeye_paths.md
│   └── todo_logging.md
├── LICENSE
├── main.nf
├── Makefile
├── modules
│   ├── Python
│   │   ├── Python_Mogonet.nf
│   │   └── Python_run_mogonet.py
│   ├── R
│   │   ├── cooperative_learning
│   │   │   ├── R_Cooperative_Learning.nf
│   │   │   └── R_run_cooperative_learning.R
│   │   ├── diablo
│   │   │   ├── diablo_helpers.R
│   │   │   ├── R_Diablo.nf
│   │   │   └── R_run_diablo.R
│   │   ├── helpers.R
│   │   ├── test.R
│   │   ├── unused.R
│   │   └── write_data.R
│   └── README.md
├── nextflow.config
├── pbs_job_nxf.sh
├── README.md
├── results
│   ├── nxf_logs
│   │   ├── 2023-06
│   │   │   └── nxf-run_2023-06-30_17-52-54.log
│   │   └── 2023-07
│   │       └── ...
│   ├── pbs_output
│   │   ├── 2023-06
│   │   │   └── job-output_2023-06-30_17-52-34.txt
│   │   └── 2023-07
│   │       └── ...
│   ├── README.md
│   └── reports
│       ├── 2023-06
│       │   └── execution_report_2023-06-30_17-53-06.html
│       └── 2023-07
│           └── ...
├── rstudio.pbs
├── subworkflow
│   ├── helpers.nf
│   ├── Python.nf
│   ├── R.nf
│   └── README.md
└── test_job.sh
```
## Setup

To execute the pipeline, you must satisfy the following requirements:

- Nextflow `22.10.7` or above
- Bash `4.2.46`
- Java 11 (or later, up to 18); OpenJDK `11.0.18` is recommended
- Docker / Apptainer `1.1.4` (formerly Singularity `3.8.5`)
- Conda (optional)
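A quick way to confirm your environment meets these requirements is to query each tool's version. This is a minimal sketch; the fallback messages are just illustrative, and actual version strings will vary by system:

```shell
# Sanity-check the requirements above; each check falls back to a
# message instead of aborting when a tool is missing.
bash --version | head -n 1
command -v java >/dev/null 2>&1 && java -version 2>&1 | head -n 1 || echo "Java not found"
command -v nextflow >/dev/null 2>&1 && nextflow -version | head -n 2 || echo "Nextflow not found"
command -v conda >/dev/null 2>&1 && conda --version || echo "Conda not found (optional)"
```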
Then execute the following commands and pick whichever option you prefer: run with Conda (recommended) or without Conda as plain Nextflow.

Note: for now this applies to ARC Sockeye only.
1. First, choose a place to clone this project, preferably your scratch space:

   ```bash
   # Replace st-singha53-1 with your own allocation code, if any
   # $USER is defined on Sockeye; it is your CWL
   cd /scratch/st-singha53-1/$USER
   ```
2. Then, load the required modules and clone this GitHub repository into the path you `cd`'d into above:

   ```bash
   # Assuming you're in your scratch space
   # Load required modules
   module load gcc/9.4.0 git/2.31.1
   # Clone the repo; after a successful clone, 'multi-omics-pipeline'
   # will be created in your current working directory
   git clone https://github.com/tonyliang19/multi-omics-pipeline.git
   ```
3. Proceed to one of the options below for instructions on running the project:
   - Conda (recommended)
   - Plain Nextflow
## Download from Conda (Sockeye only)

View details here.

Recommended: run `make all` in your terminal on Sockeye.

NOTE: The setup process could take 10-15 minutes the first time; you can come back to it later.
Recommended:

```bash
# cd into the cloned repo
cd multi-omics-pipeline
# Set up the environment and submit a sample batch job
make all  # See the Makefile if you wish to know more about it
```
Alternative (if `make` is not available):

1. Run the script `install.sh` located in the `bin/` directory:

   ```bash
   cd multi-omics-pipeline
   # This is going to take some time
   bash bin/install.sh
   ```
2. After installation finishes, submit the job with:

   ```bash
   # Assuming you are in multi-omics-pipeline
   # Name the output file with a formatted timestamp
   OUTPUT_NAME="results/pbs_output/job-output_$(date +%Y-%m-%d_%H-%M-%S).txt"
   # Submit the job
   qsub -o ${OUTPUT_NAME} pbs_job_nxf.sh
   ```
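The timestamped file name above is built with shell command substitution; here is a small self-contained illustration (the path is just an example):

```shell
# Build a timestamped output name via $(...) command substitution.
STAMP=$(date +%Y-%m-%d_%H-%M-%S)
OUTPUT_NAME="results/pbs_output/job-output_${STAMP}.txt"
echo "${OUTPUT_NAME}"
```

This mirrors the naming convention visible under `results/pbs_output/` in the project tree, e.g. `job-output_2023-06-30_17-52-34.txt`.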
## Download from Nextflow

View details here.
Install Nextflow by using the following command:

```bash
curl -s https://get.nextflow.io | bash
```
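The installer drops a `nextflow` launcher script into the current directory; per the standard Nextflow installation instructions, you then make it executable and move it somewhere on your `PATH`. The `~/.local/bin` location below is only a suggestion:

```shell
# Make the downloaded launcher executable and put it on PATH.
mkdir -p "$HOME/.local/bin"
if [ -f nextflow ]; then
    chmod +x nextflow
    mv nextflow "$HOME/.local/bin/"
fi
export PATH="$HOME/.local/bin:$PATH"
```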
Launch the pipeline execution with one of the following commands:

- Local testing environment:

  ```bash
  make run_local
  ```

- Remote cluster environment:

  ```bash
  make run_remote
  ```
## License

This project is licensed under the MIT License.
## References

- Ding, D. Y., Li, S., Narasimhan, B. & Tibshirani, R. Cooperative learning for multiview analysis. Proc. Natl Acad. Sci. USA 119, e2202113119 (2022).
- Rohart, F., Gautier, B., Singh, A. & Le Cao, K. A. mixOmics: An R package for 'omics feature selection and multiple data integration. PLoS Comput. Biol. 13, e1005752 (2017).
- Singh, A., Shannon, C. P., Gautier, B., Rohart, F., Vacher, M., Tebbutt, S. J., et al. DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays. Bioinformatics 35, 3055-3062 (2019).
- Wang, T., Shao, W., Huang, Z., et al. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat. Commun. 12, 3445 (2021).