About Xenomnake
1.1. PDX Model
1.2. Spatial Integration
1.3. Pipeline Overview
1.4. Publication
Installation
2.1. Create conda environment and build dependencies
Running Xenomake
3.1. QuickStart
        3.1.2 Memory Requirements and Runtimes
3.2. Step1: Xenomake Init
        3.2.1. Description
        3.2.2. Flags
        3.2.3. Command Line Implementation
        3.2.4. Output
3.3. Step2: Xenomake Species
        3.3.1 Description
        3.3.2 Flags
        3.3.3. Command Line Implementation
        3.3.4. Output
3.4 Step3: Xenomake Config
        3.4.1 Decsription
        3.4.2 Flags
        3.4.3 Command Line Implementation
3.5. Step3: Run Snakemake Pipeline
        3.5.1. Snakemake Description
        3.5.2. Dry Run
        3.5.3. Execute Snakemake Workflow
        3.5.4. Snakemake Standard Outs
        3.5.5. Xenomake Output Structure
Downstream Cell-Cell Interactions
QC
5.1. Scanpy QC Plots
5.2. Xengsort Metrics
5.3. STAR Metrics
5.4. UMI Metrics
Test Dataset
6.1. Download Data
6.2. Run Xenomake Pipeline
Optional Downloads
7.1. Dropseq
7.2. Picard

About Xenomake

Processing pipeline for Patient Derived Xenograft (PDX) Spatial Transcriptomics Datasets

PDX Model

Patient-derived xenografts are a widely used component of cancer research
These models consist of grafting a human tumor sample into mice to study interactions between a patient's tumor and stroma, screening compounds that target a patient's tumor, and simulating human tumor biology

Spatial Integration

With the rising popularity of spatially resolved transcriptomic technologies, there is a growing need for processing pipelines that can handle reads from PDX samples. Spatial processing has the potential to analyze tumor/stroma interactions and tumor microenvironments.

Pipeline

To facilitate the adoption of spatial transcriptomics for PDX studies, we thus have developed a pipeline "Xenomake", which is an end-to-end pipeline that includes read alignment and gene quantification steps for reads generated by spatial transcriptomic platforms and uses a xenograft sorting tool (Xengsort) to apportion xenograft reads to the host and graft genomes.
Xenomake's structure is based on Snakemake and includes scripts to handle multi-mapping, downstream analysis, and handling "ambiguous" reads

Publication

Xenomake: a pipeline for processing and sorting xenograft reads from spatial transcriptomic experiments
Benjamin S Strope, Katherine E Pendleton, William Z Bowie, Gloria V Echeverria, Qian Zhu
bioRxiv 2023.09.04.556109; doi: https://doi.org/10.1101/2023.09.04.556109

Installation

Create the conda environment and build dependencies

# Create conda environment with the provided environment file
conda env create -n xenomake -f environment.yaml

# Activate conda environment
conda activate xenomake

#install xenomake
pip install xenomake

Running Xenomake

Quickstart

Quickstart: Initialize Xenomake:

#Initialize Xenomake
xenomake init \
	-r1 <read1> \
	-r2 <read2> \
	-s <sample name> \
	-o <output directory> \
	--spatial_mode <preset spatial mode [visium,dbit-seq,seq-scope] or custom> \
	--run_mode <preset run mode [lenient,prude] or custom> \

Quickstart: Add Species Information:

# Print Help Function Call
xenomake species --help

# Create species folder 
xenomake species \
	-mr <mouse_reference_assembly.fa> \
	-hr <human_reference_assemble.fa> \
	-ma <mouse_annotation.gtf> \
	-ha <human_annotation.gtf> \

Quickstart: Spatial Configuration:

If default run mode and spatial mode, this step can be skipped
If 'custom' run mode or spatial mode, you must input the following parameters

# Print Help Function Call
xenomake spatial --help

# Create configuration file
xenomake spatial \
	--barcode_file <barcode whitelist> \
	--umi <umi structure> \
	--cell_barcode <cell barcode structure> \
	--beads <number of spots/beads> \
	--spot_diameter_um <spot diameter> \
	--slide_size_um <slide size> \
	--ambiguous <re-partition ambiguous reads from xenograft sorting> \
	--downstream <perform downstream data processing> \
	--genic_only <only count genic reads (i.e., no UTR, intergenic,intronic> \
	--mm_reads <if read is mult-imapped, choose most likely position>

Quickstart: Run Snakemake Pipeline

# Dry Run
xenomake run -n 
# Initialize Xenomake Pipeline (species dir and config.yaml need to be created)
xenomake run \
	--cores <n_threads>

Memory Requirements and Runtimes

Process	Threads	Time	RAM Recommended
Xengsort Index	8	33 min	25 Gb
STAR Index	8	~90 min per genome	128 Gb
Xenomake Pipeline (Test Dataset: 8.4m reads)	8	37 min	35 Gb
Xenomake Pipeline (Test Dataset: 8.4m reads)	4	42 min	35 Gb
Xenomake Pipeline (Medulloblastoma: 88m reads)	8	300 min	35 Gb

Step 1: Project Initialization:

This is a REQUIRED step to initialize your implementation of the xenomake pipeline

Description

xenomake init takes information about your reads, run modes, spatial chemistry, and sample/project names to create a configuration file for your project

Run Modes
===========

Run modes refer to how reads are handled during mapping and xenograft sorting portions of the pipeline. Options are the following:

1. lenient: xenograft ambiguous re-mapped (True), multi-mapped re-aligned (True), only genic reads (True) <br>
2. stringent: xenograft ambiguous re-mapped (False), multi-mapped re-aligned (False), only genic reads (True) <br>
3. custom: User Defined

Spatial Modes
===============

Spatial modes refer to the spatial technology used to generate reads

1. visium <br>
2. slide-seq <br>
6. dbit-seq <br>
7. custom: User Defined barcode structure, umi structure, spot size, slide size, barcode file, number of beads/spots

Required Flags:

--read1: paired-end read in fastq format
--read2: second paired-end read
--outdir: name of project directory where the output file will be written
--sample: name of sample to prepend filenames and create sample directory within the project directory
--spatial_mode: preset run modes based upon spatial method used [custom,visium,slide_seq,hdst_seq,stereo_seq,pixel_seq,dbit_seq]
--run_mode: preset run modes based upon read processing. This controls multi-mapped, ambiguous, and non-genic read handling [custom, prude, lenient]

Optional Flags:

--root_dir: directory of project execution (default: ./)
--temp_dir: temporary directory (default: /tmp)
--picard: path to user preferred picard.jar tool (default: tools/picard.jar)
--dropseq_tools: path to user preferred download of dropseq tools directory (default: tools/Dropseq_2.5.1)
--download_species: download mm10 and hg38 genomes and annotations to the current directory (default: False)
--download_index: download star indices and xengosort index to the current directory (default: False)

Command Line Implementation:

# Print Help Function Call
xenomake init --help

#Initialize Project
xenomake init \
	-r1 <sample_R1.fastq.gz> \
	-r2 <sample_R2.fastq.gz> \
	-o <project_name> \
	-s <sample_name> \
	--spatial_mode <spatial_name> \
	--run_mode <custom, lenient, spatial>

Output Message

sample name: A1
spatial chemistry: visium
run mode: lenient
project "downsampled" initialized, proceed to species setup and project execution

Example Configuration File

Step 2: Xenomake Species

This is a REQUIRED step to initialize your implementation of the xenomake pipeline

Description

xenomake species links all inputs in a new directory structure that is then easily referenced in project run

Required Flags:

--mouse_ref: Mouse Reference Assembly in fasta format
--human_ref: Human Reference Assembly in fasta format
--mouse_annotation: Mouse Genome Annotation File in .gtf format
--human_annotation: Human Genome Annotation File in .gtf format

Optional Flags:

Including optional flags for indices will skip parts in the pipeline such as STAR --runMode CreateIndex and xengsort index
These two rules can be time-consuming and if these files are already generated, it doesn't need to be done again.

--human_index: STAR index directory for human genome reference
--mouse_index: STAR index directory for mouse genome reference
--xengsort_hash: Xengsort hash table filepath to be used in xengsort classify step
--xengsort_info: Xengosrt info table filepath to be used in xengsort classify step

Command Line Implementation:

# Print Help Function Call
xenomake species --help

# Create species folder 
xenomake species \
	--mouse_reference <mouse_reference_assembly.fa> \
	--human_reference <human_reference_assemble.fa> \
	--mouse_annotation <mouse_annotation.gtf> \
	--human_annotaion <human_annotation.gtf> \

Output Message

species directory successfully completed, keep project execution in the current directory

Output Directory Structure

├─ species
│ ├── idx.hash
│ ├── idx.info
│ ├── human
│ │ ├── genome.fa
│ │ ├── annotation.gtf
│ │ ├── star_index/
│ ├── mouse
│ │ ├── genome.fa
│ │ ├── annotation.gtf
│ │ ├── star_index/

Step 3: Xenomake Config (custom run)

Description

Only necessary if you selected custom for either run_mode or spatial_mode as a part of xenomake init

Spatial Flags

--barcode_file:
--umi:
--cell_bc:
--beads:
spot_diameter_um:
slide_size_um:

Run Flags

downstream:
mm_reads:
genic_only:
--ambiguous:

Command Line Implementation:

if spatial_mode: custom

xenomake config \
	--barcode_file <path to spot/cell barcode file> \
	--umi <umi structure in units of bases> \
	--cell_bc <barcode structure in unit of bases> \
	--beads <number of spots/beads in spatial array> \
	--spot_diameter_um <diameter of spot in um> \
	--slide_size_um <diameter of slide in um>

if run_mode : custom

xenomake config \
	--downstream <downstream processing (True/False)> \
	--mm_reads <count multi-mapped reads (True/False)> \
	--genic_only <only count genic reads in final count matrix (True/False)> \
	--ambiguous <partition xenograft 'ambiguous' reads back into alignments (True/False)>

Step 4: Execute Project

Snakemake

Xenomake uses the Snakemake workflow management system to create reproducible and scalable data analyses
To get an understanding of snakemake, please visit: https://snakemake.github.io/
For additional information, read Snakemake's paper https://doi.org/10.12688/f1000research.29032.1

Dry Run

It is recommended to first execute a dry-run with flag --dry-run, Xenomake will only show the execution plan instead of performing steps.

Flags:

-cores: number of cores to use for project run
--dry-run: Dry-Run showing the workflow
--rerun-incomplete: Force to rerun incompletely generated files if project halted
--keep-going: If a job fails, keep executing independent jobs that don't rely on its output

#Dry Run
xenomake run --cores <n_cores> --dry-run
#Run Project
xenomake run --cores <n_cores> --keep-going

Example Snakemake Standard Outs

Below shows an example of the messages printed by Snakemake. Briefly, they give an overview of the jobs in line for execution and print messages when a rule is being performed. This includes a few important features:

Start time
Finish time
Rule name
Input files
Output file destinations
Wildcard values
Additional params and log destinations

Snakemake will also print any standard-out messages from tools performed. We have saved some of these as log files for subsequent review

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 16
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	DGE_Human
	1	Expression_QC
...
	1	Subset_Overlapped
	1	Xengsort_Clasify
	1	all
	28
Select jobs to execute...

[Fri Sep  8 10:36:04 2023]
rule Preprocess:
    input: fastq/sub1.fq, fastq/sub2.fq
    output: out/treated/preprocess/unaligned_bc_umi_tagged.bam, out/treated/logs/Preprocess.summary
    log: out/treated/logs/Preprocess.log
    jobid: 6
    wildcards: OUTDIR=out, sample=treated

[Fri Sep  8 10:36:06 2023]
Finished job 6.
1 of 28 steps (4%) done
Select jobs to execute...

...
...
...

[Fri Sep 8 10:47:41 2023]
localrule all:
    input: out/treated/final/treated_human_counts.tsv.gz, out/treated/final/treated_mouse_counts.tsv.gz, out/treated/final/treated_human.hdf5, out/treated/final/treated_mouse.hdf5, out/treated/final/treated_human.h5ad, out/treated/final/treated_mouse.hdf5, out/treated/final/treated_mouse.h5ad, out/treated/final/treated_human_processed.h5ad, out/treated/final/treated_mouse_processed.h5ad, out/treated/qc/human_qc.png
    jobid: 0
    threads: 16

[Fri Sep 8 14:37:01 2023]
Finished job 0.
29 of 29 steps (100%) done

Output Directory Structure

├─ OUTDIR
│ ├── sample
│ │ ├── final.bam: Mapped, Sorted, Xenograft Processed alignments in bam format
│ │ ├── final
│ │ │ ├── tissue_positions_list.csv
│ │ │ ├── {sample}_counts.tsv.gz: Gene by Barcode umi count matrix generated by dropseq DigitalGeneExpression
│ │ │ ├── {sample}_human.h5ad:Spatial Anndata object
│ │ │ ├── {sample}_human.hdf5: 10X genomics formatted output structure (can be used for downstream spatial processing)
│ ├── xengsort: Xenograft sorted fastq files generated using xengsort toolkit
│ │ │ ├── ambiguous.fq.gz
│ │ │ ├── both.fq.gz
│ │ │ ├── graft.fq.gz
│ │ │ ├── host.fq.gz
│ │ │ ├── neither.fq.gz
│ ├── mapping
│ │ │ ├── STAR aligned genomes: Aligned files without Xenograft Processing
│ ├── logs
│ │ │ ├── Tagging and trimming logs
│ │ │ ├── STAR log files
│ │ │ ├── BAM subsetting logs
│ │ │ ├── Xengsort log
│ │ │ ├── Multi-map filtering logs
│ │ │ ├── Digital Gene Expression logs
│ ├── qc
│ │ │ ├── Xengsort summary files
│ │ │ ├── STAR summary files
│ │ │ ├── Dataset qc plots
│ │ │ ├── Digital Gene Expression summary
│ ├── preprocessing
│ │ │ ├── unaligned_bc_umi_tagged.bam: Unaligned bam file. Cell-barcode/UMI tagged and Adapter/PolyA trimmed using dropseq workflows

Downstream Cell-Cell Interactions

Intro

Using the outputs from Xenomake, we were able to identify group specific expression and differential signalling genes in our model. We provide scripts in the repository CellInteract to perform identical analyses that are present in the Xenomake Publication

Cell-Cell Signalling

Cell Signalling molecules such as chemokines/cytokines are simultaneously secreted by both stromal cell types and epithelial tumor cells in the tumor microenvironment. It is often difficult to attribute signalling expression to stromal or tumor cells in standard models. However in PDX models processed with means similar to Xenomake, we are able to study the differential production of cellular signals and identify biomarkers in stromal and epithelial compartments

Xenomake Publication Figure (Organism Specific Signalling)

Repository

Access further description as well as scripts to perform these downstream analysis: Link to CelllInteract Repository: https://github.com/bernard2012/CellInteract

QC

Scanpy QC Plots

Xengsort Metrics

Counts number of reads partitioned to each file

prefix	host	graft	ambiguous	both	neither
test	6652723	13654867	2603727	1297366	154218

## Running time statistics
- Running time in seconds: 225.0 sec
- Running time in minutes: 3.75 min
- Done. [2023-08-17 16:19:49]

STAR Metrics

                                 Started job on |       Aug 16 18:57:26
                             Started mapping on |       Aug 16 18:58:54
                                    Finished on |       Aug 16 19:04:22
       Mapping speed, Million of reads per hour |       968.48

                          Number of input reads |       88239353
                      Average input read length |       108
                                    UNIQUE READS:
                   Uniquely mapped reads number |       36755354
                        Uniquely mapped reads % |       41.65%
                          Average mapped length |       109.05
                       Number of splices: Total |       7555478
            Number of splices: Annotated (sjdb) |       7421066
                       Number of splices: GT/AG |       7487922
                       Number of splices: GC/AG |       23361
                       Number of splices: AT/AC |       1030
               Number of splices: Non-canonical |       43165
                      Mismatch rate per base, % |       1.75%
                         Deletion rate per base |       0.04%
                        Deletion average length |       1.90
                        Insertion rate per base |       0.04%
                       Insertion average length |       1.89
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |       10085928
             % of reads mapped to multiple loci |       11.43%
        Number of reads mapped to too many loci |       768576
             % of reads mapped to too many loci |       0.87%
                                  UNMAPPED READS:
  Number of reads unmapped: too many mismatches |       0
       % of reads unmapped: too many mismatches |       0.00%
            Number of reads unmapped: too short |       38624549
                 % of reads unmapped: too short |       43.77%
                Number of reads unmapped: other |       2004946
                     % of reads unmapped: other |       2.27%
                                  CHIMERIC READS:
                       Number of chimeric reads |       0
                            % of chimeric reads |       0.00%

UMI Metrics

Example of top 4 spots with highest expression

CELL_BARCODE	NUM_GENIC_READS	NUM_TRANSCRIPTS	NUM_GENES
CAGTTCCGCGGGTCGA	429279	67931	9287
GAAACTCGTGCGATGC	398339	59154	8748
AAGAGATGAATCGGTA	362017	57447	8723
CCAAGAAAGTGGGCGA	367515	54742	8306

Run Xenomake on Medulloblastoma Test Dataset

Follow Installation Instructions for Xenomake if not done already

Dataset

Paired End Fastq Files Downsamples to 8.4 million reads
Taken from: https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-023-01185-4

Download Dataset

wget https://zenodo.org/records/10655403/files/sub1.fq.gz
wget https://zenodo.org/records/10655403/files/sub2.fq.gz

Run Xenomake Pipeline

Initialize Project

xenomake init \
	-r1 sub1.fq.gz \
	-r2 sub2.fq.gz \
	-s downsampled \
	-o test_run \
	--run_mode lenient \
	--spatial_mode visium \
	--download_genome True \
	--download_index True \
	--download_xengsort True

Species Setup

xenomake species \
	-hr GRCh38.p14.fna.gz \
	-ha GRCh38.p14.gtf \
	-mr GRCm39.fna.gz \
	-ma GRCm39.gtf \
	--human_index human_index/ \
	--mouse_index mouse_index/ \
	--xengsort_hash idx.hash \
	--xengsort_info idx.info

Run Project

xenomake run --cores 4

Optional Downloads

Download Dropseq

Current version of Dropseq-tools is 2.5.3 and is present at <repo_dir>/tools/Dropseq-2.5.3/
For updated versions check: https://github.com/broadinstitute/Drop-seq

Download Picard

Version Control

The provided picard.jar file is currently compatible with Java installed in the Xenomake conda environment.
Java: v11.0.15
Picard: v2.27.5

Picard Github

If you choose to use your own Picard executable, ensure compatibility with Java
https://broadinstitute.github.io/picard/

xenomake Release 1.0.9

Release 1.0.9 Toggle Dropdown 1.0.9 1.0.8 1.0.7 1.0.6 1.0.5 1.0.4 1.0.3 1.0.2 1.0.1 1.0.0

Documentation

About Xenomake

PDX Model

Spatial Integration

Pipeline

Publication

Installation

Create the conda environment and build dependencies

Running Xenomake

Quickstart

Quickstart: Initialize Xenomake:

Quickstart: Add Species Information:

Quickstart: Spatial Configuration:

Quickstart: Run Snakemake Pipeline

Memory Requirements and Runtimes

Step 1: Project Initialization:

Description

Required Flags:

Optional Flags:

Command Line Implementation:

Output Message

Example Configuration File

Step 2: Xenomake Species

Description

Required Flags:

Optional Flags:

Command Line Implementation:

Output Message

Output Directory Structure

Step 3: Xenomake Config (custom run)

Description

Spatial Flags

Run Flags

Command Line Implementation:

Step 4: Execute Project

Snakemake

Dry Run

Flags:

Example Snakemake Standard Outs

Output Directory Structure

Downstream Cell-Cell Interactions

Intro

Cell-Cell Signalling

Xenomake Publication Figure (Organism Specific Signalling)

Repository

QC

Scanpy QC Plots

Xengsort Metrics

STAR Metrics

UMI Metrics

Run Xenomake on Medulloblastoma Test Dataset

Dataset

Download Dataset

Run Xenomake Pipeline

Initialize Project

Species Setup

Run Project

Optional Downloads

Download Dropseq

Download Picard

Version Control

Picard Github

Stats

Development practices

Releases

Contributors

xenomake
Release 1.0.9

Release 1.0.9

1.0.9

1.0.8

1.0.7

1.0.6

1.0.5

1.0.4

1.0.3

1.0.2

1.0.1

1.0.0