Shareable Project in 100 Words
Once upon a time, a youngster wanted to share his data science projects with others, who wanted to apply them to new data or extend the projects. Each project required a specific environment, multi-language dependencies, and big data to run, so he helped each collaborator get these dependencies. "Why is it so hard to share a project?" he wondered. He went back to the lab and told his friends. "Let’s standardize data science projects," he said. "Let’s also make a tool for people to easily manage and share their projects," said his friend. Eureka! Shareable Project was born.
Specification
Shareable Project is a directory containing the following:
code/
Your code goes here.
environment/
Conda environments go here. environment/essence is the base environment. spro sorts any additional environments by file name and stacks them on environment/essence.
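The ordering rule above can be sketched in a few lines. This is only an illustration of "base first, then the rest sorted by name", not spro's actual implementation; the environment names plot and align are hypothetical, and the sketch assumes environment/essence exists.

```python
from pathlib import Path

BASE_ENVIRONMENT = "essence"  # base environment name from the specification


def stacking_order(environment_names):
    """Return the base environment first, then the others sorted by name.

    Assumes the base environment exists; it is always placed first.
    """
    extras = sorted(name for name in environment_names if name != BASE_ENVIRONMENT)
    return [BASE_ENVIRONMENT] + extras


def list_environments(project_directory):
    """List the entries directly under <project>/environment in stacking order."""
    environment_directory = Path(project_directory) / "environment"
    return stacking_order(path.name for path in environment_directory.iterdir())


# "plot" and "align" are made-up names for additional environments.
print(stacking_order(["plot", "essence", "align"]))  # ['essence', 'align', 'plot']
```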
input/
The input for your code goes here.
output/
Anything produced by your code goes here.
stuff/
Everything else like logo, picture, video, or paper goes here.
project.json
This stores the project metadata. Here is an example with the required keys and their example values.
{
"name": "Compare Capybara and Human",
"git_url": "https://github.com/KwatME/compare-capybara-and-human",
"version": "1.0.0",
"keyword": [
"Capybara",
"Human",
"Genomics"
],
"download": {
"input/genome": [
"ftp://kwatme.com/capybara_genome.fasta",
"ftp://kwatme.com/human_genome.fasta"
],
"code": [
"ftp://kwatme.com/plot.js"
],
"stuff/sound": [
"ftp://kwatme.com/capybara_sound.mp3"
]
},
"command": {
"compare": "python code/compare.py",
"plot": "npm code/plot.js"
}
}
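As a rough way to check a directory against this specification, the sketch below looks for the five directories and the required keys shown in the example. The function name find_problems and the message wording are hypothetical; spro may validate projects differently, or not at all.

```python
import json
from pathlib import Path

# Directory names and required keys as listed in the specification.
DIRECTORIES = ("code", "environment", "input", "output", "stuff")
REQUIRED_KEYS = ("name", "git_url", "version", "keyword", "download", "command")


def find_problems(project_directory):
    """Return a list of problems found; an empty list means none were found."""
    project_directory = Path(project_directory)
    problems = []

    for name in DIRECTORIES:
        if not (project_directory / name).is_dir():
            problems.append(f"missing directory: {name}/")

    metadata_path = project_directory / "project.json"
    if not metadata_path.is_file():
        problems.append("missing file: project.json")
        return problems

    metadata = json.loads(metadata_path.read_text())
    for key in REQUIRED_KEYS:
        if key not in metadata:
            problems.append(f"missing key in project.json: {key}")

    return problems
```

Calling find_problems(".") from a project directory returns an empty list when every directory and required key is present.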
name
This is the project name. Try to capitalize the first letter of each word.
git_url
This is the project GitHub URL. A project must be a GitHub repository for sharing.
version
This is the project version. It should be something like #.#.#.
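One possible reading of #.#.# is three dot-separated non-negative integers. The check below is a sketch under that assumption; the specification itself only says "something like" this form, so a stricter or looser rule may be intended.

```python
import re

# Assumption: each "#" stands for one or more ASCII digits.
VERSION_PATTERN = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+")


def is_valid_version(version):
    """Return True if the whole string has the form <digits>.<digits>.<digits>."""
    return VERSION_PATTERN.fullmatch(version) is not None


print(is_valid_version("1.0.0"))  # True
print(is_valid_version("1.0"))    # False
```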
keyword
This is a list of words describing the project. These words can help others find the project.
download
This is a mapping of directory path (relative to the project directory path) to URLs. spro build downloads the content from each URL into its corresponding directory path. In the example, spro build downloads four files: 1) input/genome/capybara_genome.fasta; 2) input/genome/human_genome.fasta; 3) code/plot.js; and 4) stuff/sound/capybara_sound.mp3.
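To make the mapping concrete, the sketch below computes where each URL in the example would land. It does not download anything, and it is not spro's code; it assumes the file name is the last segment of the URL path, which matches the four files listed above. The function name download_destinations is hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def download_destinations(download):
    """Return (url, destination path) pairs for a "download" mapping."""
    destinations = []
    for directory, urls in download.items():
        for url in urls:
            # Assumption: the file keeps the last segment of the URL path as its name.
            file_name = posixpath.basename(urlparse(url).path)
            destinations.append((url, posixpath.join(directory, file_name)))
    return destinations


# The "download" value from the example project.json.
download = {
    "input/genome": [
        "ftp://kwatme.com/capybara_genome.fasta",
        "ftp://kwatme.com/human_genome.fasta",
    ],
    "code": ["ftp://kwatme.com/plot.js"],
    "stuff/sound": ["ftp://kwatme.com/capybara_sound.mp3"],
}

for url, destination in download_destinations(download):
    print(destination)
```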
command
This is a mapping of command key to command. spro run command_key runs the command registered to command_key (from the project directory path). In the example, spro run plot runs npm code/plot.js.
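The lookup behind a command key can be sketched as a dictionary access followed by a subprocess call made from the project directory. This is an illustration only: the function names are hypothetical, and whether spro runs commands through a shell or splits them into arguments is an assumption here.

```python
import shlex
import subprocess


def resolve_command(metadata, command_key):
    """Return the command string registered to command_key in project.json."""
    commands = metadata["command"]
    if command_key not in commands:
        known = ", ".join(sorted(commands))
        raise KeyError(f"unknown command key {command_key!r}; known keys: {known}")
    return commands[command_key]


def run_command(metadata, command_key, project_directory):
    """Run the registered command from the project directory; return its exit code."""
    command = resolve_command(metadata, command_key)
    # Assumption: the command is split into arguments rather than passed to a shell.
    completed = subprocess.run(shlex.split(command), cwd=project_directory, check=False)
    return completed.returncode


# The "command" value from the example project.json.
metadata = {
    "command": {
        "compare": "python code/compare.py",
        "plot": "npm code/plot.js",
    }
}

print(resolve_command(metadata, "plot"))  # npm code/plot.js
```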
Extra Key
Add any user parameter as an extra key-value pair. Anyone should be able to figure out how the project is configured and used by looking only at project.json.
Install
spro needs conda.
conda
Install
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh &&
bash Miniconda3-latest-Linux-x86_64.sh
# Start new session
spro
Install
pip install spro
Use
Make a project and enter its environment
spro create project
cd project
spro enter
A new bash session should open. $PATH should have only 1) /.../environment/essence/bin; 2) /.../condabin; and 3+) the default bin paths.
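A quick way to confirm this from inside the new session is to inspect $PATH. The sketch below checks only the first two entries and assumes they appear in the order listed above; the function name check_path is hypothetical and not part of spro.

```python
import os


def check_path(path_value):
    """Return True if the first two PATH entries look like the ones described above."""
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return (
        len(entries) >= 2
        # Assumption: the base environment's bin directory comes first ...
        and entries[0].endswith("/environment/essence/bin")
        # ... followed by conda's condabin directory.
        and entries[1].endswith("/condabin")
    )


# Check the PATH of the current session.
print(check_path(os.environ.get("PATH", "")))
```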
Omics App
Run
git clone https://github.com/Guardiome/muscle_type
cd muscle_type
spro download
spro enter
# Within environment
spro run omics_app
Model and Infer
Run
git clone https://github.com/KwatME/model_and_infer
cd model_and_infer
spro enter
# Within environment
spro run notebook
Shareable Project powered by Guardiome