A utility for packaging objects and validating metadata for FAIRSCAPE


Keywords
fairscape, reproducibility, FAIR, B2AI, CLI, RO-Crate
License
Other
Install
pip install fairscape-cli==0.2.0

Documentation

fairscape-cli


Features

fairscape-cli provides a Command Line Interface (CLI) for building RO-Crates on the client side. An RO-Crate is a lightweight approach to packaging research data together with its metadata. The CLI allows users to:

  • Create Research Object Crates (RO-Crates)
  • Add (transfer) digital objects to the RO-Crate
  • Register metadata of the objects
  • Describe the schema of tabular dataset objects as metadata and validate datasets against it

Requirements

Python 3.8+

Installation

$ pip install fairscape-cli
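
To confirm that an environment meets the Python requirement and that the package was installed, a check along these lines could be used. This is a minimal sketch using only the standard library; the helper names `python_is_supported` and `installed_version` are hypothetical and are not part of fairscape-cli.

```python
import sys
from importlib import metadata  # part of the standard library since Python 3.8


def python_is_supported(version_info=sys.version_info):
    """Return True if the interpreter meets the README's Python 3.8+ requirement."""
    return tuple(version_info[:2]) >= (3, 8)


def installed_version(distribution="fairscape-cli"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("fairscape-cli version:", installed_version())
```

If `installed_version()` returns `None`, the package is not visible to the interpreter that ran the check, which often means `pip` installed it into a different environment.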

Minimal example

Basic commands

  • Show all commands, arguments, and options
$ fairscape-cli --help
  • Create an RO-Crate in a specified directory
$ fairscape-cli rocrate create \
  --name "test rocrate" \
  --description "Example RO Crate for Tests" \
  --organization-name "UVA" \
  --project-name "B2AI"  \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --keywords "U2OS" \
  "./test_rocrate"
  • Create an RO-Crate in the current working directory
$ fairscape-cli rocrate init \
  --name "test rocrate" \
  --description "Example RO Crate for Tests" \
  --organization-name "UVA" \
  --project-name "B2AI"  \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --keywords "U2OS"
  • Add a dataset to the RO-Crate
$ fairscape-cli rocrate add dataset \
  --name "AP-MS embeddings" \
  --author "Krogan lab (https://kroganlab.ucsf.edu/krogan-lab)" \
  --version "1.0" \
  --date-published "2021-04-23" \
  --description "Affinity purification mass spectrometry (APMS) embeddings for each protein in the study, generated by node2vec predict." \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --keywords "U2OS" \
  --data-format "CSV" \
  --source-filepath "./tests/data/APMS_embedding_MUSIC.csv" \
  --destination-filepath "./test_rocrate/APMS_embedding_MUSIC.csv" \
  "./test_rocrate"
  • Add software to the RO-Crate
$ fairscape-cli rocrate add software \
  --name "calibrate pairwise distance" \
  --author "Qin, Y." \
  --version "1.0" \
  --description "script written in python to calibrate pairwise distance." \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --keywords "U2OS" \
  --file-format "py" \
  --source-filepath "./tests/data/calibrate_pairwise_distance.py" \
  --destination-filepath "./test_rocrate/calibrate_pairwise_distance.py" \
  --date-modified "2021-04-23" \
  "./test_rocrate"
  • Register a computation to the RO-Crate
$ fairscape-cli rocrate register computation \
  --name "calibrate pairwise distance" \
  --run-by "Qin, Y." \
  --date-created "2021-05-23" \
  --description "Average the predicted proximities" \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --keywords "U2OS" \
  "./test_rocrate"
  • Create a schema
$ fairscape-cli schema create-tabular \
    --name 'APMS Embedding Schema' \
    --description 'Tabular format for APMS music embeddings from PPI networks from the music pipeline from the B2AI Cellmaps for AI project' \
    --separator ',' \
    --header False \
    ./schema_apms_music_embedding.json
  • Add a string property
$ fairscape-cli schema add-property string \
    --name 'Experiment Identifier' \
    --index 0 \
    --description 'Identifier for the APMS experiment responsible for generating the raw PPI used to create this embedding vector' \
    --pattern '^APMS_[0-9]*$' \
    ./schema_apms_music_embedding.json
  • Add another string property
$ fairscape-cli schema add-property string \
    --name 'Gene Symbol' \
    --index 1 \
    --description 'Gene Symbol for the APMS bait protein' \
    --pattern '^[A-Za-z0-9\-]*$' \
    --value-url 'http://edamontology.org/data_1026' \
    ./schema_apms_music_embedding.json
  • Add an array property
$ fairscape-cli schema add-property array \
    --name 'MUSIC APMS Embedding' \
    --index '2::' \
    --description 'Embedding Vector values for genes determined by running node2vec on APMS PPI networks. Vector has 1024 values for each bait protein' \
    --items-datatype 'number' \
    --unique-items False \
    --min-items 1024 \
    --max-items 1024 \
    ./schema_apms_music_embedding.json
  • Validate a dataset against the schema (successful case)
$ fairscape-cli schema validate \
    --data ./examples/schemas/MUSIC_embedding/APMS_embedding_MUSIC.csv  \
    --schema ./examples/schemas/MUSIC_embedding/music_apms_embedding_schema.json
  • Validate a corrupted dataset against the schema (unsuccessful case)
$ fairscape-cli schema validate \
    --data examples/schemas/MUSIC_embedding/APMS_embedding_corrupted.csv \
    --schema examples/schemas/MUSIC_embedding/music_apms_embedding_schema.json
  • Validate using default schemas
# validate imageloader files
$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/imageloader/samplescopy.csv" \
        --schema "ark:59852/schema-cm4ai-imageloader-samplescopy" 
    
$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/imageloader/uniquecopy.csv" \
        --schema "ark:59852/schema-cm4ai-imageloader-uniquecopy"
       
# validate image embedding outputs
$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/image_embedding/image_emd.tsv" \
        --schema "ark:59852/schema-cm4ai-image-embedding-image-emd"
     
$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/image_embedding/labels_prob.tsv" \
        --schema "ark:59852/schema-cm4ai-image-embedding-labels-prob"

# validate apmsloader inputs
$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/apmsloader/ppi_gene_node_attributes.tsv" \
        --schema "ark:59852/schema-cm4ai-apmsloader-gene-node-attributes"

$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/apmsloader/ppi_edgelist.tsv" \
        --schema "ark:59852/schema-cm4ai-apmsloader-ppi-edgelist"

# validate apms embedding 
$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/apms_embedding/ppi_emd.tsv" \
        --schema "ark:59852/schema-cm4ai-apms-embedding"    

# validate coembedding 
$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/coembedding/coembedding_emd.tsv" \
        --schema "ark:59852/schema-cm4ai-coembedding"
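
To make the APMS embedding schema from the `schema create-tabular` and `schema add-property` commands more concrete, the sketch below checks a single headerless row by hand. It is an illustration only and not how fairscape-cli performs validation. It assumes that `--index '2::'` means "column 2 through the last column" and that the `number` datatype accepts anything Python's `float` can parse. The names `check_row` and `is_number` are hypothetical.

```python
import re

# Patterns copied from the two `schema add-property string` commands.
EXPERIMENT_ID = re.compile(r"^APMS_[0-9]*$")
GENE_SYMBOL = re.compile(r"^[A-Za-z0-9\-]*$")
EMBEDDING_LENGTH = 1024  # --min-items and --max-items are both 1024


def is_number(text):
    """Return True if the text parses as a float (assumed meaning of 'number')."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def check_row(row, separator=","):
    """Return a list of problems found in one headerless row (empty if none)."""
    fields = row.rstrip("\n").split(separator)
    if len(fields) < 2:
        return ["expected at least 2 columns"]

    problems = []
    if not EXPERIMENT_ID.match(fields[0]):
        problems.append(f"column 0: {fields[0]!r} does not match the experiment pattern")
    if not GENE_SYMBOL.match(fields[1]):
        problems.append(f"column 1: {fields[1]!r} does not match the gene symbol pattern")

    embedding = fields[2:]  # assumed reading of --index '2::'
    if len(embedding) != EMBEDDING_LENGTH:
        problems.append(f"columns 2 onward: expected {EMBEDDING_LENGTH} items, got {len(embedding)}")
    elif not all(is_number(value) for value in embedding):
        problems.append("columns 2 onward: found a non-numeric value")
    return problems


# Synthetic rows for illustration; these are not taken from the example data files.
good = ",".join(["APMS_1", "RRS1"] + ["0.5"] * 1024)
bad = ",".join(["APMS_x", "RRS1"] + ["0.5"] * 3)
print(check_row(good))
print(check_row(bad))
```

One detail the sketch makes visible: because the experiment pattern ends in `[0-9]*`, the bare identifier `APMS_` with no digits also matches. Using `[0-9]+` instead would require at least one digit.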

Contribution

If you'd like to request a feature or report a bug, please create a GitHub Issue using one of the templates provided.

License

This project is licensed under the terms of the MIT license.