HCA DSS: The Human Cell Atlas Data Storage System
This repository contains design specs and prototypes for the replicated data storage system (aka the "blue box") of the Human Cell Atlas.
See the Google Drive folder for live collaborative documents.
About this prototype
The prototype in this repository uses Swagger to specify the API in dss-api.yml, and Connexion to map the API specification to its implementation in Python.
You can use the Swagger Editor to review and edit the prototype API specification. When the prototype app is running, the Swagger spec is also available at /v1/swagger.json.
The prototype is deployed continuously from the master branch, with the resulting producer and consumer API available at https://hca-dss.czi.technology/.
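As a rough illustration of how the served spec could be retrieved programmatically, the sketch below builds the spec URL from the deployment URL and the /v1/swagger.json path mentioned above, then downloads and parses it with the Python standard library. The helper names swagger_url and fetch_swagger_spec are hypothetical and are not part of this repository; the fetch step assumes network access and that the deployment is reachable.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Deployment URL and spec path as given in this README.
DSS_BASE_URL = "https://hca-dss.czi.technology/"
SWAGGER_PATH = "/v1/swagger.json"


def swagger_url(base_url=DSS_BASE_URL):
    """Return the absolute URL of the Swagger spec for a DSS deployment."""
    return urljoin(base_url, SWAGGER_PATH)


def fetch_swagger_spec(base_url=DSS_BASE_URL, timeout=10):
    """Download and parse the Swagger spec.

    Hypothetical helper: requires network access and a running deployment.
    """
    with urlopen(swagger_url(base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_swagger_spec() returns the spec as a dictionary. Passing the base URL of a locally running prototype instead of the default should work the same way, assuming the app serves the spec at the same path.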
Installing dependencies for development on the prototype
The HCA DSS prototype development environment requires Python 3.4+ to run. Run pip install -r requirements-dev.txt in this directory.
Installing dependencies for the prototype
The HCA DSS prototype requires Python 3.4+ to run. Run pip install -r requirements.txt in this directory.
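Before installing, it can help to confirm that the interpreter meets the Python 3.4+ requirement stated above. The following is a minimal sketch of such a check; the function name meets_minimum is hypothetical and not part of the prototype.

```python
import sys

# Minimum interpreter version stated in this README.
MINIMUM_PYTHON = (3, 4)


def meets_minimum(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the given (major, minor, ...) version is at least `minimum`.

    Defaults to the version of the running interpreter.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor) so patch levels do not affect the result.
    return tuple(version_info[:2]) >= minimum
```

For example, meets_minimum() checks the interpreter running the script, while meets_minimum((2, 7)) reports whether a Python 2.7 environment would qualify.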
Running the prototype
Run ./dss-api in this directory.
Running tests
Run make test in this directory.
Some tests require the Elasticsearch service to be running on the local system.
Run: elasticsearch
Tests also use data from the data-bundle-examples subrepository.
Run: git submodule update --init
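Because some tests depend on a local Elasticsearch service, a quick reachability check before running make test can save a confusing failure. The sketch below only tests whether something accepts TCP connections on a port. It assumes Elasticsearch listens on its default HTTP port, 9200, on localhost, which this README does not state, and the function name is_listening is hypothetical.

```python
import socket


def is_listening(host="localhost", port=9200, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    Port 9200 is the Elasticsearch default HTTP port; adjust it if the local
    service is configured differently (assumption, not taken from this README).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names.
        return False
```

A successful connection shows only that a service is listening on that port, not that it is Elasticsearch or that it is healthy.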
Configuring cloud-specific access credentials
AWS: Follow the instructions in http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html. Create an S3 bucket that you want DSS to use. Set the environment variable DSS_S3_TEST_BUCKET to the name of that bucket.
GCE: Go to https://console.cloud.google.com/. Select the correct Google user account in the top right and the correct GCE project in the drop-down menu at the top center. Go to "IAM & Admin", then "Service accounts", then click "Create service account" and select "Furnish a new private key". Create the account and download the service account key JSON file. Run gcloud auth activate-service-account --key-file=/path/to/service-account.json. Run gcloud config set project 'PROJECT NAME'. Set the environment variable DSS_GCS_TEST_BUCKET to the name of the GCS bucket that you want DSS to use.
Azure: Set the environment variables AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_ACCOUNT_KEY.
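The environment variables named above can be checked in one pass before running the tests. This is a minimal sketch, not code from the prototype: the REQUIRED_VARIABLES mapping and the missing_variables function are hypothetical names, and the sketch treats an empty value the same as an unset variable.

```python
import os

# Environment variables named in this README, grouped by cloud provider.
REQUIRED_VARIABLES = {
    "AWS": ["DSS_S3_TEST_BUCKET"],
    "GCE": ["DSS_GCS_TEST_BUCKET"],
    "Azure": ["AZURE_STORAGE_ACCOUNT_NAME", "AZURE_STORAGE_ACCOUNT_KEY"],
}


def missing_variables(environ=None):
    """Return a dict mapping each provider to its unset or empty variables.

    Providers with nothing missing are omitted, so an empty dict means
    every variable listed above is set.
    """
    if environ is None:
        environ = os.environ
    missing = {}
    for provider, names in REQUIRED_VARIABLES.items():
        unset = [name for name in names if not environ.get(name)]
        if unset:
            missing[provider] = unset
    return missing
```

Calling missing_variables() inspects the current process environment. Passing a plain dictionary instead makes the check easy to exercise without changing real settings.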
CI/CD with Travis CI
We use Travis CI for continuous integration testing and deployment. For every commit to the master branch, Travis CI runs make test and, when it succeeds, deploys the application to the dev stage on AWS. This behavior is defined in the deploy section of .travis.yml.
Authorizing Travis CI to deploy
Encrypted environment variables give Travis CI the AWS credentials needed to run the tests and deploy the app. Run scripts/authorize_aws_deploy.sh IAM-PRINCIPAL-TYPE IAM-PRINCIPAL-NAME (e.g. authorize_aws_deploy.sh user hca-test) to give that principal the permissions needed to deploy the app. This is a limited set of permissions that does not include write access to IAM. To set up the IAM policies for resources in your account that the app will use, run make deploy once from your workstation using privileged account credentials. After this is done, Travis CI will be able to deploy on its own. You must repeat the make deploy step from a privileged account any time you change the IAM policies in policy.json.template files.