Rosetta
Rosetta is the knowledge map and service invocation tier of the Gamma reasoner.
Rosetta coordinates semantically annotated data sources into a metadata graph. That graph can be queried to generate programs to perform complex data retrieval tasks. It looks like this:
Blue nodes are semantic types from the biolink-model
Installation
Graph Database
Download, install, and start Neo4J 3.2.6.
$ <neo4j-install-dir>/bin/neo4j start
Cache
Download, install, and start Redis 4.0.8
<redis-install-dir>/src/redis-server
App
Clone the repository.
$ git clone <repo>
$ cd repo/greent
$ pip install -r requirements.txt
Initialize the type graph. This imports the graph of Translator services, overlays local service configurations, and imports locally defined services. It configures all of these according to the biolink-model.
$ PYTHONPATH=$PWD/.. rosetta.py --delete-type-graph --initialize-type-graph --debug
Via the Neo4J interface at http://localhost:7474/browser/ query the entire type graph:
match (m)--(n) return *
Query a particular path:
MATCH (n:named_thing)-[a]->(d:disease)-[b]->(g:gene) RETURN *
In the returned graph, nodes are biolink-model concepts and edges contain attributes indicating the service to invoke.
Web API
The web API presents two endpoints:
Clinical Outcome Pathway: /cop
Given a drug name and a disease name, it returns a knowledge graph of the clinical outcome pathway.
Edge:
Each edge includes:
- subj : A subject
- pred : A predicate indicating the relation of the subject to the object.
- obj : An object of the relation.
- pmids : One or more PubMed identifiers relevant to the statement.
Node:
Each node includes:
- id : A numeric identifier used as a link to edges in the same graph.
- identifier : A curie identifying an instance in an ontology.
- type : A biolink-model type for the object.
Query: /query
Given inputs and a Cypher query representing a shortest path between two concepts, generate a graph of items. More complex graphs can be composed by iteratively invoking this endpoint.
- inputs : A key value pair where the key is a biolink-model concept and the value is a comma separated list of curies. eg, concept=curie:id[,curie:id]
- query : A cypher query returning a path.
Python API
This simple snippet demonstrates usage via the Python API:
from greent.rosetta import Rosetta
rosetta = Rosetta ()
knowledge_graph = rosetta.construct_knowledge_graph(**{
"inputs" : {
"disease" : [
"DOID:2841"
]
},
"query" :
"""MATCH (a:disease),(b:gene), p = allShortestPaths((a)-[*]->(b))
WHERE NONE (r IN relationships(p) WHERE type(r) = 'UNKNOWN' OR r.op is null)
RETURN p"""
})
Caching
We cache in Redis. Objects are serialized using Python's pickle scheme.
List cache contents
To find out what operations(id) combinations are cached:
$ ~/app/redis-4.0.8/src/redis-cli --raw keys '*ctd*'
Delete specific keys
To delete specific keys or patterns of keys from the cache:
$ ~/app/redis-4.0.8/src/redis-cli --raw keys '*' | xargs ~/app/redis-4.0.8/src/redis-cli --raw del
Adding Edges
To add a data source to the knowledge map:
Build a Service
- Reuse or develop a smartAPI interface to your data.
- Publish a public network endpoint to the API if none exists.
- Register your smartAPI at the Translator Registry.
- For now, build a Python stub to your service. Soon, we hope to derive this information from the registry to invoke services programmatically. For an example stub, see the CTD service. This is a stub for this smartAPI endpoint.
Configure Service Endpoint
- Add your service endpoint URL to the configuration files following the CTD pattern.
- Add to greent.conf used for local development.
- And greent-dev.conf used in the continuous integration environment.
Instantiate The Service
Instantiate your service, following the lazy loading pattern, in core.py
Edit the Configuration
This YAML file links types in the biolink-model. Each link includes a predicate and the name of an operation. Operations are named:
<objectName>.<methodName>
where is a member of core.py, the central service manager.
- Find the "@operators" tag in the configuration file.
- Find the biolink-model element for the source type to your service.
- Follow the pattern in the configuration to enter your predicate (link) and operator (op)
Rebuild the Knowledge Map
$ PYTHONPATH=$PWD/.. rosetta.py --delete-type-graph --initialize-type-graph --debug
You should now be able to write cypher queries for Rosetta that use the biolink-model names specified in the rosetta.yml config file that are connected by your new service.