dtool-lookup-api

This package offers both synchronous and asynchronous implementations of a standardized Python API to communicate with the dtool lookup server.


License
MIT
Install
pip install dtool-lookup-api==0.7.0


Python API for interacting with dtool lookup server.

This package offers a class-based asynchronous lookup API within dtool_lookup_api.core.LookupClient, a simple class-less wrapper around it at dtool_lookup_api.asynchronous, and a synchronous interface on top at dtool_lookup_api.synchronous.

Direct imports of utility functions from dtool_lookup_api in the examples below forward to the synchronous API variant.

Installation

To install the dtool_lookup_api package, run

pip install dtool_lookup_api

This package depends on a dtool-lookup-server instance to talk to.

Configuration

The API needs to know the URL of the lookup server

export DTOOL_LOOKUP_SERVER_URL=https://localhost:5000

You may also need to specify an access token generated on the server

export DTOOL_LOOKUP_SERVER_TOKEN=$(flask user token testuser)

Instead of specifying the access token directly, it is also possible to provide

export DTOOL_LOOKUP_SERVER_TOKEN_GENERATOR_URL=https://localhost:5001/token
export DTOOL_LOOKUP_SERVER_USERNAME=my-username
export DTOOL_LOOKUP_SERVER_PASSWORD=my-password

for the API to request a token. This, however, is intended only for testing purposes and strongly discouraged in a production environment, as your password would reside within environment variables or the dtool config file as clear text.
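To notice a clear-text password before it ends up in a production setup, a small check along the following lines may help. This is only a sketch: find_credential_risks is a hypothetical helper and not part of dtool_lookup_api; it merely inspects the variable name documented above.

```python
import os

# Variable name as documented above; the helper itself is hypothetical.
PASSWORD_VARIABLE = "DTOOL_LOOKUP_SERVER_PASSWORD"


def find_credential_risks(env=None):
    """Return a list of warnings about credential-related settings."""
    # Inspect the process environment unless a mapping is passed explicitly.
    env = os.environ if env is None else env
    warnings = []
    if env.get(PASSWORD_VARIABLE):
        warnings.append(
            PASSWORD_VARIABLE + " is set: the password is stored as clear text"
        )
    return warnings
```

Calling find_credential_risks() without arguments checks the current environment; an empty list means no password variable was found.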

Our recommended setup is a combination of

export DTOOL_LOOKUP_SERVER_URL=https://localhost:5000
export DTOOL_LOOKUP_SERVER_TOKEN_GENERATOR_URL=https://localhost:5001/token

in the config. If used interactively, the API will then ask for your credentials at the first interaction and cache the provided values for this session, i.e.

In [1]: from dtool_lookup_api import query
   ...: res = query(
   ...:     {
   ...:         'readme.owners.name': {'$regex': '^Testing User$'},
   ...:     }
   ...: )
Authentication URL https://localhost:5001/token username:my-username
Authentication URL https://localhost:5001/token password:

In [2]: res
Out[2]:
[{'base_uri': 'smb://test-share',
  'created_at': 'Sun, 08 Nov 2020 18:38:40 GMT',
  'creator_username': 'jotelha',
  'dtoolcore_version': '3.17.0',
  'frozen_at': 'Wed, 11 Nov 2020 17:20:30 GMT',
  'name': 'simple_test_dataset',
  'tags': [],
  'type': 'dataset',
  'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
  'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675'}]

In [3]: from dtool_lookup_api import all
   ...: all()
Out[3]:
[{'base_uri': 'smb://test-share',
  'created_at': 1604860720.736269,
  'creator_username': 'jotelha',
  'frozen_at': 1604921621.719575,
  'name': 'simple_test_dataset',
  'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
  'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675'}]

Credentials caching and interactive prompting are turned off with

In [1]: import dtool_lookup_api.core.config
   ...: dtool_lookup_api.core.config.Config.interactive = False
   ...: dtool_lookup_api.core.config.Config.cache = False

In [2]: from dtool_lookup_api import all
   ...: all()
...
RuntimeError: Authentication failed

For testing purposes, it is possible to disable SSL certificate validation with

export DTOOL_LOOKUP_SERVER_VERIFY_SSL=false

As usual, these settings may also be specified in the default dtool configuration file at ~/.config/dtool/dtool.json, e.g.

{
    "DTOOL_LOOKUP_SERVER_TOKEN_GENERATOR_URL": "https://localhost:5001/token",
    "DTOOL_LOOKUP_SERVER_URL": "https://localhost:5000"
}
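If the configuration file already holds other dtool settings, the keys can be merged in from Python instead of editing the file by hand. The following is a minimal sketch using only the standard library; update_dtool_config is a hypothetical helper, not part of dtool or dtool_lookup_api, and the default path is the one stated above.

```python
import json
from pathlib import Path


def update_dtool_config(settings, path=None):
    """Merge settings into a dtool JSON config file, keeping existing keys."""
    # Default location as stated above; pass another path for experiments.
    if path is None:
        path = Path.home() / ".config" / "dtool" / "dtool.json"
    path = Path(path)
    # Start from the current content if the file exists already.
    config = json.loads(path.read_text()) if path.exists() else {}
    config.update(settings)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4, sort_keys=True) + "\n")
    return config
```

A call such as update_dtool_config({"DTOOL_LOOKUP_SERVER_URL": "https://localhost:5000"}) would then add or overwrite just that key.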

List all datasets

To list all registered datasets

In [1]: from dtool_lookup_api import all
   ...: res = all()

In [2]: res
Out[2]:
[{'base_uri': 'smb://test-share',
  'created_at': 1604860720.736269,
  'creator_username': 'jotelha',
  'frozen_at': 1604921621.719575,
  'name': 'simple_test_dataset',
  'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
  'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675'}]
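The created_at and frozen_at values in this response are plain floats. Assuming they are seconds since the Unix epoch in UTC (which agrees with the date strings shown in the query examples of this README), they can be turned into datetime objects as sketched below. with_datetimes is a hypothetical helper, not part of the package.

```python
from datetime import datetime, timezone


def with_datetimes(entry, keys=("created_at", "frozen_at")):
    """Return a copy of entry with numeric timestamps as UTC datetimes."""
    converted = dict(entry)
    for key in keys:
        value = converted.get(key)
        # Only touch numeric values; strings and missing keys stay as they are.
        if isinstance(value, (int, float)):
            converted[key] = datetime.fromtimestamp(value, tz=timezone.utc)
    return converted


entry = {
    "name": "simple_test_dataset",
    "created_at": 1604860720.736269,
    "frozen_at": 1604921621.719575,
}
print(with_datetimes(entry)["created_at"].isoformat())
```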

Looking up datasets by UUID

To look up URIs from a dataset UUID within Python

In [1]: from dtool_lookup_api import lookup
   ...: uuid = "1a1f9fad-8589-413e-9602-5bbd66bfe675"
   ...: res = lookup(uuid)

In [2]: res
Out[2]:
[{'base_uri': 'smb://test-share',
  'created_at': 1604860720.736269,
  'creator_username': 'jotelha',
  'frozen_at': 1604921621.719575,
  'name': 'simple_test_dataset',
  'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
  'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675'}]
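lookup returns a list, presumably because copies of one dataset can be registered under several base URIs. A short sketch for grouping such a result by base URI follows; uris_by_base_uri is a hypothetical helper, and the second entry in the sample data (s3://test-bucket) is made up for illustration.

```python
def uris_by_base_uri(entries):
    """Map each base URI to the list of dataset URIs found under it."""
    grouped = {}
    for entry in entries:
        grouped.setdefault(entry["base_uri"], []).append(entry["uri"])
    return grouped


uuid = "1a1f9fad-8589-413e-9602-5bbd66bfe675"
sample = [
    {"base_uri": "smb://test-share", "uri": "smb://test-share/" + uuid},
    # hypothetical second copy of the same dataset
    {"base_uri": "s3://test-bucket", "uri": "s3://test-bucket/" + uuid},
]
print(uris_by_base_uri(sample))
```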

Full text searching

Full text search for the word "test"

In [1]: from dtool_lookup_api import search
   ...: res = search("test")

In [2]: res
Out[2]:
[{'base_uri': 'smb://test-share',
  'created_at': 1604860720.736,
  'creator_username': 'jotelha',
  'dtoolcore_version': '3.17.0',
  'frozen_at': 1605027357.308,
  'name': 'simple_test_dataset',
  'tags': [],
  'type': 'dataset',
  'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
  'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675'}]

Manifest

Request the manifest of a particular dataset by URI

In [1]: from dtool_lookup_api import manifest
   ...: uri = 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675'
   ...: res = manifest(uri)

In [2]: res
Out[2]:
{'dtoolcore_version': '3.17.0',
 'hash_function': 'md5sum_hexdigest',
 'items': {'eb58eb70ebcddf630feeea28834f5256c207edfd': {'hash': '2f7d9c3e0cfd47e8fcab0c12447b2bf0',
   'relpath': 'simple_text_file.txt',
   'size_in_bytes': 17,
   'utc_timestamp': 1605027357.284966}}}
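The manifest is a nested dictionary keyed by item identifier, so simple statistics are easy to derive from it. The sketch below assumes the structure shown in the output above; summarize_manifest is a hypothetical helper, not part of the package.

```python
def summarize_manifest(manifest):
    """Return item count, total size and sorted relpaths of a manifest dict."""
    items = manifest.get("items", {})
    return {
        "n_items": len(items),
        "total_size_in_bytes": sum(i["size_in_bytes"] for i in items.values()),
        "relpaths": sorted(i["relpath"] for i in items.values()),
    }


sample_manifest = {
    "dtoolcore_version": "3.17.0",
    "hash_function": "md5sum_hexdigest",
    "items": {
        "eb58eb70ebcddf630feeea28834f5256c207edfd": {
            "hash": "2f7d9c3e0cfd47e8fcab0c12447b2bf0",
            "relpath": "simple_text_file.txt",
            "size_in_bytes": 17,
            "utc_timestamp": 1605027357.284966,
        }
    },
}
print(summarize_manifest(sample_manifest))
```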

Readme

Request the readme content of a particular dataset by URI

In [1]: from dtool_lookup_api import readme
   ...: res = readme('smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675')

In [2]: res
Out[2]:
{'creation_date': '2020-11-08',
 'description': 'testing description',
 'expiration_date': '2022-11-08',
 'funders': [{'code': 'testing_code',
   'organization': 'testing_organization',
   'program': 'testing_program'}],
 'owners': [{'email': 'testing@test.edu',
   'name': 'Testing User',
   'orcid': 'testing_orcid',
   'username': 'testing_user'}],
 'project': 'testing project'}
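The readme comes back as an ordinary dictionary and can be processed with standard Python tools. As an illustration, the sketch below checks the expiration_date field; it assumes the readme follows the template shown above (ISO-formatted dates under that key), which is not guaranteed for every dataset. is_expired is a hypothetical helper.

```python
from datetime import date


def is_expired(readme, today=None):
    """Return True if the readme's expiration_date lies before today."""
    today = date.today() if today is None else today
    expiration = readme.get("expiration_date")
    if expiration is None:
        # No expiration date recorded: treat the dataset as not expired.
        return False
    return date.fromisoformat(str(expiration)) < today


sample_readme = {"creation_date": "2020-11-08", "expiration_date": "2022-11-08"}
print(is_expired(sample_readme, today=date(2021, 1, 1)))
```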

Direct mongo language queries

To list all datasets at a certain base URI with their name matching some regular expression pattern, send a direct mongo language query to the server with

In [15]: from dtool_lookup_api import query
    ...: res = query(
    ...:     {
    ...:         'base_uri': 'smb://test-share',
    ...:         'name': {'$regex': 'test'},
    ...:     }
    ...: )

In [16]: res
Out[16]:
[{'base_uri': 'smb://test-share',
  'created_at': 'Sun, 08 Nov 2020 18:38:40 GMT',
  'creator_username': 'jotelha',
  'dtoolcore_version': '3.17.0',
  'frozen_at': 'Tue, 10 Nov 2020 16:55:57 GMT',
  'name': 'simple_test_dataset',
  'tags': [],
  'type': 'dataset',
  'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
  'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675'}]

It is possible to search readme content via

In [21]: from dtool_lookup_api import query
    ...: res = query(
    ...:     {
    ...:         'readme.owners.name': {'$regex': '^Testing User$'},
    ...:     }
    ...: )

In [22]: res
Out[22]:
[{'base_uri': 'smb://test-share',
  'created_at': 'Sun, 08 Nov 2020 18:38:40 GMT',
  'creator_username': 'jotelha',
  'dtoolcore_version': '3.17.0',
  'frozen_at': 'Tue, 10 Nov 2020 16:55:57 GMT',
  'name': 'simple_test_dataset',
  'tags': [],
  'type': 'dataset',
  'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
  'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675'}]

This requires the server-side dtool-lookup-server-direct-mongo-plugin.
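Note that an unanchored pattern such as 'test' matches any value containing that substring, while '^Testing User$' matches the whole value only. When the search term comes from user input, regular expression metacharacters such as "." should be escaped. The sketch below builds such query dictionaries with the standard library; exact_match and contains are hypothetical helpers, and it is an assumption that the escaping produced by Python's re.escape is also valid for the regular expression engine on the server side.

```python
import re


def exact_match(field, value):
    """Build a query dict matching the whole field value literally."""
    return {field: {"$regex": "^" + re.escape(value) + "$"}}


def contains(field, value):
    """Build a query dict matching value literally anywhere in the field."""
    return {field: {"$regex": re.escape(value)}}


# Combine several conditions into one query dictionary.
query_dict = {"base_uri": "smb://test-share"}
query_dict.update(contains("name", "test"))
print(query_dict)
```

The resulting dictionary can be passed to query in the same way as the hand-written ones above.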

TODO: Response from server-side direct mongo plugin still yields dates as strings. Fix within https://github.com/IMTEK-Simulation/dtool-lookup-server-direct-mongo-plugin.
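Until the plugin returns numeric timestamps, client code can normalize the date strings itself. The following is a minimal sketch, assuming the strings always follow the format shown above (e.g. 'Sun, 08 Nov 2020 18:38:40 GMT'); normalize_dates is a hypothetical helper, not part of the package. Sub-second precision is not recoverable from these strings.

```python
from email.utils import parsedate_to_datetime


def normalize_dates(entry, keys=("created_at", "frozen_at")):
    """Return a copy of entry with date strings converted to float timestamps."""
    normalized = dict(entry)
    for key in keys:
        value = normalized.get(key)
        # Numeric timestamps from the other endpoints are left untouched.
        if isinstance(value, str):
            normalized[key] = parsedate_to_datetime(value).timestamp()
    return normalized


entry = {
    "name": "simple_test_dataset",
    "created_at": "Sun, 08 Nov 2020 18:38:40 GMT",
    "frozen_at": "Tue, 10 Nov 2020 16:55:57 GMT",
}
print(normalize_dates(entry))
```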

Usage in Jupyter notebooks

The current implementation via asgiref.async_to_sync (https://github.com/django/asgiref) hinders the use of the synchronous interface within Jupyter notebooks. Use the asynchronous API directly instead:

import dtool_lookup_api.asynchronous as dl
res = await dl.query({
    'base_uri': 'smb://test-share',
    'name': {'$regex': 'test'},
})

The drawback of the above approach is that the same code does not work in both plain Python and Jupyter (await outside of a function is a syntax error in a non-interactive Python context). The code below can be executed in both contexts:

import asyncio

import dtool_lookup_api.asynchronous as dl
if asyncio.get_event_loop().is_running():
    # then we are in jupyter notebook
    # this allows nested event loops, i.e. calls to asyncio.run inside the notebook as well
    # This way, the same code works in notebook and python
    import nest_asyncio
    nest_asyncio.apply()

def query(query_dict):
    return asyncio.run(dl.query(query_dict))

query({
    'base_uri': 'smb://test-share',
    'name': {'$regex': 'test'},
})

See the comment on jupyter/notebook#3397 and https://ipython.readthedocs.io/en/stable/interactive/autoawait.html
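One benefit of working with the asynchronous interface directly is that several requests can be awaited concurrently. The sketch below shows the pattern with asyncio.gather. To stay runnable without a server it uses fake_query, a made-up stand-in for dl.query; whether the real client handles concurrent requests well has not been verified here and is an assumption.

```python
import asyncio


async def fake_query(query_dict):
    """Stand-in for dl.query so that the sketch runs without a server."""
    await asyncio.sleep(0.01)  # simulate network latency
    return [{"query": query_dict}]


async def query_many(query_func, query_dicts):
    """Await several queries concurrently; results keep the input order."""
    return await asyncio.gather(*(query_func(q) for q in query_dicts))


results = asyncio.run(query_many(fake_query, [{"name": "a"}, {"name": "b"}]))
print(results)
```

Inside a notebook, the last call would read results = await query_many(...) instead, unless nest_asyncio has been applied as shown above.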

Testing

Tests require the presence of a working dtool lookup server ecosystem. The testing workflow within .github/workflows/test.yml uses the dtool-lookup-server-container-composition to provide a mock ecosystem. It is possible to run the workflow locally with the help of docker and act.

After installing and configuring act, run

act -P ubuntu-latest=catthehacker/ubuntu:full-latest -s GITHUB_TOKEN=$GITHUB_TOKEN -W .github/workflows/test.yml --bind

from within this repository. $GITHUB_TOKEN must hold a valid access token, and the user must be a member of the docker group. The --bind option avoids quirky permission errors by running the tests in the current directory. This will, however, create two local subdirectories, dtool-lookup-server-container-composition and workflow, during testing, which may be removed with

rm -rf dtool-lookup-server-container-composition
sudo rm -rf workflow

afterwards. All tests have been confirmed to work with the catthehacker/ubuntu:full-20.04 runner.