cooper-pair

A small library that provides programmatic access to Superconductive's GraphQL API.


License
Apache-2.0
Install
pip install cooper-pair==0.0.4

Documentation

pair

cooper_pair is a Python library that provides programmatic access to Superconductive's GraphQL API.

It supports a limited number of common use cases. (See below.) cooper_pair is not intended as a general-purpose integration library for GraphQL. Most useful GraphQL queries are not supported within the cooper_pair API.

Why limit the use cases?

GraphQL is a composable query language. The space of allowed queries is enormous, and developers are empowered to choose the right query for a given job. This de-couples development behind the API from development that consumes the API, and allows each to move faster, independently.

Wrapping a flexible GraphQL API in a rigid python library would completely defeat that purpose.

Instead, think of cooper_pair as training wheels. It makes it easy to quickly connect to GraphQL, and perform a few common functions. It also provides a collection of example queries to learn how to use GraphQL and the Allotrope API.

In other words, cooper-pair can help you get started, but you will be able to get far more out of Allotrope once you learn to query it natively using GraphQL.

Installation

cd cooper-pair
pip install .

Or,

pip install git+ssh://git@github.com/superconductive/cooper.git#egg=cooper_pair&subdirectory=pair

Usage

Instantiate the API

from cooper_pair import CooperPair

pair = CooperPair(
    graphql_endpoint="http://my-data-valet-url:3010/graphql",
    email='my_user@some_email.com',
    password='my_very_secure_password'
)

List datasets

response = pair.list_datasets()
print( json.dumps(response, indent=2))

Get a dataset

response = pair.get_dataset("RGF2YXNldPoxODl=")
print( json.dumps(response, indent=2))

List checkpoints

response = pair.list_checkpoints()
print( json.dumps(response, indent=2) )

Create a new dataset and evaluate it against an existing checkpoint

From a dataframe:

my_df = pd.DataFrame({
    "x" : [1,2,3,4,5],
    "y" : [6,7,8,9,10],
})
response = pair.evaluate_checkpoint_on_pandas_df(
    checkpoint_id="Q2hlY2twb2ludDox",
    pandas_df=my_df,
    filename="my_dataframe_name"
)
evaluation_id = response['addEvaluation']['evaluation']['id']
dataset_id = response['addEvaluation']['evaluation']['dataset']['id']

From a file:

with open('my_file.csv', 'rb') as fd:
    dataset = pair.evaluate_checkpoint_on_file(
        checkpoint_id="Q2hlY2twb2ludDox",
        fd=fd,
    )
evaluation_id = response['addEvaluation']['evaluation']['id']
dataset_id = response['addEvaluation']['evaluation']['dataset']['id']

Note: Evaluation is asynchronous. When the response first comes back from Allotrope, it will have status="created". This will change to pending when a worker picks it up, then to success or failed depending on the result of the evaluation.

You can query for status as follows:

response = pair.query("""
        query evaluationQuery($id: ID!) {
            evaluation(id: $id) {
                id,
                status
            }
        }
    """,
    variables={
        'id' : evaluation_id
})
print(response)

Creating a new checkpoint from JSON

import json
with open('checkpoint_definition.json', 'rb') as fd:
    checkpoint_config = json.load(fd)

pair.add_checkpoint_from_expectations_config(
    checkpoint_config, "Checkpoint Name")