data-plumber-http

http extension for the data-plumber python framework


Keywords
data, flask, framework, http, pipeline, processing, unmarshalling, validation
License
MIT
Install
pip install data-plumber-http==1.0.0

Documentation

Tests PyPI - License GitHub top language PyPI - Python Version PyPI version PyPI - Wheel

data-plumber-http

This extension to the data-plumber-framework provides a mechanism to validate and unmarshal data in http-requests using a highly declarative format. If a problem occurs, a suitable status-code and message containing a brief description of the problem are generated automatically.

It offers

  • a minimalistic, highly intuitive, and declarative approach to request-validation/unmarshalling
  • configurability: automatically generated messages and status codes can be customized
  • extendability: custom data structures can be defined without much overhead
  • integration: decorator for a seamless integration with flask-web apps
  • high test-coverage

Contents

  1. Install
  2. Usage Example
  3. Migration from Previous Version
  4. Documentation
  5. Changelog

Install

Install using pip with

pip install data-plumber-http

Consider installing in a virtual environment.

Usage example

Consider a minimal flask-app implementing the /pet-POST endpoint of the Swagger Petstore - OpenAPI 3.0. A suitable unmarshalling-model may look like

from data_plumber_http import Property, Object, Array, String, Integer

pet_post = Object(
    properties={
        Property("name", required=True): String(),
        Property("photoUrls", name="photo_urls", required=True):
            Array(items=String()),
        Property("id", name="id_"): Integer(),
        Property("category"): Object(
            model=Category,
            properties={
                Property("id", name="id_", required=True): Integer(),
                Property("name", required=True): String(),
            }
        ),
        Property("tags"): Array(
            items=Object(
                model=Tag,
                properties={
                    Property("id", name="id_", required=True): Integer(),
                    Property("name", required=True): String(),
                }
            )
        ),
        Property("status"): String(enum=["available", "pending", "sold"]),
    }
)

Here, the arguments model=Category and model=Tag refer to separately defined python classes Category and Tag, i.e.

from typing import Optional
from dataclasses import dataclass

@dataclass
class Tag:
    id_: Optional[int] = None
    name: Optional[str] = None

@dataclass
class Category:
    id_: Optional[int] = None
    name: Optional[str] = None

In a flask app, this model can then be used as

from flask import Flask, Response
from data_plumber_http.decorators import flask_handler, flask_json

app = Flask(__name__)
@app.route("/pet", methods=["POST"])
@flask_handler(
    handler=pet_post.assemble(),
    json=flask_json
)
def pet(
    name: str,
    photo_urls: list[str],
    id_: Optional[int] = None,
    category: Optional[Category] = None,
    tags: Optional[list[Tag]] = None,
    status: Optional[str] = None
):
    return Response(
        f"OK: {name}, {photo_urls}, {id_}, {category}, {tags}, {status}",
        200
    )

Based on the example-request body given in the Pet Store API ({"id": 10, "name": "doggie", "category": {"id": 1, "name": "Dogs"}, "photoUrls": ["string"], "tags": [{"id": 0, "name": "string"}], "status": available"}), this app returns with

"OK: doggie, ['string'], 10, test_pet_post.<locals>.Category(id_=1, name='Dogs'), [test_pet_post.<locals>.Tag(id_=0, name='string')], available"

Migration from Previous Version

With the new major version 1, there are some minor breaking changes:

  • some import paths or class names have been changed:
    • _DPType has been moved to DPType
    • Responses has been moved to data_plumber_http.settings
  • Responses has been replaced by the singleton Responses() (view details here)
  • Property's constructor argument fill_with_none has been removed; the same behavior can be achieved by using the default argument, i.e. default=lambda **kwargs: None
  • all Number-type DPTypes got their range-argument replaced by the more granular options min_value, min_value_inclusive, max_value, and max_value_inclusive

Documentation

This section gives a brief overview of the features included in this package.

Contents

  1. Keys
    1. Property
    2. OneOf/AllOf
  2. Types
    1. Object, Array, String, ...
    2. Union Types
    3. Custom Types
  3. Decorators
  4. Response Configuration

Keys

A DPKey is used in conjuction with the properties-argument in the Object constructor.

Property

A Property is the simplest form for a DPKey. It specifies the field-related properties:

  • origin key name in the input JSON
  • name given name of the key generated from this Property (can be used to map JSON-names to python-names)
  • default either static value or callable taking Pipeline input kwargs (see data-plumber documentation); used as default if property is missing in request
  • required whether this property is required
  • validation_only skip exporting this property to the resulting data and only perform validation

OneOf and AllOf

These are conditional DPKeys which can be used to declare simple conditional structures within the properties-map of an Object. More complex relations are better processed in custom models or DPTypes. These conditional keys have the properties

  • name name identifier for this key (may be useful for debugging)
  • exclusive (OneOf only) whether exactly one match has to be made or multiple matches are allowed
  • default (see Property)
  • required (see Property)
  • validation_only (see Property)

A simple Object-handler that accepts either {"str": <string>, "bool": <boolean>} or {"int": <boolean>} may take the form of

Object(
    properties={
        OneOf("str&bool|int", exclusive=True): {
            AllOf("str&bool"): {
                Property("str"): String(),
                Property("bool"): Boolean()
            },
            Property("int"): Integer()
        }
    }
)

Note that in conditionally nested structures like in the example above, most properties of the inner DPKeys are silently ignored, i.e. all but origin and name. To have, for example, a default-value, it needs to be configured for the outermost DPKey (OneOf("str&bool|int", ...) in the example).

See also Union Types.

Types

Object

An Object corresponds to the JSON-type 'object' and is the base for any input handler-model. Calling assemble on an Object-instance returns a data-plumber-Pipeline. A Pipeline.run expects the keyword argument json, a dictionary containing the input data. The result of a run contains an Output-object in its data property (view data-plumber-documentation for details). This Output contains the kwargs (parsed and validated input) whereas in value the final result (dictionary or initialized model, if configured) is stored.

An Object's properties are

  • model data model (python class) for this Object or factory function (gets passed all generated kwargs of the associated Pipeline-run; the instance is then stored in data.value)
  • properties mapping for explicitly expected contents of this Object; this mapping is stored as the public property properties

Additionally, there are different options to configure how unknown properties in the input are treated. These are mutually exclusive:

  • additional_properties -- either boolean or field type
    • boolean: if True, ignore any additional fields; if False, rejects fields that are not listed in properties
    • type: required type specification for implicitly expected contents of this Object; if this type is set, all contents of the input which are not listed in properties have to satisfy the requirements imposed by that type; corresponding fields in json are added to the output
  • accept_only -- list of accepted field names; if set, on execution a json is rejected if it contains a key that is not in accept_only
  • free_form -- whether to accept and include any content that has not been defined explicitly via properties

Array

An Array corresponds to the JSON-type 'array'. Its properties are

  • items type specification for items of this Array; if None, instead of performing a validation, all JSON-contents are added to the output ("free-form array")

String

A String corresponds to the JSON-type 'string'. Its properties are

  • pattern regex-pattern that the value of this field has to match
  • enum list of allowed values for this field

Boolean

A Boolean corresponds to the JSON-type 'boolean'.

Integer/Float/Number

The types Integer, Float, and Number (the latter corresponding to the JSON-type 'number') represent numbers (integers, floating point numbers, and either of those, respectively). Their properties are

  • values list of values allowed in this field
  • min_value, min_value_invlusive, max_value, max_value_inclusive configuration for accepted value ranges

Null

The Null-type represents a JSON-'null' and generates a None value in python.

Any

The Any-type can be used to indicate a field to be of free form. Any regular JSON-type (Array (free-form), Boolean, Float, Integer, Null, Object (free-form), and String) is accepted here.

Uri/Url

The types Uri and Url can be used to declare fields that are required to have a uri- or url-format. Their properties are

  • schemes list of strings that are accepted as schemes (omit for accepting any)
  • require_authority (Uri only) whether to require a non-empty authority-section
  • require_netloc (Url only) whether to require a non-empty netloc-section
  • return_parsed whether to return a string or named tuple (result from a call to urllib.parse.urlparse)

FileSystemObject

The FileSystemObject-type implements a rudimentary validation logic for references to objects within a file system. Properties are

  • cwd override the process's cwd; the input is appended to this Path prior to validation
  • relative_to make call to pathlib.Path.relative_to prior to validation
  • exists, is_file, is_dir, is_fifo collection of validation options; any omitted value is skipped during validation

Union Types

Types can be combined freely by using the |-operator. A type specification of Boolean() | String(), for example, accepts either a boolean- or a string-value.

Custom Types

When using this extension, custom types can be defined easily by inheriting from an existing DPType or, at a lower level, from their common interface data_plumber_http.DPType itself and

  • defining the TYPE-property (python class) as well as
  • implementing the make-method. As a simple example for this, consider the following type-definition for a string-type that is required to be prefixed with some string:
from typing import Any

from data_plumber_http import DPType
from data_plumber_http.settings import Responses

class PrefixedString(DPType):
    TYPE = str
    def __init__(self, prefix: str):
        self._prefix = prefix
    def make(self, json, loc: str) -> tuple[Any, str, int]:
        if not json.startswith(self._prefix):
            return (
                None,
                Responses().BAD_VALUE.msg.format(
                    origin=json,
                    loc=loc,
                    expected="a prefix of " + self._prefix
                ),
                Responses().BAD_VALUE.status
            )
        return (
            self.TYPE(json),
            Responses().GOOD.msg,
            Responses().GOOD.status
        )

This type can then, for example, be used as

Object(
    properties={Property("string"): PrefixedString(prefix="my-prefix:")}
)

Running the assembled Pipeline with a json-keyword argument (Object(..).assemble().run(json={"string": ...})) of {"string": "my-prefix: hello"} returns a good status but {"string": "missing-prefix: hello"} is rejected.

Decorators

This package provides a factory for decorators which allow to seamlessly integrate the validation and unmarshalling of input data with flask view-functions. See the example given in the section Usage Example. The decorators-subpackage defines (aside from the decorator-factory flask_handler) shortcuts for collecting request data as json-input:

  • flask_args: request.args
  • flask_form: request.form
  • flask_files: request.files
  • flask_values: request.values
  • flask_json: request.json

Response Configuration

The status-codes and messages used by data-plumber-http are defined in the class data_plumber_http.settings.Responses. By modifying the respective (singleton) object, the status codes (or messages) can be easily altered to one's individual requirements.

from data_plumber_http.settings import Responses

Responses().update("BAD_VALUE", status = 406)

You can also register new response types which can then be used in custom DPTypes

Responses().new(
    "DELETED",
    msg="Resource '{json}' requested in '{loc}' has been permanently deleted.",
    status = 410
)
...
class MyResource(DPType):
    TYPE = str
    ...
    def make(self, json, loc):
        ...
        return (
            None,
            Responses().DELETED.msg.format(
                json=json,
                loc=loc
            ),
            Responses().DELETED.status
        )

Note that changing the status codes of pre-defined responses into a different range (e.g. 4XX- to 2XX-range) can break the extension's functionality. Corresponding warnings can be disabled by changing the warn_on_change property of Responses().

Response-Usage Map

data-plumber-http ships with the following set of Responses:

Response status used in case of
GOOD 0 input is valid
MISSING_OPTIONAL 1 missing optional field
UNKNOWN_PROPERTY 400 additional field (not allowed)
MISSING_REQUIRED 400 missing required field
BAD_TYPE 422 input exists but has wrong type
BAD_VALUE 422 input exists, has correct type, but value is not allowed (e.g. String(enum=[...]) where input is not in enum)
RESOURCE_NOT_FOUND 404 input references a non-existing/unavailable resource
BAD_RESOURCE 422 input references resource that exists, but its properties differ from expectation (e.g. directory for FileSystemObject(is_file=True))
CONFLICT 409 input references resource that does already/does not exist (e.g. directory for FileSystemObject(is_dir=True))
MISSING_REQUIRED_ONEOF 400 missing required field within a OneOf(required=True)
BAD_VALUE_IN_ONEOF - see BAD_VALUE; status and message are inherited
MULTIPLE_ONEOF 400 ambiguous matching situation for a key OneOf(exclusive=True)
MISSING_REQUIRED_ALLOF 400 missing field within an AllOf(required=True)
BAD_VALUE_IN_ALLOF - see BAD_VALUE; status and message are inherited