descriptors

A collection of descriptors for validating input data.


License
Other

descriptors - a package for easy input validation

tl;dr: A collection of descriptors for easily validating attributes like so:

from descriptors import Validated, ExistingPath, MadePath

class BatchProcessor(Validated):
    input_dir = ExistingPath()
    output_dir = MadePath()

    def process(self, input_dir, output_dir):
        self.input_dir = input_dir  # error if input_dir doesn't exist
        self.output_dir = output_dir  # output_dir automatically created

Contents

Usage Examples

Reference

Why use descriptors?

Caveats

Installation

Credit

Usage examples

Validation of builtin types:

from descriptors import Validated, Str, Dict, Int

class A(Validated):
    some_str = Str()
    some_dict = Dict()
    some_int = Int()

a = A()
a.some_str = "test"  # ok
a.some_str = 7  # ValueError
a.some_dict = {"two": "dos"}  # ok
a.some_int = "test"  # ValueError

Validation of numeric ranges:

from descriptors import Validated, GreaterThan, InRange, NotZero

class A(Validated):
    many = GreaterThan(2)
    num_players = InRange(1, 4)
    divisor = NotZero()

a = A()
a.many = 2  # ValueError
a.num_players = 3  # ok
a.divisor = 0  # ValueError

Some more examples:

import os
from descriptors import (
    Validated, ExistingPath, Apply, Length, NotNone, RegexMatch, Satisfies)

class A(Validated):
    abs_path = ExistingPath() & Apply(os.path.abspath)
    triple = Length(3)
    something = NotNone()
    only_a = RegexMatch("^a+$")
    lower_str = Apply(str.lower)
    even = Satisfies(lambda x: x % 2 == 0)

a = A()
os.chdir("/home/username")
a.abs_path = ".."  # a.abs_path == "/home"
a.triple = [1, 2, 3]  # ok
a.something = None  # ValueError
a.only_a = "aaaab"  # ValueError
a.lower_str = "sOmE StrIng"  # a.lower_str == "some string"
a.even = 1  # ValueError

Descriptors can be composed using the bitwise AND and OR operators (i.e. & and |):

from descriptors import (
    Validated, Int, Satisfies, HasAttr, GreaterThan, LessThan)

class A(Validated):
    an_even_int = Int() & Satisfies(lambda x: x % 2 == 0)
    a_finite_set = HasAttr("__contains__") & HasAttr("__len__")
    no_single_digits = Int() & (GreaterThan(9) | LessThan(-9))

a = A()
a.an_even_int = 2.0  # ValueError
a.a_finite_set = [1, 2, 3]  # ok
a.no_single_digits = 7  # ValueError
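
To illustrate how composition with & and | can work in general, here is a minimal sketch built on Python's __and__ and __or__ hooks. It is not the package's implementation: the Check class and the predicate names are hypothetical and only model the boolean logic of the no_single_digits example above.

```python
class Check(object):
    """Hypothetical validator: wraps a predicate and supports & and |."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __call__(self, value):
        return self.predicate(value)

    def __and__(self, other):
        # Both checks must pass; `and` short-circuits on the first failure.
        return Check(lambda v: self(v) and other(v))

    def __or__(self, other):
        # Either check may pass.
        return Check(lambda v: self(v) or other(v))


is_int = Check(lambda v: isinstance(v, int))
greater_than_9 = Check(lambda v: v > 9)
less_than_minus_9 = Check(lambda v: v < -9)

# Same shape as: Int() & (GreaterThan(9) | LessThan(-9))
no_single_digits = is_int & (greater_than_9 | less_than_minus_9)

results = [no_single_digits(v) for v in (7, 12, -42, 12.5)]
print(results)  # [False, True, True, False]
```

Each operator returns a new Check, so compositions nest freely and parentheses control precedence just as they do for ordinary bitwise expressions.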

Inheriting from Validated means that class attributes with an assigned Descriptor will be validated, but nothing else. Other class attributes behave as usual, and assigning a Descriptor to an instance variable will not have the desired effect:

from descriptors import Validated, HasAttr

class A(Validated):
    f = HasAttr("read")  # assigning to class attribute
    g = 0  # normal class attribute, not validated

    def __init__(self):
        self.h = HasAttr("read")  # h not validated, you can assign anything

a = A()
a.f = 7  # ValueError
a.h = 7  # assigns 7, no ValueError
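
This behaviour comes from Python itself rather than from this package: the descriptor protocol is only triggered for objects found on the class, never for objects stored in an instance's __dict__. A small, package-independent sketch (the Loud class is hypothetical) makes the difference visible:

```python
class Loud(object):
    """Minimal data descriptor that records every assignment it sees."""

    def __init__(self):
        self.seen = []

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        return instance.__dict__.get("_value")

    def __set__(self, instance, value):
        self.seen.append(value)
        instance.__dict__["_value"] = value


class A(object):
    f = Loud()  # class attribute: __set__ runs on assignment

    def __init__(self):
        self.h = Loud()  # instance attribute: just an ordinary object


a = A()
class_level = A.__dict__["f"]
instance_level = a.h  # plain lookup, returns the Loud object itself

a.f = 7  # routed through Loud.__set__
a.h = 7  # plain rebinding; the Loud object stored in a.h is replaced

print(class_level.seen)     # [7]
print(instance_level.seen)  # []
```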

Reference

A complete list of all descriptors provided can be found here.

Why use descriptors?

If you have ever written a program that takes user input, you have probably written code to make sure that user input is what your program expects it to be. For example, say you're writing a tool to batch-process files in an input directory and save them to an output directory. You'll want to verify that input_dir exists and show the user a non-cryptic error message if it doesn't. You'll also want to make sure output_dir exists or can be created, so the program doesn't process files for possibly hours just to fail saving the results because output_dir contains an invalid character:

import os

class BatchProcessor(object):
    def process(self, input_dir, output_dir):
        if os.path.exists(input_dir):
            self.input_dir = input_dir
        else:
            self.some_error_msg()
        try:
            # create output_dir only if it is missing
            if not os.path.exists(output_dir):
                os.makedirs(output_dir)
            self.output_dir = output_dir
        except OSError:
            self.another_error_msg()

This is tedious to write and maintain, and doesn't prevent setting invalid values somewhere else in the program. A better approach is using properties:

import os

class BatchProcessor(object):
    @property
    def input_dir(self):
        return self._input_dir

    @input_dir.setter
    def input_dir(self, val):
        if os.path.exists(val):
            self._input_dir = val
        else:
            self.some_error_msg()

    # output_dir property left as an exercise for the reader

Properties make your intentions much clearer and prevent setting invalid values, but now there are getters and setters all over the place. Also, reusing properties from one class in another isn't exactly convenient. A much better solution is using descriptors. A descriptor is an object that hooks into attribute access by implementing any of the following methods: __get__, __set__, __delete__. When a descriptor is assigned to a class attribute, its __get__, __set__, or __delete__ method is invoked whenever that attribute is retrieved, set, or deleted on an instance. With descriptors, our example program looks like this:

from descriptors import Validated, ExistingPath, MadePath

class BatchProcessor(Validated):
    input_dir = ExistingPath()
    output_dir = MadePath()

    def process(self, input_dir, output_dir):
        self.input_dir = input_dir
        self.output_dir = output_dir

As the name suggests, ExistingPath only allows existing paths to be set; it will raise an exception otherwise. MadePath will create the path, if necessary, and raise an exception if the path cannot be created. By inheriting from Validated, class attributes will automatically be bound to their descriptor.

(The last sentence is completely false, but the actual reason for inheriting from Validated is a bit more complicated.)
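
For readers curious about the mechanics, the following is a rough sketch of how a validating descriptor and a naming base class could fit together. It is an assumption modelled on the metaclass approach from the tutorial mentioned under Credit, not this package's actual code. All names (Descriptor, ExistingPathSketch, NamingMeta, ValidatedSketch) are hypothetical, and the Python 3 metaclass syntax is used for brevity. The idea is that a descriptor needs to know which attribute name it was assigned to so it can store the value in the instance's __dict__, and a metaclass on the base class can supply that name.

```python
import os


class Descriptor(object):
    """Hypothetical base: stores the value in the instance __dict__."""

    name = None  # filled in by NamingMeta below

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = self.validate(value)

    def validate(self, value):
        return value


class ExistingPathSketch(Descriptor):
    def validate(self, value):
        if not os.path.exists(value):
            raise ValueError("path does not exist: %r" % (value,))
        return value


class NamingMeta(type):
    """Tells each descriptor which attribute name it was assigned to."""

    def __new__(mcs, clsname, bases, namespace):
        for key, attr in namespace.items():
            if isinstance(attr, Descriptor):
                attr.name = key
        return super().__new__(mcs, clsname, bases, namespace)


class ValidatedSketch(metaclass=NamingMeta):
    pass


class BatchProcessor(ValidatedSketch):
    input_dir = ExistingPathSketch()


bp = BatchProcessor()
bp.input_dir = os.getcwd()  # accepted: the directory exists

rejected = None
try:
    bp.input_dir = os.path.join(os.getcwd(), "no-such-dir-for-sketch")
except ValueError as exc:
    rejected = str(exc)  # the invalid path never reaches the instance
```

On Python 3.6 and later, the same naming step can be done without a metaclass by giving the descriptor a __set_name__ method.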

Handle with care

The purpose of this package is to conveniently validate input data. This convenience comes at the price of performance. While the performance hit is completely negligible for most reasonable use cases (e.g., setting a couple of parameters before running the main part of your program), using descriptors in a long-running, CPU-intensive loop, or some other heavy-duty part of your program will likely cause a significant drop in performance.
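
To get a feel for the overhead on your own machine, a quick measurement along these lines may help. This is only a sketch: Checked is a hypothetical stand-in for a validating descriptor, not one of this package's classes, and the numbers will vary with Python version and hardware.

```python
import timeit


class Checked(object):
    """Hypothetical validating descriptor used only for this measurement."""

    def __set__(self, instance, value):
        if not isinstance(value, int):
            raise ValueError("expected an int")
        instance.__dict__["x"] = value

    def __get__(self, instance, owner=None):
        return self if instance is None else instance.__dict__["x"]


class Plain(object):
    pass  # x is an ordinary instance attribute


class Validating(object):
    x = Checked()  # every assignment to x goes through Checked.__set__


def assign(obj, n=100000):
    for i in range(n):
        obj.x = i


plain_seconds = timeit.timeit(lambda: assign(Plain()), number=5)
checked_seconds = timeit.timeit(lambda: assign(Validating()), number=5)
print("plain:   %.3f s" % plain_seconds)
print("checked: %.3f s" % checked_seconds)
```

If the validated assignment sits inside a hot loop, one option is to validate once up front and work with a local variable inside the loop.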

Descriptors are not a static type system. Do a couple of sanity checks so users don't have to deal with error messages from deep inside your program, where the actual failure would otherwise occur. It's probably not a good idea to go overboard by using descriptors on every single attribute or by being overly restrictive about the values you allow.

Installation

This package has been tested on Python 2.7 and Python 3.3.

Installation using pip (depending on your system you might have to run this as root):

pip install descriptors

Uninstall:

pip uninstall descriptors

Installation without pip:

Download the latest zip archive of this package from PyPI:

http://pypi.python.org/pypi/descriptors/

Extract the archive, navigate to the extracted folder and run:

python setup.py install

Other validation options

If descriptors is not what you're looking for, check out these projects:

Credit

This package is inspired by (read: shamelessly stolen from) David Beazley's excellent tutorial on Python 3 metaprogramming (video, slides and code).