vertigo

UNKNOWN


License
BSD-3-Clause
Install
pip install vertigo==0.1.3

Documentation

Vertigo: Some really simple graph tools travbadge

Vertigo is a small collection of classes and functions for building and working with graphs with labeled edges. This is useful because dictionaries are just graphs with labeled edges, and objects in Python are just dictionaries, so really this applies to pretty much all objects.

By convention, if you need to use many different features of vertigo and don't want to import them all into your namespace, you should rename the module to vg:

>>> import vertigo as vg

[![Build Status](https://travis-ci.org/dplepage/vertigo.svg?branch=master)](https://travis-ci.org/dplepage/vertigo)

Graphs

A vertigo GraphNode is, at its core, a combination of two things:

  • A value, which is any python object
  • A set of edges, which are unique string names pointing to other nodes

Many graphs you'll work with will be ``PlainGraphNode``s, which are the simplest implementation of the above - each simply stores its value and has an internal dictionary of its edges.

``PlainGraphNode``s can be constructed directly pretty easily:

>>> g1 = vg.PlainGraphNode("This is a value", edges=dict(
...     e1=vg.PlainGraphNode("This is another value"),
...     e2=vg.PlainGraphNode("This node has more children", edges=dict(
...         sub_edge=vg.PlainGraphNode(400),
...     )),
... ))

In the above graph, g1 has the value "This is a value" and two children, identified by the strings "e1" and "e2". The node along edge "e1" has the value "This is another value", while the node along edge "e2" has the value "This node has more chidlren", and itself has an edge labeled "sub_edge" going to a fourth node whose value is the integer 400.

Often, though, you'll use dynamic graph nodes, which implement the same GraphNode interface but wrap some other object. Dynamic nodes don't store their structure explicitly, but instead construct their edges and children on the fly as they're requested. Conceptually, the relationship between plain graphs and dynamic graphs is similar to the relationship between sequences and iterators - one is a static structure, the other is generated as needed, and as much as possible you should write your code so that it doesn't care about the distinction.

You can define your own dynamic node types simply by inheriting from GraphNode, making sure your subclass has a .value attribute (or property), and defining the two functions ._get_child(key), which should either return the node along the edge named key or raise a KeyError, and .key_iter(), which should return an iterable of strings all of which will work when passed into ._get_child.

See ObjectGraphNode and DictGraphNode in graph.py for examples of dynamic graphs wrapping objects; the InfiniteGraphNode type also demonstrates a dynamic graph that doesn't wrap an object.

In many cases you may have a dynamic graph but want to work with it as a static graph. In this case you can invoke the plain_copy function, which takes a graph and returns a new graph with identical structure and values to the input graph, but made entirely out of ``PlainGraphNode``s. This is mainly useful when the dynamic generation of your graph is slow and you expect to walk the graph multiple times - converting to a plain graph essentially caches the dynamic computation.

Don't use plain_copy on a graph with cycles or an InfiniteGraphNode, though, as it will run forever. You can use it on non-tree acyclic graphs, but nodes that appear twice in the same tree will be duplicated. In the future vertigo will support cycle- and duplicate-detection in a lot of these functions, but it doesn't yet.

Ordering

PlainGraphNode uses a collections.OrderedDict for its edges, so it will always preserve the order of edges. Bear in mind, though that if you create one by passing in a normal dict of edges then the order will be chosen arbitrarily (and nondeterministically in python 3.3 or later).

Usually you don't care, but when testing (doctesting in particular) it's often convenient to fix the order of graph chidlren. You can do this by making sure to use OrderedDict everywhere; there's also a GraphWrapper called SortedWrapper that behaves just like the wrapped graph but always returning keys in sorted order. There's more discussion of ``GraphWrapper``s later in this document.

ascii_tree

Vertigo provides lots of helpful functions, but one of the most useful for understanding (and debugging) graphs is ascii_tree. It is based on the asciitree module (https://pypi.python.org/pypi/asciitree), and prints the structure of graphs in a (fairly) human-readable way:

>>> print(vg.ascii_tree(g1, sort=True))
root: 'This is a value'
  +--e1: 'This is another value'
  +--e2: 'This node has more children'
     +--sub_edge: 400

The argument sort=True causes it to sort the edges before printing; in general you won't need this, but it's very useful in doctests to ensure consistent output. As mentioned under Ordering above, you can also do this with a SortedWrapper:

>>> print(vg.ascii_tree(vg.SortedWrapper(g1), sort=False))
root: 'This is a value'
  +--e1: 'This is another value'
  +--e2: 'This node has more children'
     +--sub_edge: 400

Building Graphs

From nested dictionaries

You can construct PlainGraphNode``s more succinctly using the helper fn ``from_dict. This takes a nested dictionary and turns it into a graph. Every dictionary key becomes an edge in the graph, except for the special key "_self", which indicates a node's value. If an edge points at another dictionary, the graph will be constructed recursively; if it points to a non- dictionary value v then it will be treated as {'_self':v}. For example, the above graph could also have been constructed via:

>>> g2 = vg.from_dict(dict(
...     _self = "This is a value",
...     e1 =  "This is another value",
...     e2 = dict(
...          _self = "This node has more children",
...          sub_edge = 400,
...     ),
... ))
>>> print(vg.ascii_tree(g2, sort=True))
root: 'This is a value'
  +--e1: 'This is another value'
  +--e2: 'This node has more children'
     +--sub_edge: 400

The inverse of from_dict is named to_dict. It has an optional argument minimize which, if true, uses as much shorthand as possible in the generation:

>>> vg.to_dict(g2, minimize=True) == dict(
...     _self = "This is a value",
...     e1 =  "This is another value",
...     e2 = dict(
...          _self = "This node has more children",
...          sub_edge = 400,
...     ),
... )
True

Note that without minimize=True, the e1 and e2/sub_edge values would have been dictionaries with the single key _self; see the docs for to_dict for more details.

From flat dictionaries

Another useful constructor is the from_flat, which takes a flat dictionary whose keys are paths. This is extremely useful for succinctly creating sparse graphs:

>>> g3 = vg.from_flat({
...     '': 'This is the root',
...     'foo': 'This is the value at "foo"',
...     'foo/bar/baz/qux/spam/fleem': 'Parents are created as needed.',
...     'x/a': 12,
...     'x/b': 14,
... })
>>> print(vg.ascii_tree(g3, sort=True))
root: 'This is the root'
  +--foo: 'This is the value at "foo"'
  |  +--bar: None
  |     +--baz: None
  |        +--qux: None
  |           +--spam: None
  |              +--fleem: 'Parents are created as needed.'
  +--x: None
     +--a: 12
     +--b: 14

The inverse of from_flat is unimaginatively named to_flat, and, like to_dict, has a minimize argument that produces the smallest dict that captures the structure:

>>> vg.to_flat(g3, minimize=True) == {
...     '': 'This is the root',
...     'foo': 'This is the value at "foo"',
...     'foo/bar/baz/qux/spam/fleem': 'Parents are created as needed.',
...     'x/a': 12,
...     'x/b': 14,
... }
True

Note that without minimize=True, the dictionary would include keys for all paths with None values as well, such as 'foo/bar' and x; see the docs for to_flat for more details.

Traversing Graphs

The GraphNode interface is reminiscent of a dict. You can get a specific child via .get_child(key) or by using __getitem__:

>>> g1.value
'This is a value'
>>> g1.get_child('e1').value
'This is another value'
>>> g1['e1'] is g1.get_child('e1')
True

__getitem__ also supports using key tuples for deeper lookup:

>>> g1['e2', 'sub_edge'] is g1['e2']['sub_edge']
True

You can iterate over the edge keys, the child nodes, or tuples of (key, child) using the iterator functions .key_iter(), .child_iter(), and .edge_iter(), respectively:

>>> lsl = lambda iter: list(sorted(list(iter))) # for reliable doctests
>>> lsl(g1.key_iter())
['e1', 'e2']
>>> set(g1.child_iter()) == {g1['e1'], g1['e2']}
True
>>> edges = lsl(g1.edge_iter())
>>> edges == [('e1', g1['e1']), ('e2', g1['e2'])]
True

Thus, to recurse over an entire graph structure, you might do:

>>> def print_graph(g, key='root', depth=0):
...     print('{}{}: {}'.format(' '*depth, key, g.value))
...     for key, child in lsl(g.edge_iter()):
...         print_graph(child, key, depth+1)
>>> print_graph(g1)
root: This is a value
 e1: This is another value
 e2: This node has more children
  sub_edge: 400

Graph Zipping

One very powerful feature of vertigo is the ability to combine graphs with similar structures. A common use case for this is labeling data - you have a data structure that you wish to display somehow, and you want to label its components, but you need different labels depending on the context. Then you might build a graph out of your data and merge it with a graph of labels to get labeled values that you could display.

You can zip two graphs together using the zip function, which takes two or more graphs and produce a new graph where the value at each node is a tuple of the corresponding values from the input graphs.

Zipping graphs requires you to provide a key merging function that determines the structure of the new graph. This function must take a list of graphs and return a list of keys that should be present in the new graph.

You can pass a function as the merge_fn argument of zip, or you can pass a string identifying one of the pre-made common functions. For example, the string 'first' identifies a merge function that simply returns the keys of the first graph in the list, resulting in a zipped graph that has the same structure as the first input graph.

See zip_fns.py for more information about the provided merging functions.

In addition to zip, there is also an izip function. izip returns a dynamic graph, generating zipped graph nodes as you request them; zip is simply plain_copy composed with izip.

The naming is meant to mimic python's zip vs izip, where izip returns an iterator while zip returns a list. Much like with those, vg.zip returns a concrete graph, but will fail if the graph it's operating on is infinite, while vg.izip may be doing more work as you traverse it but won't create nodes until you need them, allowing it to work on infinite structures.

More documentation

More docs coming soon. Here are some quick helpful notes:

A GraphWrapper is a dynamic graph that wraps another graph to change its behavior. For example, the MapWrapper has a function that it applies to all values in the graph, much like the imap function in python. See wrappers.py for more info.

A Walker is a quick way of defining a function that walks a graph doing processing. See walker.py for more info, but also know that a lot of the time you don't actually need it - just write a function that walks the graph directly, like the print_graph function defined earlier in this doc.

merge_fns.py has some helpers for merging graphs, where a "merge" is really just a zip followed by a map. It has some predefined mapping functions that are specifically useful when merging. For example, to overlay one graph on another, you zip them and then apply a map that replaces the tuple of values with the first non-None value in the tuple.

misc_fns.py defines some miscellaneous graph manipulators, like a function that takes two graphs and returns the subgraph of one of them matching the structure of the other. Look at the individual functions there to see what they do.