PyDecode is a dynamic programming toolkit developed for research in natural language processing. Its aim is to be simple enough for fast prototyping, but efficient enough for research use.
Features
- Simple specifications. Dynamic programming algorithms are specified in code that reads like pseudo-code.

    # Viterbi algorithm.
    ...
    c.init(items[0, :])
    for i in range(1, n):
        for t in range(len(tags)):
            c.set(items[i, t],
                  items[i-1, :],
                  labels=labels[i, t, :])
    graph = c.finish()
- Efficient implementation. The core is written in C++, with Python interfaces through NumPy.

    # Compute path.
    label_weights = numpy.random.random(graph.label_size)
    weights = pydecode.transform_label_array(graph, label_weights)
    path = pydecode.best_path(graph, weights)
- High-level algorithms. Includes a set of widely used algorithms.

    # Inside probabilities.
    inside = pydecode.inside(graph, weights, kind=pydecode.LogProb)

    # (Max)-marginals.
    marginals = pydecode.marginals(graph, weights)

    # Pruning.
    mask = marginals > threshold
    pruned_graph = pydecode.filter(graph, mask)
- Integration with machine learning toolkits. Train structured models.

    # Train a discriminative tagger.
    perceptron_tagger = StructuredPerceptron(tagger)
    perceptron_tagger.fit(X, Y)
    Y_test = perceptron_tagger.predict(X_test)
- Visualization tools. IPython-integrated tools for debugging and teaching.

    pydecode.draw(graph, paths=paths)
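For readers new to the recurrences behind the Viterbi and inside snippets in the feature list, the following is a minimal sketch in plain Python. It does not use PyDecode and is not how the library is implemented. The score layout (`emit`, `trans`) and the function names `viterbi`, `inside`, and `logsumexp` are assumptions chosen for illustration. It shows that the same chart recursion gives the best path when predecessors are combined with max, and the log of the total score over all paths when they are combined with log-sum-exp, which is presumably what `kind=pydecode.LogProb` selects.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def viterbi(emit, trans):
    """Best tag sequence and its log-score on a first-order tagging lattice.

    emit[i][t]  : log-score of tag t at position i (hypothetical layout)
    trans[s][t] : log-score of moving from tag s to tag t
    """
    n, num_tags = len(emit), len(emit[0])
    chart = [list(emit[0])]   # base case: items at position 0
    back = []                 # back[i-1][t] = best previous tag for item (i, t)
    for i in range(1, n):
        row, ptr = [], []
        for t in range(num_tags):
            # Pick the best predecessor item (i-1, s) for item (i, t).
            s = max(range(num_tags),
                    key=lambda s: chart[i - 1][s] + trans[s][t])
            row.append(chart[i - 1][s] + trans[s][t] + emit[i][t])
            ptr.append(s)
        chart.append(row)
        back.append(ptr)
    # Follow back-pointers from the best final item.
    t = max(range(num_tags), key=lambda t: chart[-1][t])
    best = chart[-1][t]
    path = [t]
    for ptr in reversed(back):
        t = ptr[t]
        path.append(t)
    path.reverse()
    return path, best


def inside(emit, trans):
    """Log of the summed scores of all tag sequences (sum replaces max)."""
    n, num_tags = len(emit), len(emit[0])
    row = list(emit[0])
    for i in range(1, n):
        row = [
            logsumexp([row[s] + trans[s][t] for s in range(num_tags)])
            + emit[i][t]
            for t in range(num_tags)
        ]
    return logsumexp(row)


if __name__ == "__main__":
    # Toy scores: 3 positions, 2 tags.
    emit = [[0.0, -1.0], [-2.0, 0.0], [0.0, -3.0]]
    trans = [[0.0, -0.5], [-1.0, 0.0]]
    print(viterbi(emit, trans))
    print(inside(emit, trans))
```

The two functions differ only in how predecessor scores are combined. A toolkit that separates the chart specification from that choice can reuse one graph for best-path search, inside scores, and marginals.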