herman

a little fork of peewee


License
MIT
Install
pip install herman==1.2.5

Documentation

Herman

According to Charles, the Peewee ORM is currently going through a major 3.0 rewrite and isn't accepting many (any?) new features for the 2.x line. Because of this, I made Herman as a light fork of Peewee (2.8.1) implementing some syntactic sugar and minor tweaks I've liked from other ORMs.

This is mostly for my company's internal use, but anyone else is free to use it. I'm maintaining API compatibility as a drop in replacement for peewee 2.x.

https://api.travis-ci.org/keredson/peewee.png?branch=master

How To Install

sudo pip uninstall peewee
sudo pip install herman

Or to upgrade:

sudo pip install herman --upgrade

There is no `herman` package. Herman represents itself as peewee (with the same peewee __version__) as a drop-in replacement. To check if you're running Peewee or Herman, check for peewee.__herman_version__.

Differences (Herman vs. Peewee)

SQL Generation

The SQL generated by an ORM isn't just an artifact to be consumed by the database. Developers need to be able to read it as well. The more readable it is and the closer it is to what one would naturally write, the less time it takes to manually evaluate / debug it.

For example, this query:

Article.select()
.join(Person, join_type=pw.JOIN.LEFT_OUTER, on=(Article.author==Person.id))
.where(Article.author==derek)

In Peewee generates this:

SELECT "t1"."id", "t1"."title", "t1"."author_id", "t1"."editor_id"
FROM "article" AS t1
LEFT OUTER JOIN "person" AS t2 ON ("t1"."author_id" = "t2"."id")
WHERE ("t1"."author_id" = ?

Whereas Herman generates:

SELECT a1.id, a1.title, a1.author_id, a1.editor_id
FROM article AS a1
LEFT OUTER JOIN person AS p1 ON (a1.author_id = p1.id)
WHERE (a1.author_id = ?)

You'll notice that in Herman:

  • Table and column identifiers in Herman aren't unnecessarily quoted.
  • Table alias names are derived from their table names, not just "t1..tN".

The first() Method Uses a Limit

Article.select().first()

In Peewee generates this:

SELECT "t1"."id", "t1"."title", "t1"."author_id", "t1"."editor_id" FROM "article" AS t1

Where in Herman this is generated:

SELECT a1.id, a1.title, a1.author_id, a1.editor_id FROM article AS a1 LIMIT 1

Notice the additional "LIMIT 1" in Herman. Peewee 2.8.1 rather loads entire result set and returns this first object. (It's my understanding future versions of Peewee will also add a "LIMIT 1".)

Named Table Aliases

In Peewee you can create table aliases as a variable outside your query. For example:

author_table = Person.alias()
Article.select()
.join(author_table)
.where(author_table.name=='Derek')

This gets you a very respectable:

SELECT "t1"."id", "t1"."title", "t1"."author_id", "t1"."editor_id"
FROM "article" AS t1
INNER JOIN "person" AS t2 ON ("t1"."author_id" = "t2"."id")
WHERE ("t2"."name" = ?)

But in some queries it's very useful to specify what the aliases are. In Herman you can do this:

author_table = Person.alias('author')
Article.select()
.join(author_table)
.where(author_table.name=='Derek')

Which generates this:

SELECT a1.id, a1.title, a1.author_id, a1.editor_id
FROM article AS a1
INNER JOIN person AS author ON (a1.author_id = author.id)
WHERE (author."name" = ?)

These two statements are equivalent in Herman:

author_table = Person.alias('author')
author_table = Person.as_('author')

And because aliases are named, you're no longer required to use the external variable:

Article.select()
.join(Person.as_('author'))
.where(Person.as_('author').name=='Derek')

Whether in-lining the aliases makes it easier or harder to read is entirely dependent on the query and the code block it's in, but it's good to have the option. Both are supported in Herman.

Alias References are Inferred When Unambiguous

If a query is otherwise invalid and there is only one possible interpretation of the query author's intent, automatically tie the un-aliased column to the appropriate alias. For example:

Person.alias('xyz').select().where(Person.name == 'Derek')

Generates invalid SQL in Peewee (which is run on the database, which throws an exception):

SELECT "t1"."id", "t1"."name", "t1"."parent_id"
FROM "person" AS t1 WHERE ("t2"."name" = ?)
peewee.OperationalError: no such column: t2.name

But in Herman it's valid:

SELECT xyz.id, xyz."name", xyz.parent_id
FROM person AS xyz WHERE (xyz."name" = ?)

This would NOT work if more than one Person alias were included in the query.

Herman Raises Exceptions When Invalid Columns Are Referenced

Peewee will generate and run on the database SQL it knows is invalid. For example:

Person.select().where(Article.title == 'xyz').first()

In Peewee will throw a database error:

SELECT "t1"."id", "t1"."name", "t1"."parent_id"
FROM "person" AS t1 WHERE ("t2"."title" = ?)
peewee.OperationalError: no such column: t2.title

This isn't good for two reasons. First, I don't like relying on the database to catch easily detectable errors for us. Second, the error is opaque and specific to the internal implementation details of peewee (the "t2").

Herman on the other hand will not generate the SQL at all, instead raising:

peewee.ProgrammingError: <class '__main__.Article'> is not a part of this query

The get() Method Confirms Uniqueness

The get() method in Peewee adds a "LIMIT 1" and returns the first object. I feel this is incorrect behavior. The difference between first() and get() is get() should assert that only one matching record exists. (This is something Django got right IMO.) If my query criteria hasn't correctly isolated a unique row the ORM should throw an exception.

This is why Herman added a "LIMIT 2" to the SQL genned from get(), and does a check on the number of object returned. For example:

derek = Person.create(name='Derek')
callie = Person.create(name='Callie')
Person.select().get()

Will throw:

peewee.DataError: Too many instances matching query exist:
SQL: SELECT p1.id, p1."name", p1.parent_id FROM person AS p1 LIMIT 2

Rather than returning a random Person object selected by the database.

The Shortcut ALL

In Herman, this:

Person.ALL

Is the same as this:

Person.select()

It's just a nomenclature I preferred from Django. I made it uppercase to prevent conflicts with any columns named "all", and to highlight that it's effectively a constant.

A New (Additional) Join Syntax

If I have a set of models:

class Person(pw.Model):
  name = pw.CharField()

class Article(pw.Model):
  title = pw.CharField()
  author = pw.ForeignKeyField(db_column='author_id', rel_model=Person, to_field='id')
  editor = pw.ForeignKeyField(db_column='editor_id', rel_model=Person, to_field='id', related_name='edited_articles', null=True)

class Reply(pw.Model):
  text = pw.CharField()
  article = pw.ForeignKeyField(db_column='article_id', rel_model=Article, to_field='id')

And I want to do something fancy like get all replies with their articles and authors and editors, in Peewee I have to do something like this:

author_table = Person.alias()
editor_table = Person.alias()
replies = Reply.select(Reply, author_table, editor_table)
  .join(Article)
  .join(author_table, join_type=pw.JOIN.LEFT_OUTER, on=(author_table==Article.author))
  .switch(Article)
  .join(editor_table, join_type=pw.JOIN.LEFT_OUTER, on=(editor_table==Article.editor))
  .where(author_table.name=="Derek")

Which is all sorts of complicated. For instance:

  • I have to mentally keep track of what the join context is and manipulate it with the switch statement.
  • Because Article has two FKs to the same table, I have to manually specify the on conditionals.
  • Because an editor can be null, the default INNER JOIN will implicitly filter out replies to articles without editors, which is rarely what the developer wants when asking for a list of replies, so I have to use "join_type=pw.JOIN.LEFT_OUTER" a lot.

Herman offers a simpler syntax:

Reply.ALL
  .plus(Reply.article, Article.author.as_('author'))
  .plus(Reply.article, Article.editor)
  .where(Person.as_('author').name=="Derek")

Which generates the same SQL:

SELECT r1.id, r1."text", r1.article_id, a1.id, a1.title, a1.author_id, a1.editor_id,
       author.id, author."name", author.parent_id, p1.id, p1."name", p1.parent_id
FROM reply AS r1
LEFT OUTER JOIN article AS a1 ON (r1.article_id = a1.id)
LEFT OUTER JOIN person AS author ON (a1.author_id = author.id)
LEFT OUTER JOIN person AS p1 ON (a1.editor_id = p1.id)
WHERE (author."name" = ?)

The plus() method takes a variable number of ForeignKeyField objects which represent a path away from the primary query object (Reply in this case). For example:

Reply.ALL.plus(Reply.article)

Gets all the replies with their associated articles.

Reply.ALL.plus(Reply.article, Article.editor)

Gets all the replies with their associated articles and all the articles editors. Note that "Reply.article" is a foreign key from Reply to Article, and Article.editor is a foreign key from Article to Person. The list of foreign keys must create a logical path where the "to" type of one FK matches the "from" type of the next FK.

Following the same path twice is harmless. For instance:

Reply.ALL.plus(Reply.article).plus(Reply.article)

Is no different than calling plus() once. This is why we can specify multiple paths that have some overlap, for example:

Reply.ALL
  .plus(Reply.article, Article.author)
  .plus(Reply.article, Article.editor)

Which will join on the article table only once.

You can also alias your joined tables with:

Article.author.as_('author')

Which allows you to reference it later in your conditional:

.where(Person.as_('author').name=="Derek")

Herman's plus() also supports following foreign keys from one-to-many. Like:

Article.ALL.plus(Reply.article)

Which internally does a prefetch to populate the article with all of its replies. There will be O(k) SQL statements executed, where k is the number of two-many relationships. All of these queries will be grouped into one transaction to guarantee correctness.

IMPORTANT:

Remember that foreign keys represent edges in your object graph, and a call to plus(*edges) tells Herman to include that path from the object graph in your query.

This semantics for plus() have been co-opted from the DKO project, which I authored for my former employer. DKO's version of this syntax has been in broad production use since 2010 by hundreds of developers, accessing some of the largest (billions of rows) conventional relational databases that exist.

Calling len() Does a Database Count

If you call:

len(Article.select())

Peewee this will load a list of all objects, permanently cache said list, and then call len() on that cache.

In Herman this will call count() on the database and return the resulting integer. It does not build the list of objects in python nor cache anything. However, for backwards compatability, if something else has already populated the cached results of the query, it will call len() on that.

A New DeferredRelation Syntax

The semantics behind Peewee's circular foreign key dependencies get kind of unwieldy when you have more than a few models (and they're spread over multiple files). This is because the DeferredRelation object has to be defined, used, then the other model defined in another file, then set_model has to be called on the original, and then you're left with the object reference dangling around that has no purpose. IE the example in the docs:

# Create a reference object to stand in for our as-yet-undefined Tweet model.
DeferredTweet = DeferredRelation()

class User(Model):
  username = CharField()
  # Tweet has not been defined yet so use the deferred reference.
  favorite_tweet = ForeignKeyField(DeferredTweet, null=True)

class Tweet(Model):
  message = TextField()
  user = ForeignKeyField(User, related_name='tweets')

# Now that Tweet is defined, we can initialize the reference.
DeferredTweet.set_model(Tweet)

Ours happens all in the model definition with an optional parameter given to DeferredRelation. Like:

class User(Model):
  username = CharField()
  # Tweet has not been defined yet so use the deferred reference.
  favorite_tweet = ForeignKeyField(DeferredRelation('Tweet'), null=True)

class Tweet(Model):
  message = TextField()
  user = ForeignKeyField(User, related_name='tweets')

This removes the need for the extra variable in the global namespace and the coordination of it over multiple files. And since the parameter is optional, it is fully backwards-compatible with the old syntax.

Our patch for this has been incorporated upstream, so this is forwards-compatible too, following Peewee's next release.

Passing an Empty List/Set/Tuple into IN Doesn't Gen Invalid SQL

If you try to do a IN operation on an empty list:

User.select().where(User.id << [])

Peewee will generate the following SQL:

SELECT "t1"."id", "t1"."username" FROM "user" AS t1 WHERE ("t1"."id" IN ())

Which the database will reject as invalid, throwing an exception. We instead generate a "false" statement:

SELECT u1.id, u1.username FROM "user" AS u1 WHERE (0 = 1)

So you don't have to manually test for empty lists every time you use a SQL IN.

peewee

Peewee is a simple and small ORM. It has few (but expressive) concepts, making it easy to learn and intuitive to use.

New to peewee? Here is a list of documents you might find most helpful when getting started:

For flask helpers, check out the flask_utils extension module. You can also use peewee with the popular extension flask-admin to provide a Django-like admin interface for managing peewee models.

Examples

Defining models is similar to Django or SQLAlchemy:

from peewee import *
from playhouse.sqlite_ext import SqliteExtDatabase
import datetime

db = SqliteExtDatabase('my_database.db')

class BaseModel(Model):
    class Meta:
        database = db

class User(BaseModel):
    username = CharField(unique=True)

class Tweet(BaseModel):
    user = ForeignKeyField(User, related_name='tweets')
    message = TextField()
    created_date = DateTimeField(default=datetime.datetime.now)
    is_published = BooleanField(default=True)

Connect to the database and create tables:

db.connect()
db.create_tables([User, Tweet])

Create a few rows:

charlie = User.create(username='charlie')
huey = User(username='huey')
huey.save()

# No need to set `is_published` or `created_date` since they
# will just use the default values we specified.
Tweet.create(user=charlie, message='My first tweet')

Queries are expressive and composable:

# A simple query selecting a user.
User.get(User.username == 'charles')

# Get tweets created by one of several users. The "<<" operator
# corresponds to the SQL "IN" operator.
usernames = ['charlie', 'huey', 'mickey']
users = User.select().where(User.username << usernames)
tweets = Tweet.select().where(Tweet.user << users)

# We could accomplish the same using a JOIN:
tweets = (Tweet
          .select()
          .join(User)
          .where(User.username << usernames))

# How many tweets were published today?
tweets_today = (Tweet
                .select()
                .where(
                    (Tweet.created_date >= datetime.date.today()) &
                    (Tweet.is_published == True))
                .count())

# Paginate the user table and show me page 3 (users 41-60).
User.select().order_by(User.username).paginate(3, 20)

# Order users by the number of tweets they've created:
tweet_ct = fn.Count(Tweet.id)
users = (User
         .select(User, tweet_ct.alias('ct'))
         .join(Tweet, JOIN.LEFT_OUTER)
         .group_by(User)
         .order_by(tweet_ct.desc()))

# Do an atomic update
Counter.update(count=Counter.count + 1).where(
    Counter.url == request.url)

Check out the example app for a working Twitter-clone website written with Flask.

Learning more

Check the documentation for more examples.

Specific question? Come hang out in the #peewee channel on irc.freenode.net, or post to the mailing list, http://groups.google.com/group/peewee-orm . If you would like to report a bug, create a new issue on GitHub.

Still want more info?

http://media.charlesleifer.com/blog/photos/wat.jpg

I've written a number of blog posts about building applications and web-services with peewee (and usually Flask). If you'd like to see some real-life applications that use peewee, the following resources may be useful: