hv

postgresql connection manager for scalability freaks


License
MIT
Install
pip install hv==0.1.1

Documentation

hypervault

pgsql connection manager for scalability freaks.

pip install hv

This is an implementation of the pattern described by Instagram in the "Sharding & IDs at Instagram" article.

Besides that, it wraps psycopg2 with custom connection pooling support and stores dicts as hstore key/value pairs, which PostgreSQL can index.

api

hv.entity.Key(id)

An instance of Key represents a unique 64-bit key for an entity and has the following attributes:

created returns the UTC datetime corresponding to the first 41 bits of the numeric id.

shard_id holds a 13-bit integer and represents the logical shard.

added_id is the remaining 10 bits and represents an auto-incrementing sequence, modulo 1024. This means we can generate 1024 IDs per shard, per millisecond.
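The bit layout above can be illustrated with a small sketch. It is not hv's actual code: the function names pack_id and unpack_id are hypothetical, the timestamp is assumed to be milliseconds since a custom epoch, and the EPOCH value is an assumption, since the real epoch used by hv is not stated here.

```python
from datetime import datetime, timedelta, timezone

# Assumed custom epoch; hv's real epoch is not stated in this document.
EPOCH = datetime(2011, 1, 1, tzinfo=timezone.utc)

TIME_BITS, SHARD_BITS, SEQ_BITS = 41, 13, 10  # 41 + 13 + 10 = 64


def pack_id(millis, shard_id, added_id):
    """Combine the three fields into a single 64-bit integer."""
    assert 0 <= millis < (1 << TIME_BITS)
    assert 0 <= shard_id < (1 << SHARD_BITS)
    sequence = added_id % (1 << SEQ_BITS)  # the sequence wraps at 1024
    return (millis << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | sequence


def unpack_id(id_):
    """Split a 64-bit integer into (created, shard_id, added_id)."""
    millis = id_ >> (SHARD_BITS + SEQ_BITS)
    shard_id = (id_ >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
    added_id = id_ & ((1 << SEQ_BITS) - 1)
    created = EPOCH + timedelta(milliseconds=millis)
    return created, shard_id, added_id
```

Because the timestamp occupies the most significant bits, ids generated this way sort roughly by creation time.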

hv.datastore.Datastore(connections, pool_max, pool_block_timeout, logger)

connections holds a list of dicts, each of which is passed to psycopg2 to open connections to one server. Each dict must also contain a special shards value, which tells the Datastore which logical shards live on that server.

This example shows the bare minimum you need to create a Datastore instance:

from hv.datastore import Datastore

connections = [
  dict(shards='1-9', host='192.168.2.23', port='5432', user='x', password='x', database='x'),
  dict(shards='9-17', host='192.168.2.24', port='5432', user='x', password='x', database='x')
]
db = Datastore(connections)

In this case, we assume that the PostgreSQL server running on 192.168.2.23 contains the shards (schemas) from 1 to 9 (9 not included) and that the server on 192.168.2.24 contains the shards from 9 to 17.
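To make the range convention concrete, here is a minimal sketch of how a shards string might be mapped to a server. The helpers parse_shards and find_connection are hypothetical names, not part of hv, and the sketch assumes every range excludes its upper bound, as the first one is described to.

```python
def parse_shards(spec):
    """Turn a string such as '1-9' into range(1, 9); the upper bound is excluded."""
    start, stop = (int(part) for part in spec.split('-'))
    return range(start, stop)


def find_connection(connections, shard_id):
    """Return the first connection dict whose shard range contains shard_id."""
    for conn in connections:
        if shard_id in parse_shards(conn['shards']):
            return conn
    raise LookupError('no connection configured for shard %d' % shard_id)


connections = [
    dict(shards='1-9', host='192.168.2.23'),
    dict(shards='9-17', host='192.168.2.24'),
]
```

With these two entries, shard 8 resolves to 192.168.2.23 and shard 9 to 192.168.2.24, so adjacent ranges can share a boundary number without overlapping.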

pool_max is the maximum number of psycopg2 connections kept alive for each entry in connections. (default: 10)

pool_block_timeout is the maximum number of seconds to wait for a connection from the pool before the request is dropped. (default: 5)

logger should hold a Logger object if you want to use LoggingConnection. By default every connection is an instance of DictConnection.
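The interplay of pool_max and pool_block_timeout can be sketched with a bounded queue. This is only an illustration of the idea, not hv's pool: the BoundedPool class and its factory argument are hypothetical, and a plain object stands in for a psycopg2 connection.

```python
import queue


class BoundedPool:
    """Hypothetical pool: at most pool_max objects, blocking get with a timeout."""

    def __init__(self, factory, pool_max=10, pool_block_timeout=5):
        self._factory = factory
        self._timeout = pool_block_timeout
        self._slots = queue.Queue(maxsize=pool_max)
        # Fill the queue with placeholders; real objects are created lazily.
        for _ in range(pool_max):
            self._slots.put(None)

    def get(self):
        """Take a slot, waiting up to pool_block_timeout seconds for one."""
        try:
            item = self._slots.get(timeout=self._timeout)
        except queue.Empty:
            raise TimeoutError(
                'no connection available within %s seconds' % self._timeout
            ) from None
        return item if item is not None else self._factory()

    def put(self, item):
        """Hand a previously taken object back so another caller can reuse it."""
        self._slots.put(item)
```

A caller that never hands its object back keeps a slot occupied for good, which is why returning connections matters so much in a design like this.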

A Datastore instance has the following methods:

get_connection(shard_id)

Returns a psycopg2 connection for the given shard_id.

Beware that this connection must be sent back to the pool when you are finished, or otherwise, you know, the universe will collapse and Trinity will die :(

put_connection(connection)

Sends connection back to the pool where it belongs.

cursor(shard_id)

Returns a context manager that yields a cursor on a connection for the given shard_id.

This is a convenience method that saves you from forgetting to call put_connection.

Example:

with db.cursor(5) as cur:
  cur.execute('SELECT version()')
  ver = cur.fetchone()
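For intuition, a context manager of this kind could be built on top of get_connection and put_connection roughly as follows. This is a sketch, not hv's implementation: shard_cursor is a hypothetical name, and whether the real method commits or rolls back is not covered by this document, so the sketch does neither.

```python
from contextlib import contextmanager


@contextmanager
def shard_cursor(db, shard_id):
    """Yield a cursor for shard_id and always return its connection to the pool."""
    conn = db.get_connection(shard_id)
    try:
        cur = conn.cursor()
        try:
            yield cur
        finally:
            # Close the cursor even if the body of the with block raised.
            cur.close()
    finally:
        # The connection goes back to the pool on success and on failure alike.
        db.put_connection(conn)
```

The nested try/finally blocks are the point of the design: an exception inside the with block still closes the cursor and returns the connection.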

put(shard_id, kind, **kwargs)

Writes data to the specified shard. kind is an integer used to differentiate between entity types; it is not stored within the hstore field.

Returns a hv.entity.Key.

Example:

data = dict(beep='boop')
key = db.put(12, 1, **data)
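Since the keyword arguments end up as hstore key/value pairs, it may help to see what such a dict looks like in PostgreSQL's hstore text format. The function below is a hypothetical illustration; how hv itself serialises the data is not shown in this document, and psycopg2 can also adapt dicts automatically through psycopg2.extras.register_hstore.

```python
def to_hstore_literal(data):
    """Render a flat dict as PostgreSQL hstore text, e.g. "beep"=>"boop"."""

    def quote(value):
        # hstore stores text only; backslashes and double quotes must be escaped.
        text = str(value).replace('\\', '\\\\').replace('"', '\\"')
        return '"%s"' % text

    pairs = []
    for key, value in data.items():
        # A Python None maps to an unquoted SQL-style NULL value.
        rendered = 'NULL' if value is None else quote(value)
        pairs.append('%s=>%s' % (quote(key), rendered))
    return ', '.join(pairs)
```

Note that hstore values are always text, so numbers and other types come back as strings unless they are converted on read.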

get(key)

Fetches the data with the given key.

key must be of type hv.entity.Key.

Example:

from hv.entity import Key

key = Key(307821103844175873)
res = db.get(key)

disconnect()

Closes every connection in every pool.

reinstantiate()

Re-creates the connection pools. Make sure you have closed every connection before calling this method.

license

MIT