scrapy-rethinkdb

Scrapy pipeline for rethinkdb.


Keywords
scrapy, pipeline, rethinkdb
License
Other
Install
pip install scrapy-rethinkdb==0.0.4

Documentation

scrapy-rethinkdb


Build Status

Pipeline to insert documents to a RethinkDB table.

The minimal configuration requires defining the pipeline in the item pipelines and a table name.

ITEM_PIPELINES = [
  'scrapy_rethinkdb.RethinkDBPipeline',
]

RETHINKDB_TABLE = 'items'

The connection will be created with the defaults of RethinkDB's python driver (see http://www.rethinkdb.com/api/python/connect/) unless is overwritten as in the following example to use the database called scrapydb:

RETHINKDB_CONNECTION = {
    'db': 'scrapydb'
}

The insertion will be done with the default options of RethinkDB's python driver (see http://www.rethinkdb.com/api/python/insert/) unless is overwritten as in the following example to insert or update the item:

RETHINKDB_INSERT_OPTIONS = {
    'upsert': False
}

The document inserted by default is the dictionary from the item. This can be changed by extending the pipeline and overriding the method get_document.

If any additional behaviour is needed before/after the insertion, such as changing the item, dropping it, etc, this can be done by extending the pipeline and overriding the method(s) before_insert/after_insert.