Pgpipeline: automatic postgres pipeline for Scrapy

A Scrapy pipeline module to persist items to a postgres table automatically.

Quick Start

Here's an example that shows the automatic item pipeline with a custom JSONB field.

# settings.py
from sqlalchemy.dialects.postgresql import JSONB

ITEM_PIPELINES = {
    'pgpipeline.PgPipeline': 300,
}

PG_PIPELINE = {
    'connection': 'postgresql://localhost:5432/scrapy_db',
    'table_name': 'demo_items',
    'pkey': 'item_id',
    'ignore_identical': ['item_id', 'job_id'],
    'types': {
        'some_data': JSONB
    },
    'onconflict': 'upsert'
}

All columns, tables, and indices are automatically created.
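
To make the settings above concrete, here is a minimal sketch of an item a spider might yield for this configuration. Scrapy accepts plain dicts as items. The `build_item` helper and the sample values are hypothetical and not part of pgpipeline; only the field names `item_id`, `job_id`, and `some_data` come from the settings.

```python
# Hypothetical helper, not part of pgpipeline: builds a dict item whose
# keys line up with the PG_PIPELINE settings shown above.
def build_item(item_id, job_id, payload):
    return {
        'item_id': item_id,    # the field named by 'pkey'
        'job_id': job_id,      # listed in 'ignore_identical' with item_id
        'some_data': payload,  # declared as JSONB under 'types'
    }


# Sample values are made up for illustration.
item = build_item(42, 'job-1', {'tags': ['a', 'b'], 'score': 0.5})
```

A spider callback would yield such a dict, and the remaining column types would be left to the pipeline to guess.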

pkey: the item field to use as a primary key, in addition to the database-generated id.
ignore_identical: the set of fields used to identify duplicates. An item that matches an existing row on these fields is not inserted as a new row.
types: fields listed here use the given column type; the types of all other fields are guessed.
onconflict: one of upsert, ignore, or non-null. ignore skips the insert on conflict. upsert updates the existing row. non-null updates the existing row only with values that are not None, so existing values are not overwritten with nulls.
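
The three onconflict modes can be illustrated with a small in-memory model. This is a sketch of one reading of the descriptions above, not pgpipeline's actual code: `apply_item` is a hypothetical function, the "table" is a plain dict, and a conflict is assumed to mean a match on the `ignore_identical` fields.

```python
def apply_item(table, item, key_fields, onconflict='ignore'):
    """Apply one item to an in-memory table (a dict keyed by tuples).

    Illustrative only: models the documented behaviour with plain dicts.
    """
    key = tuple(item.get(field) for field in key_fields)
    existing = table.get(key)
    if existing is None:
        # No row matches on key_fields: plain insert.
        table[key] = dict(item)
    elif onconflict == 'ignore':
        # Conflict: leave the stored row untouched.
        pass
    elif onconflict == 'upsert':
        # Conflict: overwrite with every incoming value, including None.
        existing.update(item)
    elif onconflict == 'non-null':
        # Conflict: overwrite only with values that are not None.
        existing.update({k: v for k, v in item.items() if v is not None})
    else:
        raise ValueError('unknown onconflict mode: %r' % (onconflict,))
    return table[key]


keys = ['item_id', 'job_id']
rows = {}
apply_item(rows, {'item_id': 1, 'job_id': 'a', 'title': 'first'}, keys)
after_ignore = dict(apply_item(
    rows, {'item_id': 1, 'job_id': 'a', 'title': 'second'}, keys, 'ignore'))
after_non_null = dict(apply_item(
    rows, {'item_id': 1, 'job_id': 'a', 'title': None}, keys, 'non-null'))
after_upsert = dict(apply_item(
    rows, {'item_id': 1, 'job_id': 'a', 'title': None}, keys, 'upsert'))
```

In this model, ignore and non-null both keep the stored title, while upsert replaces it with None. That difference is the reason to choose non-null when later crawls may return partial items.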

Developers

Set up a development environment

$ pip install -r requirements.txt

Development

Dependencies: list development dependencies in requirements.txt

Release

Dependencies: list them in setup.py under install_requires:

install_requires=['peppercorn'],

Then:

$ make dist && make release

Contributing

Fork, implement, add tests, pull request, get my everlasting thanks and a respectable place here :).

Thanks:

To all Contributors - you make this happen, thanks!

pgpipeline
Release 0.2.0
