scrapy-kafka

Kafka-based components for Scrapy


License

BSD-3-Clause

Install

pip install scrapy-kafka==0.1.1

Documentation

Kafka-based components for Scrapy. The package provides two components:

  • A custom Spider that reads the URLs to crawl from a Kafka topic. When the topic has no more messages to read, the Spider stays idle.
  • A custom ItemPipeline component that serializes each Item to JSON and publishes it to another Kafka topic.
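
The data flow of the two components can be sketched in plain Python. This is only an illustration under stated assumptions, not the package's actual code: the names InMemoryProducer, JsonKafkaPipeline, next_urls and the topic "scraped-items" are hypothetical, and an in-memory stand-in replaces a real Kafka producer and consumer so the sketch runs without a broker. The only Scrapy convention it relies on is that an item pipeline exposes process_item(item, spider).

```python
import json


class InMemoryProducer:
    """Hypothetical stand-in for a Kafka producer; records messages instead of sending them."""

    def __init__(self):
        self.sent = []

    def send(self, topic, value):
        self.sent.append((topic, value))


class JsonKafkaPipeline:
    """Hypothetical pipeline: serializes each item to JSON and publishes it to a topic."""

    def __init__(self, producer, topic):
        self.producer = producer
        self.topic = topic

    def process_item(self, item, spider):
        # Scrapy calls process_item(item, spider) once per scraped item.
        payload = json.dumps(dict(item)).encode("utf-8")
        self.producer.send(self.topic, payload)
        return item  # hand the item on to any later pipeline


def next_urls(messages, batch_size=10):
    """Read up to batch_size URL messages; an empty list means the spider stays idle."""
    urls = []
    for raw in messages:
        urls.append(raw.decode("utf-8").strip())
        if len(urls) >= batch_size:
            break
    return urls


# Pipeline side: one scraped item goes out as one JSON message.
producer = InMemoryProducer()
pipeline = JsonKafkaPipeline(producer, topic="scraped-items")
pipeline.process_item({"url": "http://example.com", "title": "Example"}, spider=None)

# Spider side: URLs arrive as messages; once they run out, nothing is scheduled.
incoming = iter([b"http://example.com/a\n", b"http://example.com/b"])
first_batch = next_urls(incoming)
second_batch = next_urls(incoming)  # no messages left, so the spider would idle
```

In a real deployment the stand-in would be replaced by an actual Kafka client, and the spider would ask for the next batch each time Scrapy reports it has run out of requests.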

See the example directory for usage.

Contributors

Contributors to scrapy-kafka, listed alphabetically: