mindfactory-crawling

A crawler for mindfactory.de


Keywords
crawler, crawling, data, webcrawler, webcrawling
License
MIT
Install
pip install mindfactory-crawling==1.0.4

Documentation

Mindfactory.de Crawler

This repository contains a crawler for Mindfactory, a German ecommerce shop (for computer hardware). The crawler extracts the data contained on every single product page and stores the scraped products and reviews in a SQLite database consisting of two tables.

Each product has the following properties:

  • ID (SQLite identifier)
  • URL
  • Product name
  • Brand name
  • Category (i.e. CPU)
  • EAN
  • SKU
  • Items sold (Count)
  • People watching (Count)
  • RMA quote (in percent)
  • Average rating (from 1.0 to 5.0)
  • Shipping (information on availability)
  • Price (in Euro)

Additionally, for every product all reviews are collected and stored in a separate SQLite table. An entry in this table has the following properties:

  • Product ID (Reference to the corresponding ID in the product table)
  • Stars (Rating, from 1 to 5)
  • Text
  • Author
  • Date (YYYY-MM-DD)
  • Verified (actually bought the product at Mindfactory)

Prerequisites

  • Python3
  • scrapy
  • SQLite3

Run the scraper

scrapy crawl mindfactory_products