shopify-scrape

Python module to scrape Shopify store URLs


License: MIT

Installation

pip install shopify_scrape

Usage

Extracts JSON data for a given URL. To see the available options:

python -m shopify_scrape.extract url -h

usage: extract.py url [-h] [-d DEST_PATH] [-p PAGE_RANGE [PAGE_RANGE ...]]
                      [-c] [-f FILE_PATH]
                      url

positional arguments:
  url                   URL to extract. An attempt will be made to fix
                        improperly formatted URLs.

optional arguments:
  -h, --help            show this help message and exit
  -d DEST_PATH, --dest_path DEST_PATH
                        Destination folder for extracted files. If the path
                        includes subdirectories, they will be created if they
                        do not exist. Defaults to the current directory './'
  -p PAGE_RANGE [PAGE_RANGE ...], --page_range PAGE_RANGE [PAGE_RANGE ...]
                        Inclusive page range to extract, given as two
                        integers. There are 30 items per page. If not
                        provided, all pages with products will be taken.
  -c, --collections     If true, extracts '/collections.json' instead of
                        '/products.json'
  -f FILE_PATH, --file_path FILE_PATH
                        File path to write. Defaults to
                        '[dest_path]/[url].products' or
                        '[dest_path]/[url].collections'
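
For example, to extract pages 1 through 3 of a store's products into a data/ folder (the store URL and destination folder here are illustrative values, not defaults):

python -m shopify_scrape.extract url mystore.myshopify.com -d data/ -p 1 3

Passing -c instead extracts the store's collections ('/collections.json') rather than its products.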

Extracts JSON data for URLs listed in a specified column of a CSV file. To see the available options:

python -m shopify_scrape.extract batch -h

usage: extract.py batch [-h] [-d DEST_PATH] [-p PAGE_RANGE [PAGE_RANGE ...]]
                        [-c] [-r ROW_RANGE [ROW_RANGE ...]] [-l [LOG]]
                        urls_file_path url_column

positional arguments:
  urls_file_path        File path of csv file containing URLs to extract.
  url_column            Name of unique column with URLs.

optional arguments:
  -h, --help            show this help message and exit
  -d DEST_PATH, --dest_path DEST_PATH
                        Destination folder for extracted files. If the path
                        includes subdirectories, they will be created if they
                        do not exist. Defaults to the current directory './'
  -p PAGE_RANGE [PAGE_RANGE ...], --page_range PAGE_RANGE [PAGE_RANGE ...]
                        Inclusive page range to extract, given as two
                        integers. There are 30 items per page. If not
                        provided, all pages with products will be taken.
  -c, --collections     If true, extracts '/collections.json' instead of
                        '/products.json'
  -r ROW_RANGE [ROW_RANGE ...], --row_range ROW_RANGE [ROW_RANGE ...]
                        Inclusive row range, specified as two positive
                        integers. The second must be greater than or equal to
                        the first.
  -l [LOG], --log [LOG]
                        File path of log file. If none, the log file is named
                        logs/[unix_time_in_seconds]_log.csv. 'logs' folder
                        created if it does not exist.
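
For example, assuming a stores.csv file with store URLs in a column named url (both the file name and column name are illustrative), the following extracts rows 1 through 50 and writes the results to a data/ folder:

python -m shopify_scrape.extract batch stores.csv url -d data/ -r 1 50

Per the -l option above, the run log defaults to logs/[unix_time_in_seconds]_log.csv; pass -l with a file path to choose a different location.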