shub-cli

A CLI at your hands to deal with the features of ScrapingHub.


Keywords
cli, crawler, scrapinghub, scrapinghub-api, scrapy
License
MIT
Install
pip install shub-cli==2.0.1

Documentation

Scrapinghub CLI

A Command Line Interface at your hands to deal with the features of ScrapingHub.

Install

Install it with pip:

$ pip install shub-cli

Configuration

Shub CLI looks for the .scrapinghub.yml file created by Scrapinghub in your home directory and reads the default API_KEY and PROJECT_ID from it. If you do not have that file, create it following the example below:

~/.scrapinghub.yml

apikeys:
  default: <API_KEY>
projects:
  default: <PROJECT_ID>
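
As a rough illustration of the lookup described above, the sketch below pulls the two default values out of text in that layout. It is not shub-cli's own code: read_defaults is a hypothetical helper, and the hand-rolled parsing only understands the two-level layout shown here (a real YAML library is the better choice for anything more general).

```python
def read_defaults(text):
    """Return (api_key, project_id) from the two-level layout shown above.

    Hypothetical helper, not part of shub-cli. It only understands
    "section:" lines followed by indented "key: value" lines.
    """
    values = {}
    section = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing spaces
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.strip().partition(":")
        if not raw.startswith((" ", "\t")):
            section = key  # top-level key such as "apikeys" or "projects"
        elif section is not None:
            values[(section, key)] = value.strip().strip("'\"")
    return values.get(("apikeys", "default")), values.get(("projects", "default"))
```

To try it against a real file, pass in the file's contents, for example (Path.home() / ".scrapinghub.yml").read_text() with Path imported from pathlib. Values come back as strings, and a missing entry comes back as None.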

Start

If you have set up the ~/.scrapinghub.yml file:

$ shub-cli repl

Otherwise, pass the credentials explicitly:

$ shub-cli -api <API_KEY> -project <PROJECT_ID> repl

If you just want to run a single command:

$ shub-cli [credentials|spiders|job|jobs|schedule]

Cheatsheet

> credentials
> spiders
> job [-show|-cancel|-delete id]
> jobs [-spider spider] [-tag tag] [-lacks tag] [-state state] [-count count]
> schedule [-spider spider] [-tags tag1,tag2] [-priority 1|2|3|4]

Commands

Credentials

Check which credentials are being used to connect to Scrapinghub.

> credentials

Spiders

List all spiders available.

> spiders

Jobs

List the last 10 jobs, or the jobs that match your criteria.

> jobs
> jobs -spider <spider> -tag <tag> -lacks <tag> -state <[pending,finished,running,deleted]> -count <[0,1000]>

Example:

> jobs
> jobs -spider example -tag production -lacks consumed -state finished -count 100

Attention: By default, shub-cli shows the last 10 jobs. To override that behaviour, use the -count parameter with the number of jobs you want to list.
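
If you generate these filter lines from a script, a small helper can catch bad values before they reach the prompt. The sketch below is only an illustration: jobs_command is a hypothetical function, not part of shub-cli, and it assumes that the state list and the [0,1000] range shown above are the only accepted values, with the range inclusive.

```python
VALID_STATES = ("pending", "finished", "running", "deleted")


def jobs_command(spider=None, tag=None, lacks=None, state=None, count=None):
    """Compose a `jobs` line from optional filters (hypothetical helper)."""
    if state is not None and state not in VALID_STATES:
        raise ValueError(f"state must be one of {VALID_STATES}")
    # Assumption: -count accepts 0 through 1000 inclusive.
    if count is not None and not 0 <= count <= 1000:
        raise ValueError("count must be between 0 and 1000")
    parts = ["jobs"]
    filters = (("-spider", spider), ("-tag", tag), ("-lacks", lacks),
               ("-state", state), ("-count", count))
    for flag, value in filters:
        if value is not None:  # only emit the flags that were actually set
            parts += [flag, str(value)]
    return " ".join(parts)
```

Calling jobs_command() with no arguments yields the bare jobs line, which falls back to the default of 10 jobs.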

Job

Show, delete or cancel a job by its id.

> job -show <id>
> job -show <id> --with-log
> job -delete <id>
> job -cancel <id>

Example:

> job -show 11/23/19801
> job -show 11/23/19801 --with-log
> job -delete 11/23/19801
> job -cancel 11/23/19801
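
The ids in these examples have three numeric parts separated by slashes. Assuming they follow Scrapinghub's usual project/spider/job layout, a script can split and validate one before passing it on. JobKey and parse_job_key below are hypothetical names used for illustration, not part of shub-cli.

```python
from typing import NamedTuple


class JobKey(NamedTuple):
    project_id: int
    spider_id: int
    job_id: int


def parse_job_key(text):
    """Split an id such as "11/23/19801" into its three numeric parts.

    Assumes the <project>/<spider>/<job> layout; raises ValueError otherwise.
    """
    parts = text.strip().split("/")
    if len(parts) != 3 or not all(p.isdecimal() for p in parts):
        raise ValueError(f"expected <project>/<spider>/<job>, got {text!r}")
    return JobKey(*(int(p) for p in parts))
```

With the id from the examples, parse_job_key("11/23/19801") gives project 11, spider 23 and job 19801.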

Schedule

Schedule a spider execution.

> schedule -spider <spider> -priority <[1,2,3,4]> -tags <tag1,tag2>

Example:

> schedule -spider my-spider
> schedule -spider my-spider -priority 4 -tags production,periodic
> schedule -spider my-spider -priority 3 -tags test
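
The same lines can be composed programmatically. The sketch below is a hypothetical helper, not shub-cli's implementation. It assumes that a spider name is always required (every example above passes one) and that priority must be one of 1, 2, 3 or 4.

```python
def schedule_command(spider, priority=None, tags=()):
    """Compose a `schedule` line (hypothetical helper)."""
    if not spider:
        # Assumption: scheduling without a spider name is not meaningful.
        raise ValueError("a spider name is required")
    if priority is not None and priority not in (1, 2, 3, 4):
        raise ValueError("priority must be 1, 2, 3 or 4")
    parts = ["schedule", "-spider", spider]
    if priority is not None:
        parts += ["-priority", str(priority)]
    if tags:
        parts += ["-tags", ",".join(tags)]  # tags are comma-separated, no spaces
    return " ".join(parts)
```

For instance, schedule_command("my-spider", priority=4, tags=["production", "periodic"]) reproduces the second example line above.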

Help:

For help or suggestions, please open an issue on the GitHub Issues page.