Crawler: A simple asynchronous coroutine crawler framework.
Crawler is a framework for crawling paginated sites; it uses asynchronous coroutines to fetch pages quickly and efficiently.
git clone https://github.com/Czw96/Crawler.git
main.py
Contains the configuration and the program entry point.
'entrance_urls' # Entry URLs,
'init_clean' # Function for initial processing of the response,
'depth_clean' # Function for depth (detail-page) processing of the response,
'header' # Custom request header (can be None),
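The settings above might look like the following sketch; the key names follow this list, while the URLs and header values are placeholders, not the project's real defaults:

```python
# Hypothetical main.py configuration sketch; names follow the README,
# values are placeholders.
entrance_urls = [f'https://example.com/list?page={i}' for i in range(1, 4)]
header = {'User-Agent': 'Mozilla/5.0'}  # custom header, or None for the default
# init_clean / depth_clean would be the handler functions imported from
# init_clean.py and depth_clean.py.
```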
init_clean.py
Initial processing of the response. The function must return two lists: the first is the list of detail-page URLs, and the second is the list of download coroutine functions (which can be None).
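A handler following that contract might look like this sketch; the regex-based link extraction is an illustrative assumption, not the project's actual code:

```python
import re

def init_clean(html):
    # Pull detail-page links out of the listing page's HTML.
    detail_urls = re.findall(r'href="(/detail/\d+)"', html)
    # Nothing to download at this stage, so the second value is None.
    download_tasks = None
    return detail_urls, download_tasks
```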
depth_clean.py
Depth processing of the response. The function may optionally return a list of download coroutine functions (which can be None).
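A sketch of such a handler follows. How the handler gets hold of the crawler object is an assumption here (passed in as a parameter), as is the regex extraction; only the return contract comes from the description above:

```python
import re

def depth_clean(html, crawler):
    # Collect image URLs from the detail page and wrap each one as a
    # download task via crawler.download(); returning nothing at all
    # is also allowed when there is nothing to fetch.
    img_urls = re.findall(r'src="(https?://[^"]+\.jpg)"', html)
    if not img_urls:
        return None
    return [crawler.download(url=u, filename=u.rsplit('/', 1)[-1])
            for u in img_urls]
```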
crawler parameter
If you want to download images or videos, wrap the URL and filename with the crawler.download()
function and return the resulting tasks.
crawler.download(url='', filename='')
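The call can be read as producing a download coroutine function that the framework schedules later. This stand-in sketch (not the real implementation) illustrates the assumed shape:

```python
import asyncio

class _StubCrawler:
    # Stand-in illustrating the assumed contract: download() wraps the
    # url/filename pair into a coroutine function that is run later.
    def download(self, url='', filename=''):
        async def _task():
            # The real task would fetch `url` and save it as `filename`.
            return (url, filename)
        return _task

crawler = _StubCrawler()
task = crawler.download(url='https://example.com/a.jpg', filename='a.jpg')
result = asyncio.run(task())
```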
resp parameter
The response object returned for a request.
resp.url # URL of the request.
resp.status # HTTP status code.
resp.text() # HTML text of the response.
resp.json() # Response body parsed as JSON.
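The interface listed above can be mirrored by a minimal stand-in like the one below; the real resp object is supplied by the framework, so this is only a sketch of the assumed contract:

```python
import json

class _StubResponse:
    # Minimal stand-in mirroring the listed interface; the real resp
    # object is provided by the crawler framework.
    def __init__(self, url, status, body):
        self.url = url
        self.status = status
        self._body = body

    def text(self):
        return self._body

    def json(self):
        return json.loads(self._body)

resp = _StubResponse('https://example.com/api', 200, '{"ok": true}')
```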