Crawler: A simple asynchronous coroutine crawler framework.
Crawler is a framework for crawling paginated sites; it uses asynchronous coroutines to fetch pages quickly and efficiently.
git clone https://github.com/Czw96/Crawler.git
main.py
Contains the configuration and the program entry point.
'entrance_urls' # Entry URLs,
'init_clean' # Function for initial processing of the response,
'depth_clean' # Function for depth (detail-page) processing of the response,
'header' # Custom request header (can be None),
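The settings above might look like the following sketch; the key names follow this list, while the URLs and header values are placeholders, not the project's real defaults:

```python
# Hypothetical main.py configuration sketch; names follow the README,
# values are placeholders.
entrance_urls = [f'https://example.com/list?page={i}' for i in range(1, 4)]
header = {'User-Agent': 'Mozilla/5.0'}  # custom header, or None for the default
# init_clean / depth_clean would be the handler functions imported from
# init_clean.py and depth_clean.py.
```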
init_clean.py
Initial processing of the response. The function must return two lists: the first is the list of detail-page URLs, and the second is the list of download coroutine functions (which can be None).
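A handler following that contract might look like this sketch; the regex-based link extraction is an illustrative assumption, not the project's actual code:

```python
import re

def init_clean(html):
    # Pull detail-page links out of the listing page's HTML.
    detail_urls = re.findall(r'href="(/detail/\d+)"', html)
    # Nothing to download at this stage, so the second value is None.
    download_tasks = None
    return detail_urls, download_tasks
```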
depth_clean.py
Depth processing of the response. The function may optionally return a list of download coroutine functions (which can be None).
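A sketch of such a handler follows. How the handler gets hold of the crawler object is an assumption here (passed in as a parameter), as is the regex extraction; only the return contract comes from the description above:

```python
import re

def depth_clean(html, crawler):
    # Collect image URLs from the detail page and wrap each one as a
    # download task via crawler.download(); returning nothing at all
    # is also allowed when there is nothing to fetch.
    img_urls = re.findall(r'src="(https?://[^"]+\.jpg)"', html)
    if not img_urls:
        return None
    return [crawler.download(url=u, filename=u.rsplit('/', 1)[-1])
            for u in img_urls]
```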
crawler parameter
If you want to download images or videos, wrap the URL and filename with the crawler.download()
function and return the resulting tasks.
crawler.download(url='', filename='')
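The call can be read as producing a download coroutine function that the framework schedules later. This stand-in sketch (not the real implementation) illustrates the assumed shape:

```python
import asyncio

class _StubCrawler:
    # Stand-in illustrating the assumed contract: download() wraps the
    # url/filename pair into a coroutine function that is run later.
    def download(self, url='', filename=''):
        async def _task():
            # The real task would fetch `url` and save it as `filename`.
            return (url, filename)
        return _task

crawler = _StubCrawler()
task = crawler.download(url='https://example.com/a.jpg', filename='a.jpg')
result = asyncio.run(task())
```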
resp parameter
The response object returned for a request.
resp.url # URL of the request.
resp.status # HTTP status code.
resp.text() # HTML text of the response.
resp.json() # Response body parsed as JSON.
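The interface listed above can be mirrored by a minimal stand-in like the one below; the real resp object is supplied by the framework, so this is only a sketch of the assumed contract:

```python
import json

class _StubResponse:
    # Minimal stand-in mirroring the listed interface; the real resp
    # object is provided by the crawler framework.
    def __init__(self, url, status, body):
        self.url = url
        self.status = status
        self._body = body

    def text(self):
        return self._body

    def json(self):
        return json.loads(self._body)

resp = _StubResponse('https://example.com/api', 200, '{"ok": true}')
```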