A package for scraping webpages asynchronously using aiohttp and asyncio. It includes error handling to work around common issues such as sites blocking you after many requests in a short period.


Keywords
scraping, async, requests
License
MIT
Install
pip install async-scrape==0.1.18

Documentation

Async-scrape

Perform web scraping asynchronously

Async-scrape is a package that uses asyncio and aiohttp to scrape websites, with useful features built in.

Features

  • Breaks - pauses scraping when a website consistently blocks your requests
  • Rate limit - slows down scraping to prevent being blocked (a configuration sketch follows this list)
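
Both features are configured through the AsyncScrape constructor arguments shown in the usage example further down. Below is a minimal sketch of a conservative setup, assuming only the argument names from that example (attempt_limit, rest_between_attempts, rest_wait, acceptable_error_limit) and that rest_wait is a wait time in seconds:

from async_scrape import AsyncScrape

# Conservative settings for a site that blocks aggressively:
# limit retries, pause between attempt cycles, and tolerate
# at most 50 errored requests before giving up.
cautious_scrape = AsyncScrape(
    post_process_func=lambda html, resp, **kwargs: html,  # keep the raw html
    post_process_kwargs={},
    attempt_limit=3,
    rest_between_attempts=True,
    rest_wait=120,
    acceptable_error_limit=50
)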

Installation

Async-scrape requires C++ Build tools v15+ to run.

pip install async-scrape

How to use it

# Create an instance
from async_scrape import AsyncScrape

def post_process(html, resp, **kwargs):
    """Function to process the gathered response from the request"""
    if resp.status == 200:
        return "Request worked"
    else:
        return "Request failed"

async_Scrape = AsyncScrape(
    post_process_func=post_process,  # function applied to each response
    post_process_kwargs={},          # extra kwargs passed to that function
    fetch_error_handler=None,        # optional handler for fetch errors
    use_proxy=False,                 # route requests through a proxy
    proxy=None,                      # proxy address, if used
    pac_url=None,                    # proxy auto-config URL, if used
    acceptable_error_limit=100,      # number of errors tolerated
    attempt_limit=5,                 # maximum number of attempts
    rest_between_attempts=True,      # pause when requests keep failing
    rest_wait=60                     # how long to rest between attempts
)

urls = [
    "https://www.google.com",
    "https://www.bing.com",
]

resps = async_Scrape.scrape_all(urls)

The response object is a list of dicts in the format:

{
    "url": url,                # url of the request
    "func_resp": func_resp,    # response from the post-processing function
    "status": resp.status,     # http status
    "error": None              # any error encountered
}
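
Since each entry carries the url, status, and error keys shown above, the results can be filtered after the scrape. A minimal sketch, assuming resps is the list returned by scrape_all in the example above:

# Split results using the keys documented above
successes = [r for r in resps if r["error"] is None and r["status"] == 200]
failures = [r for r in resps if r["error"] is not None or r["status"] != 200]

for r in failures:
    print(f"{r['url']} failed with status {r['status']}: {r['error']}")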

License

MIT

Free Software, Hell Yeah!