Python Library to Build Web Robots


Keywords
Robot, Web, Crawler
License
Apache-2.0
Install
pip install ciag-robot==0.4.dev1697464011

Documentation

Python Web Robot Builder


The main idea of py-robot is to simplify the code of web crawlers and to improve their performance.

Install

pip install ciag-robot

Intro

Below is a simple example of a crawler that needs to fetch a page and, for each item it finds there, fetch another page. Because it is written without async requests, it starts each request only after the previous one has finished.

# examples/iot_eetimes.py

import json

import requests
from lxml import html
from pyquery.pyquery import PyQuery as pq

# Fetch and parse the index page.
page = requests.get('https://iot.eetimes.com/')
dom = pq(html.fromstring(page.content.decode()))

result = []
for link in dom.find('.theiaStickySidebar ul li'):
    news = {
        'category': pq(link).find('span').text(),
        'url': pq(link).find('a[href]').attr('href'),
    }
    # Each article is fetched only after the previous request has finished.
    news_page = requests.get(news['url'])
    news_dom = pq(news_page.content.decode())
    news['body'] = news_dom.find('p').text()
    news['title'] = news_dom.find('h1.post-title').text()
    result.append(news)

print(json.dumps(result, indent=4))

We can rewrite that using py-robot, and it will look like this:

# examples/iot_eetimes2.py

import json
from robot import Robot
from robot.collector.shortcut import *
import logging

logging.basicConfig(level=logging.DEBUG)

collector = pipe(
    const('https://iot.eetimes.com/'),    # start from the index page URL
    get(),                                # fetch the index page
    css('.theiaStickySidebar ul li'),     # select every sidebar item
    foreach(dict(                         # for each item, build a news dict
        pipe(
            css('a[href]'), attr('href'), any(),
            get(),                        # follow the item's link to the article page
            dict(
                body=pipe(css('p'), as_text()),
                title=pipe(css('h1.post-title'), as_text()),
            )
        ),
        category=pipe(css('span'), as_text()),
        url=pipe(css('a[href]'), attr('href'), any(), url())
    ))
)

with Robot() as robot:
    result = robot.sync_run(collector)
print(json.dumps(result, indent=4))

Now all the requests are made asynchronously: the requests for the individual items are started at the same time instead of one after another, which improves the performance of the crawler.
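
For illustration only, below is a rough sketch of that same concurrent pattern written by hand with asyncio and aiohttp (py-robot's actual internals may differ). The point is that the article requests are started together and awaited as a group rather than one by one.

# Hand-rolled concurrency sketch (assumes aiohttp and pyquery are installed);
# this is not py-robot code, just the pattern it automates.
import asyncio
import json

import aiohttp
from pyquery.pyquery import PyQuery as pq


async def fetch(session, url):
    # Fetch a page and return its decoded body.
    async with session.get(url) as response:
        return await response.text()


async def fetch_news(session, link):
    # Extract category and URL from the sidebar item, then fetch the article.
    item = pq(link)
    news = {
        'category': item.find('span').text(),
        'url': item.find('a[href]').attr('href'),
    }
    article = pq(await fetch(session, news['url']))
    news['body'] = article.find('p').text()
    news['title'] = article.find('h1.post-title').text()
    return news


async def main():
    async with aiohttp.ClientSession() as session:
        index = pq(await fetch(session, 'https://iot.eetimes.com/'))
        links = index.find('.theiaStickySidebar ul li')
        # All article requests are started at once and awaited together.
        return await asyncio.gather(*(fetch_news(session, link) for link in links))


result = asyncio.run(main())
print(json.dumps(result, indent=4))

Compared with this hand-written version, the py-robot collector above expresses the same crawl declaratively and leaves the scheduling of the concurrent requests to the library.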