xpaw

Key Features

A web scraping framework used to crawl web pages
Data extraction tools used to extract structured data from web pages

Spider Example

Below is an example spider class that crawls the top headlines from Baidu News:

from xpaw import Spider, HttpRequest, Selector, run_spider


class BaiduNewsSpider(Spider):
    def start_requests(self):
        yield HttpRequest("http://news.baidu.com/", callback=self.parse)

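    def parse(self, response):
        # Parse the response body and collect the text of each hot-news link.
        selector = Selector(response.text)
        hot = selector.css("div.hotnews a").text
        self.log("Hot News:")
        # Log the headlines with 1-based numbering.
        for i, title in enumerate(hot, start=1):
            self.log("%s: %s", i, title)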


if __name__ == '__main__':
    run_spider(BaiduNewsSpider)

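The spider class defines the following methods:

start_requests: returns the spider's initial requests.
parse: handles the page returned for a request; here it uses Selector with CSS Selector syntax to extract the data we need.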
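
To experiment with the extraction step without network access, the selection done by "div.hotnews a" can be approximated with the Python standard library alone. The following is a minimal sketch, not xpaw's implementation: HotNewsLinkParser, extract_hot_news and SAMPLE_HTML are hypothetical names introduced for illustration, and the sample markup is an assumption about the page structure rather than Baidu's actual HTML.

```python
from html.parser import HTMLParser


class HotNewsLinkParser(HTMLParser):
    """Collect the text of every <a> nested inside a <div class="hotnews">."""

    def __init__(self):
        super().__init__()
        self.div_stack = []   # one bool per open <div>: does it have class "hotnews"?
        self.in_link = False  # True while inside a matching <a>
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            self.div_stack.append("hotnews" in classes)
        elif tag == "a" and any(self.div_stack):
            # The link is a descendant of at least one "hotnews" div.
            self.in_link = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            # Text inside nested inline tags is appended to the same title.
            self.titles[-1] += data


def extract_hot_news(html):
    """Return the stripped link texts, roughly what the spider's parse method logs."""
    parser = HotNewsLinkParser()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]


# Hypothetical markup used only to exercise the parser.
SAMPLE_HTML = """
<div class="hotnews">
  <ul>
    <li><a href="/a">First headline</a></li>
    <li><a href="/b">Second <b>headline</b></a></li>
  </ul>
</div>
<div class="other"><a href="/c">Not a hot item</a></div>
"""

if __name__ == "__main__":
    for i, title in enumerate(extract_hot_news(SAMPLE_HTML), start=1):
        print("%s: %s" % (i, title))
```

In the spider itself, Selector handles this work, and it supports far more of the CSS Selector syntax than this hand-written descendant match.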

Documentation

http://xpaw.readthedocs.io/

xpaw
Release 0.10.0
