QuickSpider

This is a small spider framework based on nodes and the graphs built from those nodes.

The goal of this small framework is to let you build a spider quickly.

Installation

Using pip

pip install quickspider

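How to use it?

Define your own nodes. (Or use nodes written by others, or the built-in nodes.)
Build a crawl graph from the nodes. (Configure a toml file.)
Run it.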

TODO

Example for the current version

Create a graph from the default template.

quickspidercommand create --template default

Run the default graph directly.

quickspidercommand run --file default.toml

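Notes

Who needs it?

quickspider is a small, toy-like framework. If you need large-scale, concurrent, stable crawling, use scrapy. If you need to deal with tricky small sites, write a custom spider with libraries such as requests, httpx, and beautifulsoup.

So what is the point of quickspider? It aims to help you with two things:

quickly developing an ordinary small spider whose data volume is not that large;
validating your ideas while you develop a larger or more flexible spider.

So please do not treat it as a professional crawling tool.

How is this thing implemented?

The idea behind quickspider is very simple. quickspider contains all kinds of nodes. They may be written by you (see the authoring manual), written by someone else (see that author's documentation), or built in (see the built-in node notes). You declare and configure these nodes in a toml document and assemble them into a tree, and that tree then guides quickspider through the crawl.

Crawling is essentially the transformation of data, and nodes are what actually carry out the transformations. A tree made of nodes therefore forms a data transformation flow.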
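The "data transformation flow" idea can be sketched in a few lines of Python. This is only an illustration of the concept in its simplest form, a linear chain rather than a full tree. The names `Node`, `run_chain`, `make_urls`, and `fake_fetch` are hypothetical and are not part of quickspider's API.

```python
from typing import Any, Callable, Iterable, List

# Hypothetical stand-in for a node: any callable that maps a list of
# items to a new list of items. This is not quickspider's real base class.
Node = Callable[[List[Any]], List[Any]]


def run_chain(nodes: Iterable[Node], seed: List[Any]) -> List[Any]:
    """Feed the output of each node into the next one, in order."""
    data = seed
    for node in nodes:
        data = node(data)
    return data


def make_urls(items: List[Any]) -> List[Any]:
    # Constructor-like step: turn page numbers into URLs.
    return [f"http://quotes.toscrape.com/page/{page}/" for page in items]


def fake_fetch(items: List[Any]) -> List[Any]:
    # Web-like step, faked here so the sketch runs offline.
    return [{"url": url, "status": 200} for url in items]


result = run_chain([make_urls, fake_fetch], [1, 2])
print(result[0]["url"])
```

Each step only sees the output of the previous one, which is what makes it possible to describe the whole crawl declaratively in a configuration file.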

What is this toml actually saying?

[nodes] 

[nodes.url]
type = "PageNode"
input = "http://quotes.toscrape.com/page/{page}/"
start = 1
stop = 6

[nodes.Geter]
type = "GetNode"

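[nodes] declares the collection of nodes.

[nodes.NodeName] declares a specific node. NodeName itself does not matter and exists only for the developer's convenience; the nodes. prefix cannot be omitted.

The key = value lines under [nodes.NodeName] specify that node's attributes.

type is required for every node and declares the node's kind; the remaining attributes vary with the node kind.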
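To make the structure concrete, here is a minimal sketch that parses the configuration above and checks the one rule stated here, that every node has a `type`. It uses Python's standard `tomllib` (Python 3.11+, with the `tomli` backport assumed on older versions). The helper `load_nodes` is hypothetical; how quickspider itself loads and validates the file is not shown in this document.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    # Older interpreters: assumes the tomli backport is installed.
    import tomli as tomllib

CONFIG = '''
[nodes]

[nodes.url]
type = "PageNode"
input = "http://quotes.toscrape.com/page/{page}/"
start = 1
stop = 6

[nodes.Geter]
type = "GetNode"
'''


def load_nodes(text):
    """Return {node_name: attributes}, rejecting nodes without a type."""
    nodes = tomllib.loads(text).get("nodes", {})
    for name, attrs in nodes.items():
        if "type" not in attrs:
            raise ValueError(f"node {name!r} has no 'type' attribute")
    return nodes


nodes = load_nodes(CONFIG)
for name, attrs in nodes.items():
    print(name, "->", attrs["type"])
```

Each `[nodes.NodeName]` table becomes one dictionary entry keyed by the node name, with its `key = value` lines as the attributes.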

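For example, [nodes.url]:

This node is named url and its kind is PageNode, a built-in node kind. It formats input using page as the keyword, over the half-open interval [start, stop). There is also a step parameter, which defaults to 1 and can therefore be omitted.

For example, [nodes.Geter]:

This node is named Geter and its kind is GetNode, also a built-in node kind. It converts a URL into a Response.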
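The PageNode behaviour described above can be approximated with `range` and `str.format`. This is a sketch of the described semantics only, not the node's actual source; `expand_pages` and its `key` parameter are hypothetical names.

```python
def expand_pages(template, start, stop, step=1, key="page"):
    """Format `template` once per value in the half-open range [start, stop)."""
    return [template.format(**{key: n}) for n in range(start, stop, step)]


urls = expand_pages("http://quotes.toscrape.com/page/{page}/", start=1, stop=6)
print(len(urls))  # 5 URLs: pages 1 through 5, since stop is excluded
print(urls[-1])   # http://quotes.toscrape.com/page/5/
```

With `start = 1` and `stop = 6`, page 6 is not requested, because the interval excludes `stop`.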

What built-in nodes are there?

Constructor

PageNode

Web

GetNode
```
[nodes.example]
type = "GetNode"
```

PostNode

[nodes.example]
type = "PostNode"
data = {}  # optional

Parser

ParserNode

[nodes.example]
type = "ParserNode"
mode = "dom"  # or "json"

ParserDomNode

[nodes.example]
type = "ParserDomNode"
mode = "css"
parser = ""

ExtractNode
```
[nodes.example]
type = "ExtractNode"
```
ConcatNode
```
[nodes.example]
type = "ConcatNode"
```

I/O

CsvReaderNode

[nodes.example]
type = "CsvReaderNode"
file = "file_path"
column = "column_to_read"
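As a rough illustration of what the `file` and `column` attributes suggest, the sketch below reads one named column from a CSV file with the standard `csv` module. It assumes that `column` refers to a header name, which this document does not confirm, and `read_column` is a hypothetical helper rather than the node's real implementation.

```python
import csv
import io


def read_column(handle, column):
    """Yield the value of `column` for every row of a CSV with a header line."""
    for row in csv.DictReader(handle):
        yield row[column]


# An in-memory file stands in for the path given by `file`.
sample = io.StringIO("url,title\nhttp://a.example/,A\nhttp://b.example/,B\n")
values = list(read_column(sample, "url"))
print(values)
```

Values read this way could then serve as the input of a later node, for example as the URLs to fetch.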

ExcelReaderNode

[nodes.example]
type = "ExcelReaderNode"
file = "file_path"
column = "column_to_read"

LineReaderNode

[nodes.example]
type = "LineReaderNode"
file = "file_path"

JsonWriterNode

[nodes.example]
type = "JsonWriterNode"
file = "file_path"

CsvWriterNode

[nodes.example]
type = "CsvWriterNode"
file = "file_path"

quickspider
Release 0.1.2.10
