onwebchange
- Default Console Web UI.
- RSS support.
- Released on PyPI.
- Tag filter to distinguish RSS sites, with support for multiple tags.
- .pyz packaging for fast deployment.
- One-click RSS subscription.
Install
> pip3 install onwebchange -U
> python3 -m onwebchange -f wc.config -i 300 --host=127.0.0.1 -p 8080 --username=admin --password=admin
Or use shiv to build a single file, "onwebchange.pyz", for fast deployment:
> pip3 install shiv -U
> shiv -o onwebchange.pyz -e onwebchange.__main__:main --compressed onwebchange
> python3.7 onwebchange.pyz --username=admin --password=admin
Requirements
torequests click bottle objectpath beautifulsoup4
Quick start
- install
python3 -m onwebchange
- add shell command to systemd / supervisor.
- Run with username & password.
python3 -m onwebchange -f wc.config -i 300 --host=127.0.0.1 -p 8080 --username=admin --password=admin
- Add Tasks
  - Press the [AddTask] button
  - Fill in the blanks:
    name: "pypi trending projects no1"
    request_args: "https://pypi.org/"
    parser_name: "css"
    operation: "#content > div:nth-child(4) > div > div:nth-child(1) > ul > li:nth-child(1) > a > h3 > span.package-snippet__name"
    value: "$text"
    check_interval: 300
    max_change: 10
  - Press the [Update Task] button
  - Subscribe to the RSS feed from a Chrome RSS reader extension
  - Default Web UI
Example
Run as the main package from the command line:
python3 -m onwebchange -f wc.config -i 300 -a
or
from onwebchange.core import WebHandler
from onwebchange.webui import app

if __name__ == "__main__":
    wh = WebHandler(
        app,
        file_path=None,
        loop_interval=300,
        auto_open_browser=True,
        change_callback=lambda task: print(task.name),
        app_kwargs={'port': 9988})
    # python3 -m onwebchange -f wc.config -i 300 -a
    wh.run()
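The `change_callback` in the example above only prints the task name. As a rough sketch, a slightly fuller callback might look like the following. Only `task.name` appears in the example; the `origin_url` attribute is assumed to mirror the task field of the same name, and `on_change` and the `SimpleNamespace` stand-in task are hypothetical, used so the snippet runs without onwebchange installed.

```python
from datetime import datetime
from types import SimpleNamespace


def on_change(task, log=None):
    """Format a one-line change notice for a task and record it.

    `task` is expected to expose `name` (as in the example above);
    `origin_url` is an assumption and is read defensively.
    """
    url = getattr(task, "origin_url", "") or "(no origin_url)"
    line = f"{datetime.now():%Y-%m-%d %H:%M:%S} changed: {task.name} -> {url}"
    if log is not None:
        log.append(line)
    print(line)
    return line


# Stand-in for a real task object, for demonstration only.
fake_task = SimpleNamespace(name="task name0", origin_url="https://pypi.org")
notices = []
on_change(fake_task, log=notices)
```

A function like this could then be passed as `change_callback=on_change` in place of the lambda.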
Parser examples
- regex
  - parser_name: re
  - operation: class="(.*?)"
  - value: $1
- css selector for attribute
  - parser_name: css
  - operation: #J_all_item_910789
  - value: @class
  - value can also be:
    - $string: list of outer HTML
    - $text: list of node.text
    - $get_text: list of node.get_text()
- json (ObjectPath), with the json-handle Chrome extension
  - parser_name: json
  - operation: $.headers["Accept-Encoding"]
  - value: $text
- python
  - parser_name: python
  - operation:
    def parse(resp): return resp.text[:10]
  - value as null
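To make the regex entry above concrete, here is a minimal sketch using Python's standard `re` module. It assumes `$1` selects capture group 1 of each match and `$0` the whole match. The helper `extract_groups` and the sample HTML are hypothetical; this illustrates the idea and is not onwebchange's actual implementation.

```python
import re


def extract_groups(text, operation, value):
    """Return the chosen regex group for every match of `operation`.

    `value` is a string such as "$0" or "$1"; the number after "$"
    is treated as the group index (an assumption of this sketch).
    """
    group_index = int(value.lstrip("$"))
    return [m.group(group_index) for m in re.finditer(operation, text)]


html = '<div class="lede"><span class="name">demo</span></div>'
print(extract_groups(html, r'class="(.*?)"', "$1"))  # ['lede', 'name']
print(extract_groups(html, r'class="(.*?)"', "$0"))  # ['class="lede"', 'class="name"']
```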
New Task template
{
    "name": "task name0",
    "request_args": "https://pypi.org", # could be url, curl string, request args dict.
    "parser_name": "css", # could be re/css/json/python
    "operation": ".lede-paragraph",
    "value": "$text",
    "check_interval": 300,
    "max_change": 2,
    "sorting_list": true,
    "origin_url": "https://pypi.org",
    "encoding": null
}
{
    "name": "task name1",
    "request_args": "https://pypi.org",
    "parser_name": "re",
    "operation": "class=\"(lede-paragraph)\"",
    "value": "$1",
    "check_interval": 300,
    "max_change": 2,
    "sorting_list": true,
    "origin_url": "",
    "encoding": null
}
{
    "name": "task name2",
    "request_args": "http://httpbin.org/get",
    "parser_name": "json",
    "operation": "$.url",
    "value": "",
    "check_interval": 300,
    "sorting_list": true,
    "origin_url": "",
    "encoding": null
}
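The templates above are JSON, so a task can also be prepared in Python and serialized. The sketch below uses only the field names shown in the templates. `make_task` is a hypothetical helper, its default values are taken from the templates rather than from the library, and how the resulting JSON is handed to onwebchange is not covered here.

```python
import json

# Parser names listed in the template comment, plus None (use resp.text).
KNOWN_PARSERS = (None, "re", "css", "json", "python")


def make_task(name, request_args, parser_name, operation, value,
              check_interval=300, max_change=2, sorting_list=True,
              origin_url="", encoding=None):
    """Build a task dict with the same keys as the templates above."""
    if parser_name not in KNOWN_PARSERS:
        raise ValueError(f"unknown parser_name: {parser_name!r}")
    return {
        "name": name,
        "request_args": request_args,
        "parser_name": parser_name,
        "operation": operation,
        "value": value,
        "check_interval": check_interval,
        "max_change": max_change,
        "sorting_list": sorting_list,
        "origin_url": origin_url,
        "encoding": encoding,
    }


task = make_task("task name1", "https://pypi.org", "re",
                 'class="(lede-paragraph)"', "$1")
# json.dumps renders Python True/None as JSON true/null, as in the templates.
task_json = json.dumps(task, indent=4)
print(task_json)
```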
More docs
Watchdog task.
:param name: Task name.
:type name: str
:param request_args: arg for sending a request, could be url/curl_string/dict.
:type request_args: dict / str
:param parser_name: one of re, css, json, python; defaults to None, which uses resp.text as the result.
:type parser_name: str, optional
:param operation: parse operation for the parser_name, defaults to None
:type operation: str, optional
:param value: value operation for the parser, defaults to None
:type value: str, optional
:param sorting_list: whether to sort the list of results from `css or other parsers`, defaults to True
:type sorting_list: bool, optional
:param check_interval: seconds between checks, defaults to 60
:type check_interval: int, optional
:param max_change: number of latest changes to keep in check_result_list, defaults to 2
:type max_change: int, optional
:param check_result_list: the latest `max_change` check results, usually shortened with md5, defaults to None
:type check_result_list: list, optional
:param origin_url: the URL to open to view the change, defaults to request_args['url']
:type origin_url: str, optional
request_args examples:
    url:
        http://pypi.org
    args:
        {'url': 'http://pypi.org', 'method': 'get'}
    curl:
        curl 'https://pypi.org/' -H 'authority: pypi.org' -H 'cache-control: max-age=0' -H 'upgrade-insecure-requests: 1' -H 'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36' -H 'sec-fetch-mode: navigate' -H 'sec-fetch-user: ?1' -H 'dnt: 1' -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3' -H 'sec-fetch-site: none' -H 'accept-encoding: gzip, deflate, br' -H 'accept-language: zh-CN,zh;q=0.9' -H 'cookie: user_id__insecure=; session_id=' --compressed
parser examples:
    re:
        operation = '.*?abc'
        value = '$0' (or '$1'; the number after `$` is the group index of the regex result)
    css:
        operation = ".className"
        value = '$string'
            $string: return [node] as outer html
            $text: return [node.text]
            $get_text: return [node.get_text()]
            @attr: return [the attribute value of each node]
    json:
        view more: https://github.com/adriank/ObjectPath
        # input response JSON string: {"a": 1}
        operation = "$.a"
        value = None
    python:
        ! The function name should always be `parse` if value is None;
          otherwise `value` is used as the function name.
        `operation` can also be a function object.
        operation = lambda resp: resp.text
        operation = r'''
        def parse(resp):
            return md5(resp.text)
        '''
        value = None
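As a sketch of the `python` parser example above, the snippet below defines `parse(resp)` and calls it on a stand-in response. The `md5` helper here is built on `hashlib` and is an assumption (the docstring does not say where `md5` comes from), and `FakeResponse` is a hypothetical substitute for the real response object, of which only `.text` is used.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in exposing only the `.text` attribute."""
    text: str


def md5(text):
    """Hex MD5 digest of a string (assumed equivalent of the md5 helper)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def parse(resp):
    # Reduce the whole body to a short fingerprint, so any change in the
    # page text yields a different result.
    return md5(resp.text)


before = parse(FakeResponse("hello"))
after = parse(FakeResponse("hello!"))
print(before)
print(before != after)
```

Hashing keeps each stored result short, which fits the note that `check_result_list` entries are usually shortened with md5.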