pyce3

Multilingual Web Page Content Extractor


License
Apache-2.0
Install
pip install pyce3==1.0.0

Documentation

pyce3: Multilingual Web Page Content Extractor for Python 3

Introduction

pyce3 is a Python 3 package for multilingual web page content extraction. It extracts the content of article-type web pages such as news stories and blog posts.

Usage

import pyce3
import requests

# Fetch the raw bytes of an article page.
url = "http://caijing.chinadaily.com.cn/a/201911/21/WS5dd62455a31099ab995ed438.html"
html = requests.get(url).content

# parse() returns the encoding, time, title, text, and next-page link.
encoding, time, title, text, next_link = pyce3.parse(url, html)
print("Encoding:", encoding)
print("=" * 10)
print("Title:", title)
print("Time:", time)
print("=" * 10)
print("Content:", text)
print("NextPageLink:", next_link)
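
The five values returned by pyce3.parse in the usage above are positional, so it can be convenient to give them names. The following is a minimal sketch, not part of pyce3 itself: Article, ParseFn, extract_article, and the parse_fn parameter are hypothetical names introduced here, and the Optional[str] field types are an assumption, since the README does not say which values may be missing.

```python
from typing import Callable, NamedTuple, Optional, Tuple


class Article(NamedTuple):
    """Named view over the five values unpacked from pyce3.parse above."""
    encoding: Optional[str]   # assumed type; not documented in the README
    time: Optional[str]       # assumed type
    title: Optional[str]      # assumed type
    text: Optional[str]       # assumed type
    next_link: Optional[str]  # assumed type


# Hypothetical alias: any callable with the same shape as pyce3.parse(url, html).
ParseFn = Callable[[str, bytes], Tuple]


def extract_article(url: str, html: bytes,
                    parse_fn: Optional[ParseFn] = None) -> Article:
    """Run the parser and wrap its 5-tuple in an Article.

    parse_fn is a hypothetical hook for swapping in a stub during tests;
    when it is omitted, pyce3.parse is used.
    """
    if parse_fn is None:
        import pyce3  # imported lazily so the helper loads without pyce3
        parse_fn = pyce3.parse
    return Article(*parse_fn(url, html))
```

With this helper, the script above could read article = extract_article(url, html) and then access article.title or article.next_link by name instead of relying on tuple position.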