convert2txt

Convert HTML, PDF, and DOC files to plain text.


Keywords
html, parser, python2, spider
License
MIT
Install
pip install convert2txt==0.1.1

Documentation

# tidy_page

tidy_page is an HTML parser. Given an HTML document, it extracts the page's main body text and title. It is intended for web page parsing and content extraction.
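
To make the idea of pulling a title and text out of an HTML document concrete, here is a minimal sketch that uses only the Python 3 standard library. It does not use convert2txt and does not reflect its API: the names `TitleAndTextParser` and `extract_title_and_text` are hypothetical, and the approach (collect all visible text, skip `script` and `style`) is a simplification that makes no attempt to separate the main content from navigation or other page chrome.

```python
from html.parser import HTMLParser


class TitleAndTextParser(HTMLParser):
    """Hypothetical helper: collects the <title> text and visible page text."""

    # Text inside these tags is never shown to a reader, so it is ignored.
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._skip_depth == 0:
            stripped = data.strip()
            if stripped:  # drop whitespace-only chunks between tags
                self.text_parts.append(stripped)


def extract_title_and_text(html):
    """Return (title, text) for an HTML string; an assumed, simplified interface."""
    parser = TitleAndTextParser()
    parser.feed(html)
    parser.close()
    title = "".join(parser.title_parts).strip()
    text = "\n".join(parser.text_parts)
    return title, text
```

Calling `extract_title_and_text(page_source)` on a page's HTML string returns the title and the visible text joined by newlines. A real content extractor would normally add heuristics on top of this to pick out the article body.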