pextract

Extract main textual information from HTML.

License

MIT

Install

pip install pextract==0.2

Documentation

Webpage_Textual_Extraction

A uniform webpage extraction algorithm.
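The README does not describe how pextract's algorithm works. The sketch below illustrates one common approach to main-text extraction, which is not necessarily pextract's. It splits a page into text blocks and keeps the blocks that are long enough and not dominated by link text. It uses the standard library's html.parser instead of bs4 so it runs without dependencies. The names BlockParser and extract_main_text, and the thresholds min_chars=40 and max_link_density=0.5, are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical heuristic, not pextract's actual algorithm.
# Tags whose content is never treated as main text.
SKIP_TAGS = {"script", "style", "noscript", "head", "nav", "footer"}
# Tags that end the current text block and start a new one.
BLOCK_TAGS = {"p", "div", "article", "section", "li", "td", "br",
              "h1", "h2", "h3", "h4", "h5", "h6", "pre", "blockquote"}


def _normalize(text):
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


class BlockParser(HTMLParser):
    """Split a page into (text, link_chars) blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # collected (text, number of chars inside <a>)
        self._buf = []         # text pieces of the block being built
        self._link_chars = 0   # link-text length in the current block
        self._skip_depth = 0   # > 0 while inside a SKIP_TAGS element
        self._in_link = 0      # > 0 while inside an <a> element

    def _flush(self):
        text = _normalize("".join(self._buf))
        if text:
            self.blocks.append((text, self._link_chars))
        self._buf = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore scripts, styles, navigation, footers
        self._buf.append(data)
        if self._in_link:
            self._link_chars += len(_normalize(data))

    def close(self):
        super().close()
        self._flush()  # emit whatever text is left after the last tag


def extract_main_text(html, min_chars=40, max_link_density=0.5):
    """Keep blocks that are long enough and mostly non-link text."""
    parser = BlockParser()
    parser.feed(html)
    parser.close()
    kept = [text for text, link_chars in parser.blocks
            if len(text) >= min_chars
            and link_chars / len(text) <= max_link_density]
    return "\n".join(kept)
```

A menu made of short links is dropped, while paragraphs of running text survive even if they contain a link or two. The two thresholds are a judgment call and usually need tuning for a given set of sites.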

Requirements

Python 3.5, requests, bs4 (Beautiful Soup 4)

How to use

  1. Add the links of the pages you want to extract to pool.txt.
  2. Set the encoding you want to use.
  3. Run main.py.
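The three steps above can be sketched roughly as follows. This is not pextract's main.py. The sketch assumes pool.txt holds one link per line, and that the encoding setting is used to decode the downloaded pages; the README does not say which. It uses urllib.request from the standard library instead of requests. The names read_pool, fetch_html and run are hypothetical.

```python
import urllib.request
from pathlib import Path


def read_pool(path="pool.txt"):
    """Return the links in the pool file (assumed one per line), skipping blanks."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def fetch_html(url, encoding="utf-8", timeout=10):
    """Download a page and decode it with the chosen encoding.

    errors="replace" substitutes undecodable bytes instead of raising.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        raw = resp.read()
    return raw.decode(encoding, errors="replace")


def run(pool_path="pool.txt", encoding="utf-8", extract=lambda html: html):
    """Fetch every link in the pool and apply an extraction function to each page.

    `extract` is a hypothetical hook; the default returns the HTML unchanged.
    """
    results = {}
    for url in read_pool(pool_path):
        try:
            results[url] = extract(fetch_html(url, encoding))
        except (OSError, ValueError) as exc:  # URLError subclasses OSError
            print("skipped {}: {}".format(url, exc))
    return results
```

Passing the extraction step in as a function keeps downloading and extraction separate, so either part can be replaced. A bad link or an unreachable page is reported and skipped rather than stopping the whole run.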