html-jparser
Library for parsing html with JQuery element selection. Very easy to use.
Installing
Library on PyPi: https://pypi.org/project/html-jparser/0.3/
pip install html-jparser
Using
First you need to initialize the class object with url or html_s keyword argument.
from html_jparser.core import HtmlParser
p = HtmlParser(url='https://easypassword.ru/')
# Or for example read html string from file
p = HtmlParser(html_s=open('index.html', 'r', encoding='utf-8').read())
The html tag tree starts with root. Root is an abstract tag containing all the tags of an html document. For example, for normal html, the root child would be the html tag. The children attribute of each tag contains a list of children tags.
print(p.root)
# root: [html]
print(p.root.children)
# [html]
print(p.root.children[0])
# html: [head, body]
Each tag contains a select method that takes a jQuery select string and returns a list of found tags among child. Select on root tag equally select on parser object.
header = p.root.children[0].children[1].select('h1.center-align')
# Or
header = p.root.select('body h1.center-align')
# equally
header = p.select('body h1.center-align')
Each tag contains attrs (attributes dictionary), comments (list of string), text (string), parent (HtmlTag obj), tag (name), children (list of HtmlTag objects).