A tool designed to quickly parse HTML tags and elements.
- Pip packages:
  - timeoutcall==1.*
  - beautifulsoup4==4.*
  - html5lib==1.*
  - requests==2.*
To fetch only the title of a page, use the titleFromUri function:
from htmltagparse import titleFromUri
websiteTitle = titleFromUri("https://github.com/")
print(websiteTitle) # output: GitHub: Let’s build from here · GitHub
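For readers who want to see what title extraction involves without making a network request, here is a minimal offline sketch that uses only the standard library. It is not how htmltagparse implements titleFromUri; the TitleParser class and the sample markup are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical helper: collects the text inside the first <title> tag."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # Start recording only for the first <title> encountered
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Sample markup (an assumption, not fetched from any site)
sample_html = "<html><head><title> Example Page </title></head><body></body></html>"
parser = TitleParser()
parser.feed(sample_html)
parser.close()
page_title = parser.title.strip()
print(page_title)
```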
from htmltagparse import build
brave = build.fromUri("https://search.brave.com/", timeout=20)
print(brave.tags) # list of tags found on the page
print(brave.searchTag("footer")) # list with the inner HTML of each footer tag
print(brave.searchTag("footer", htmlFormat=False)[0]) # output: © Brave Software Brave Search API Summarizer Helpful answers Report a security issue
from htmltagparse import HtmlPage
from requests import get
htmlContent = get("https://duckduckgo.com/").text
ddg = HtmlPage(htmlContent)
print(list(ddg.sources)) #output: ['script']
You can also search a page you have built directly with its regex function:
from htmltagparse import build
import re
videoId = ""
page = build.fromUri("https://www.youtube.com/watch?v=%s" % videoId)
try:
    # NOTE: the regex function already has re's MULTILINE and DOTALL flags in use
    # get the list of tags for the YouTube video via this regex pattern
    videoTags = page.regex(r"\"keywords\":(?P<tags>\[.*?),\"channelId\":").group("tags")
    # convert the matched string into a list
    videoTags = re.findall(r"(?:\"|\')(?P<tag>.*?)(?:\'|\")(?:\,|\])", videoTags)
except Exception:
    videoTags = "no tags found"
print(videoTags)
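To see what the two patterns above extract without requesting a live page, the following offline sketch applies them with the standard re module to a small sample string. The sample text is a hypothetical fragment shaped like the data the first pattern targets, and re.search with explicit flags stands in for the package's regex function, whose internals are not shown here.

```python
import re

# Hypothetical fragment shaped like the JSON the first pattern targets
sample_source = '{"keywords":["python","html parsing","tutorial"],"channelId":"UC123"}'

# Step 1: capture the bracketed list that follows "keywords":
match = re.search(
    r"\"keywords\":(?P<tags>\[.*?),\"channelId\":",
    sample_source,
    re.MULTILINE | re.DOTALL,
)

if match is None:
    video_tags = []
else:
    # Step 2: pull each quoted item out of the captured string
    video_tags = re.findall(
        r"(?:\"|\')(?P<tag>.*?)(?:\'|\")(?:\,|\])",
        match.group("tags"),
    )

print(video_tags)
```

Checking for None before calling group makes the "no match" case explicit instead of relying on an exception.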
Another way to get the tags of a YouTube video is through the page metadata:
import htmltagparse
#youtube video id
videoId = ""
video = htmltagparse.build.fromUri("https://www.youtube.com/watch?v=%s" % videoId)
tags = []  # stays empty if the page has no keywords meta tag
for i in video.metadata:
    if i.get("name") == "keywords":
        tags = i.get("content").split(", ")
        break
print(tags)
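The loop above treats each metadata entry as a mapping with name and content keys. As an offline illustration of that shape, the sketch below gathers the attributes of meta tags into dictionaries with the standard library's html.parser. This is not htmltagparse's own implementation; MetaCollector and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Hypothetical helper: stores each <meta> tag's attributes as a dict."""

    def __init__(self):
        super().__init__()
        self.metadata = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "meta":
            self.metadata.append(dict(attrs))


# Sample markup (an assumption, not taken from any real page)
sample_html = (
    '<head><meta charset="utf-8">'
    '<meta name="keywords" content="python, html, parsing"></head>'
)
collector = MetaCollector()
collector.feed(sample_html)

keywords = []  # stays empty when no keywords entry exists
for entry in collector.metadata:
    if entry.get("name") == "keywords":
        keywords = entry.get("content", "").split(", ")
        break

print(keywords)
```

Using entry.get("content", "") avoids an error when a keywords tag has no content attribute.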
- cd into the root directory of this repository
- run
python3 -m build
Note
Errors when building this package may be caused by its requirements. If this occurs, use python3 -m build -n (no build isolation) instead.
Must not include:
- Major changes
- Breaking code
- Changes to version number