htmltagparse

A tool designed to quickly parse html tags and elements.

Prerequisites

Pip packages:
- timeoutcall==1.*
- beautifulsoup4==4.*
- html5lib==1.*
- requests==2.*

Usage

Reading Page Titles

Firstly, if you would like to view a page title alone, you could use the titleFromUri function:

from htmltagparse import titleFromUri

websiteTitle = titleFromUri("https://github.com/")
print(websiteTitle) # output: GitHub: Let’s build from here · GitHub

Building Pages

Building Pages via URI

from htmltagparse import build

brave = build.fromUri("https://search.brave.com/", timeout=20)
print(brave.tags) #list of tags found on the specified page
print(brave.searchTag("footer")) #displays a list of innerHtml content to the footer tags
print(brave.searchTag("footer", htmlFormat=False)[0]) #output: © Brave Software Brave Search API Summarizer Helpful answers Report a security issue

Building Pages via HTML

from htmltagparse import HtmlPage
from requests import get

htmlContent = get("https://duckduckgo.com/").text
ddg = HtmlPage(htmlContent)
print(list(ddg.sources)) #output: ['script']

Searching A Page

With this package, you have the ability to search the html page you have created directly through a function:

from htmltagparse import build
import re

videoId = ""
page = build.fromUri("https://www.youtube.com/watch?v=%s" % videoId)
try:
  #NOTE: the regex function already has re's MULTILINE and DOTALL flags in use
  #get a list of tags to the youtube video via this regex pattern
  videoTags = page.regex(r"\"keywords\":(?P<tags>\[.*?),\"channelId\":").group("tags")
  #converting from string to array
  videoTags = re.findall(r"(?:\"|\')(?P<tag>.*?)(?:\'|\")(?:\,|\])", videoTags)
except:
  videoTags = "no tags found"

print(videoTags)

Another way you could get tags from a Youtube video:

import htmltagparse

#youtube video id
videoId = ""
video = htmltagparse.build.fromUri("https://www.youtube.com/watch?v=%s" % videoId)

for i in video.metadata:
  if i.get("name") == "keywords":
    tags = i.get("content").split(", ")
    break
print(i)

Developers

Building to Wheel File

cd into root directory of this repository
run python3 -m build

Note

Errors building this package may be due to this packages requirements, if this occurs, use python3 -m build -n instead.

Contributions

Must not include:

Major changes
Breaking code
Changes to version number

htmltagparse
Release 3.0

Release 3.0

3.0

2.0

1.0

Documentation

htmltagparse

Prerequisites

Usage

Reading Page Titles

Building Pages

Building Pages via URI

Building Pages via HTML

Searching A Page

Developers

Building to Wheel File

Contributions

Stats

Development practices

Releases

Contributors

htmltagparse Release 3.0

Release 3.0 Toggle Dropdown 3.0 2.0 1.0

Documentation

htmltagparse

Prerequisites

Usage

Reading Page Titles

Building Pages

Building Pages via URI

Building Pages via HTML

Searching A Page

Developers

Building to Wheel File

Contributions

Stats

Development practices

Releases

Contributors

htmltagparse
Release 3.0

Release 3.0

3.0

2.0

1.0