regxon

RegXon is a powerful validator, sanitizer, and content parser that you have been searching for for decades.


License
MIT
Install
pip install regxon==0.0.9

Documentation

Installation

pip install regxon

Usage

from regxon.common import Regxon
regxon = Regxon()

General Validation

General validation covers email addresses, domains, URLs, IPv4 addresses, and phone numbers.

Validate Email

from regxon.common import Regxon

regxon = Regxon()
regxon.is_email('xyz@.com')  # None
regxon.is_email('xyz@cpx.com')  # returns a proper Match object; you can grab the match with `.string`
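
The Match-or-None convention shown above is the same one Python's standard `re` module uses, so a result can be tested directly in an `if`. The sketch below illustrates this with plain `re` and a deliberately simplified, hypothetical pattern (`SIMPLE_EMAIL`) and helper (`check_email`). It is not RegXon's actual expression, and it assumes RegXon returns standard `re.Match` objects.

```python
import re

# Hypothetical, simplified pattern for illustration only; not RegXon's regex.
SIMPLE_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def check_email(value):
    """Return a Match object when the whole string looks like an email, else None."""
    return SIMPLE_EMAIL.fullmatch(value)


for candidate in ("xyz@.com", "xyz@cpx.com"):
    match = check_email(candidate)
    if match:  # a Match object is always truthy; None is falsy
        # .string is the input that was passed in; .group() is the text that matched
        print(candidate, "-> valid:", match.group())
    else:
        print(candidate, "-> invalid")
```

On a standard `re.Match`, `.string` is the original input and `.group()` is the matched text. The two are equal whenever the pattern is required to match the whole string, as it is in this sketch.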

Validate Domain

from regxon.common import Regxon

regxon = Regxon()
regxon.is_domain('xyzcom')  # None
regxon.is_domain('xyz.com')  # returns a proper Match object; you can grab the match with `.string`

Validate URL

from regxon.common import Regxon

regxon = Regxon()
regxon.is_url('xyz.com')  # None
regxon.is_url('https://xyz.com')  # returns a proper Match object

Validate HTTP URL

from regxon.common import Regxon

regxon = Regxon()
regxon.is_http_url('xyz.com')  # None; the URL has no http(s) scheme
regxon.is_http_url('ftp://xyz.com')  # None; `ftp` is not an http(s) scheme
regxon.is_http_url('http://django.c')  # None; `.c` is not a valid domain
regxon.is_http_url('https://xyz.com')  # returns a proper Match object; you can grab the match with `.string`
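
For readers curious about the rules those four calls imply, here is a minimal standard-library sketch of a comparable check. The function name `looks_like_http_url` and the "TLD of two or more letters" rule are assumptions made for illustration; RegXon's own pattern may accept or reject different inputs.

```python
from urllib.parse import urlsplit


def looks_like_http_url(value):
    """Hypothetical stand-in: True for http(s) URLs whose host ends in a 2+ letter TLD."""
    parts = urlsplit(value)
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname or ""
    labels = host.split(".")
    # Require at least "name.tld", no empty labels, and an alphabetic TLD
    # that is two or more letters long.
    return (
        len(labels) >= 2
        and all(labels)
        and labels[-1].isalpha()
        and len(labels[-1]) >= 2
    )


for url in ("xyz.com", "ftp://xyz.com", "http://django.c", "https://xyz.com"):
    print(url, "->", looks_like_http_url(url))
```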

Validate IP

from regxon.common import Regxon

regxon = Regxon()

# Calls 1 and 2 are equivalent and both return a proper Match, because the default schema is ""
regxon.is_ipv4('127.0.0.1')                 # 1
regxon.is_ipv4('127.0.0.1', schema='')      # 2; matches because 127.0.0.1 has no schema

regxon.is_ipv4('http://127.0.0.1')  # returns None as schema is not matched; "http" != ""
regxon.is_ipv4('http://127.0.0.1', schema='')  # returns None as schema is not matched; "http" != ""

regxon.is_ipv4('http://127.0.0.1', schema='http')  # returns a proper Match
regxon.is_ipv4('https://127.0.0.1', schema='http')  # returns None as schema is not matched; "https" != "http"
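
The schema rule above can be read as "the value must be exactly the schema prefix followed by an IPv4 address". The sketch below reproduces that rule with the standard `ipaddress` module. The helper name `is_ipv4_with_schema` and its boolean return value are assumptions for illustration, not RegXon's implementation, which returns a Match object or None.

```python
import ipaddress


def is_ipv4_with_schema(value, schema=""):
    """Hypothetical helper: True when `value` is `schema://` followed by a valid IPv4 address."""
    # An empty schema means the value must be a bare address with no prefix.
    prefix = f"{schema}://" if schema else ""
    if not value.startswith(prefix):
        return False
    host = value[len(prefix):]
    try:
        ipaddress.IPv4Address(host)
    except ValueError:
        # Anything left over that is not a dotted-quad address is rejected.
        return False
    return True


print(is_ipv4_with_schema("127.0.0.1"))
print(is_ipv4_with_schema("http://127.0.0.1"))
print(is_ipv4_with_schema("http://127.0.0.1", schema="http"))
print(is_ipv4_with_schema("https://127.0.0.1", schema="http"))
```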

Validate Phone Number

from regxon.common import Regxon

regxon = Regxon()
regxon.is_phone('+91 1234567890')  # returns a proper Match object; you can grab the match with `.string`

HTML Sanitization and Validation

RegXon provides a powerful HTML sanitizer and validator that you have been searching for for decades. It is built on a combination of html5lib and beautifulsoup4.

Your "how to remove an attribute from an HTML tag" problem is now solved, and so is the "how to remove a tag from HTML" problem.

from regxon.html import RegxonHTML

regxon_html = RegxonHTML()
html_content = """
<img onload="alert(1)" onerror="hey" src="http://example.com" />
<script>alert(1)</script>
"""
html = regxon_html.get_sanitized_content(html_content)

print(html)

The above code will print the following output:

<img onerror="hey"/>

Add custom excluded attributes for any tag you want

from regxon.html import RegxonHTML

regxon_html = RegxonHTML()
html_content = """
<img onload="alert(1)" onerror="hey" src="http://example.com" />
<script>alert(1)</script>
"""

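# Add custom excluded attributes for any tag you want
regxon_html.excluded_attributes.update({
    'img': regxon_html.excluded_attributes['img'] + ['onerror'],
})

# Sanitize after updating the exclusions, then print the result
html = regxon_html.get_sanitized_content(html_content)
print(html)

The above code will print the following output: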

<img/>
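
To make the idea of per-tag excluded attributes concrete, here is a small standard-library sketch that produces the same two outputs for the sample input. It uses `html.parser` rather than html5lib and beautifulsoup4. The names `AttributeStripper` and `sanitize`, and the default exclusion tables, are assumptions chosen to mirror the examples above. RegXon's real defaults and internals may differ, and this sketch is not a hardened sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical defaults chosen to mirror the example output above.
DEFAULT_EXCLUDED_TAGS = {"script"}
DEFAULT_EXCLUDED_ATTRIBUTES = {"img": ["onload", "src"]}


class AttributeStripper(HTMLParser):
    """Rebuild the markup while skipping excluded tags and attributes."""

    def __init__(self, excluded_attributes):
        super().__init__(convert_charrefs=True)
        self.excluded_attributes = excluded_attributes
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside an excluded tag

    def _render(self, tag, attrs, closing):
        blocked = self.excluded_attributes.get(tag, [])
        kept = "".join(
            f' {name}="{escape(value or "")}"'
            for name, value in attrs
            if name not in blocked
        )
        return f"<{tag}{kept}{closing}"

    def handle_starttag(self, tag, attrs):
        if tag in DEFAULT_EXCLUDED_TAGS:
            self.skip_depth += 1
        elif not self.skip_depth:
            self.parts.append(self._render(tag, attrs, ">"))

    def handle_startendtag(self, tag, attrs):
        if tag not in DEFAULT_EXCLUDED_TAGS and not self.skip_depth:
            self.parts.append(self._render(tag, attrs, "/>"))

    def handle_endtag(self, tag):
        if tag in DEFAULT_EXCLUDED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text inside an excluded tag (for example a script body) is dropped.
        if not self.skip_depth:
            self.parts.append(escape(data, quote=False))


def sanitize(html_content, excluded_attributes=None):
    """Return `html_content` without excluded tags and attributes."""
    parser = AttributeStripper(excluded_attributes or DEFAULT_EXCLUDED_ATTRIBUTES)
    parser.feed(html_content)
    parser.close()
    return "".join(parser.parts).strip()


html_content = """
<img onload="alert(1)" onerror="hey" src="http://example.com" />
<script>alert(1)</script>
"""

print(sanitize(html_content))

# Extend the per-tag exclusions, as in the second example above.
custom = {"img": DEFAULT_EXCLUDED_ATTRIBUTES["img"] + ["onerror"]}
print(sanitize(html_content, custom))
```

Keeping the exclusions in a plain dictionary keyed by tag name is what makes the `update` call in the example above enough to change the behaviour for a single tag.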

Purpose of RegXon

  • Sanitize HTML; remove unwanted tags and attributes; XSS prevention
  • Validate IP, URL, Domain; SSRF prevention
  • Validate Email; Email spoofing prevention
  • Validate Phone Number; Phone number spoofing prevention
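
For the SSRF item in the list above, one way a caller might complement format validation is to check where an address points before making a request, since a well-formed address such as 127.0.0.1 can still target an internal service. The sketch below uses only the standard `ipaddress` module. The helper name `is_public_ipv4` is hypothetical and is not part of RegXon.

```python
import ipaddress


def is_public_ipv4(value):
    """Hypothetical helper: True only for valid, publicly routable IPv4 addresses."""
    try:
        address = ipaddress.IPv4Address(value)
    except ValueError:
        return False
    # is_global is False for loopback, private, and link-local ranges.
    return address.is_global


for candidate in ("127.0.0.1", "10.0.0.5", "169.254.169.254", "8.8.8.8"):
    print(candidate, "->", is_public_ipv4(candidate))
```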

License

MIT

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Authors

Acknowledgements