urlstd
urlstd is a Python implementation of the WHATWG URL Living Standard.
This library provides the URL class, the URLSearchParams class, and low-level APIs that comply with the URL specification.
Supported APIs
- class urlstd.parse.URL(url: str, base: Optional[str | URL] = None)
  - canParse: classmethod can_parse(url: str, base: Optional[str | URL] = None) -> bool
  - stringifier: __str__() -> str
  - href: readonly property href: str
  - origin: readonly property origin: str
  - protocol: property protocol: str
  - username: property username: str
  - password: property password: str
  - host: property host: str
  - hostname: property hostname: str
  - port: property port: str
  - pathname: property pathname: str
  - search: property search: str
  - searchParams: readonly property search_params: URLSearchParams
  - hash: property hash: str
  - URL equivalence: __eq__(other: Any) -> bool and equals(other: URL, exclude_fragments: bool = False) -> bool
- class urlstd.parse.URLSearchParams(init: Optional[str | Sequence[Sequence[str | int | float]] | dict[str, str | int | float] | URLRecord | URLSearchParams] = None)
  - size: __len__() -> int
  - append: append(name: str, value: str | int | float) -> None
  - delete: delete(name: str, value: Optional[str | int | float] = None) -> None
  - get: get(name: str) -> str | None
  - getAll: get_all(name: str) -> tuple[str, ...]
  - has: has(name: str, value: Optional[str | int | float] = None) -> bool
  - set: set(name: str, value: str | int | float) -> None
  - sort: sort() -> None
  - iterable<USVString, USVString>: __iter__() -> Iterator[tuple[str, str]]
  - stringifier: __str__() -> str
Low-level APIs
- urlstd.parse.parse_url(urlstring: str, base: Optional[str | URLRecord] = None, encoding: str = "utf-8") -> URLRecord
- class urlstd.parse.BasicURLParser
  - classmethod parse(urlstring: str, base: Optional[URLRecord] = None, encoding: str = "utf-8", url: Optional[URLRecord] = None, state_override: Optional[URLParserState] = None) -> URLRecord
- class urlstd.parse.URLRecord
  - scheme: property scheme: str = ""
  - username: property username: str = ""
  - password: property password: str = ""
  - host: property host: Optional[str | int | tuple[int, ...]] = None
  - port: property port: Optional[int] = None
  - path: property path: list[str] | str = []
  - query: property query: Optional[str] = None
  - fragment: property fragment: Optional[str] = None
  - origin: readonly property origin: Origin | None
  - is special: is_special() -> bool
  - is not special: is_not_special() -> bool
  - includes credentials: includes_credentials() -> bool
  - has an opaque path: has_opaque_path() -> bool
  - cannot have a username/password/port: cannot_have_username_password_port() -> bool
  - URL serializer: serialize_url(exclude_fragment: bool = False) -> str
  - host serializer: serialize_host() -> str
  - URL path serializer: serialize_path() -> str
  - URL equivalence: __eq__(other: Any) -> bool and equals(other: URLRecord, exclude_fragments: bool = False) -> bool
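For orientation, is_special() and is_not_special() correspond to the URL Standard's notion of a special scheme: ftp, file, http, https, ws, and wss, each with a default port (none for file). The following is a minimal sketch of that table, not urlstd's own code; the names SPECIAL_SCHEMES and is_special_scheme are hypothetical and exist only for illustration.

```python
# Special schemes and their default ports as listed in the URL Standard.
# "file" is special but has no default port.
SPECIAL_SCHEMES = {
    "ftp": 21,
    "file": None,
    "http": 80,
    "https": 443,
    "ws": 80,
    "wss": 443,
}


def is_special_scheme(scheme):
    """Return True if the scheme is one of the URL Standard's special schemes."""
    return scheme.lower() in SPECIAL_SCHEMES


print(is_special_scheme("https"))   # True
print(is_special_scheme("mailto"))  # False
print(SPECIAL_SCHEMES["http"])      # 80
```

Under the URL Standard, a port equal to the scheme's default is dropped from the record, which is why a URL such as http://foo:21/ keeps its port while http://foo:80/ would not.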
Hosts (domains and IP addresses)
- class urlstd.parse.IDNA
  - domain to ASCII: classmethod domain_to_ascii(domain: str, be_strict: bool = False) -> str
  - domain to Unicode: classmethod domain_to_unicode(domain: str, be_strict: bool = False) -> str
- class urlstd.parse.Host
  - host parser: classmethod parse(host: str, is_not_special: bool = False) -> str | int | tuple[int, ...]
  - host serializer: classmethod serialize(host: str | int | Sequence[int]) -> str
- urlstd.parse.string_percent_decode(s: str) -> bytes
- urlstd.parse.string_percent_encode(s: str, safe: str, encoding: str = "utf-8", space_as_plus: bool = False) -> str
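Readers who know urllib may find it helpful to compare these two functions with their closest standard-library counterparts. The sketch below uses only urllib.parse; treating unquote_to_bytes, quote, and quote_plus as analogues of string_percent_decode and string_percent_encode is an assumption made for illustration, and the sets of characters left unencoded will not match urlstd's in general.

```python
from urllib.parse import quote, quote_plus, unquote_to_bytes

# Percent-decoding yields bytes; decoding to text is a separate step.
decoded = unquote_to_bytes("a%20b%F0%9F%8C%88")
print(decoded)  # b'a b\xf0\x9f\x8c\x88'
text = decoded.decode("utf-8")

# Percent-encoding with an explicit set of characters left as-is.
print(quote("a b/c", safe="/"))  # a%20b/c

# Encoding a space as "+", comparable in spirit to space_as_plus=True.
print(quote_plus("a b"))  # a+b
```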
application/x-www-form-urlencoded parser
- urlstd.parse.parse_qsl(query: bytes) -> list[tuple[str, str]]
application/x-www-form-urlencoded serializer
- urlstd.parse.urlencode(query: Sequence[tuple[str, str]], encoding: str = "utf-8") -> str
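The parser and serializer above share their names with urllib.parse.parse_qsl and urllib.parse.urlencode. As a rough illustration of the application/x-www-form-urlencoded round trip, here is a sketch using the standard-library functions only; it does not call urlstd, and the two libraries differ in details (for example, the standard-library parser accepts str and drops blank values unless keep_blank_values=True is passed).

```python
from urllib.parse import parse_qsl, urlencode

query = "a=1&b=x+y&c=%F0%9F%8C%88&d="

# "+" decodes to a space and percent-escapes decode as UTF-8.
pairs = parse_qsl(query, keep_blank_values=True)
print(len(pairs))  # 4
print(pairs[1])    # ('b', 'x y')

# Serializing the pairs again reproduces the original query string.
print(urlencode(pairs) == query)  # True
```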
Validation
- class urlstd.parse.HostValidator
  - valid host string: classmethod is_valid(host: str) -> bool
  - valid domain string: classmethod is_valid_domain(domain: str) -> bool
  - valid IPv4-address string: classmethod is_valid_ipv4_address(address: str) -> bool
  - valid IPv6-address string: classmethod is_valid_ipv6_address(address: str) -> bool
- class urlstd.parse.URLValidator
  - valid URL string: classmethod is_valid(urlstring: str, base: Optional[str | URLRecord] = None, encoding: str = "utf-8") -> bool
  - valid URL-scheme string: classmethod is_valid_url_scheme(value: str) -> bool
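As a point of reference for the "valid URL-scheme string" entry, the URL Standard defines such a string as one ASCII alpha followed by zero or more ASCII alphanumerics, "+", "-", or ".". The following regular-expression sketch expresses that rule with the standard library only; looks_like_url_scheme is a hypothetical helper, not part of urlstd, and urlstd's own check may differ in edge cases.

```python
import re

# One ASCII letter, then any number of ASCII letters, digits, "+", "-", or ".".
_URL_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+\-.]*")


def looks_like_url_scheme(value):
    """Return True if value matches the URL-scheme string grammar."""
    return _URL_SCHEME.fullmatch(value) is not None


print(looks_like_url_scheme("https"))    # True
print(looks_like_url_scheme("git+ssh"))  # True
print(looks_like_url_scheme("1http"))    # False
```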
Compatibility with standard library urllib
- urlstd.parse.urlparse(urlstring: str, base: str = None, encoding: str = "utf-8", allow_fragments: bool = True) -> urllib.parse.ParseResult
  urlstd.parse.urlparse() is an alternative to urllib.parse.urlparse(). It parses a string representation of a URL using the basic URL parser and returns a urllib.parse.ParseResult.
Basic Usage
To parse a string into a URL:
from urlstd.parse import URL
URL('http://user:pass@foo:21/bar;par?b#c')
# → <URL(href='http://user:pass@foo:21/bar;par?b#c', origin='http://foo:21', protocol='http:', username='user', password='pass', host='foo:21', hostname='foo', port='21', pathname='/bar;par', search='?b', hash='#c')>
To parse a string into a URL using a base URL:
url = URL('?ﬃ&🌈', base='http://example.org')
url # → <URL(href='http://example.org/?%EF%AC%83&%F0%9F%8C%88', origin='http://example.org', protocol='http:', username='', password='', host='example.org', hostname='example.org', port='', pathname='/', search='?%EF%AC%83&%F0%9F%8C%88', hash='')>
url.search # → '?%EF%AC%83&%F0%9F%8C%88'
params = url.search_params
params # → URLSearchParams([('ﬃ', ''), ('🌈', '')])
params.sort()
params # → URLSearchParams([('🌈', ''), ('ﬃ', '')])
url.search # → '?%F0%9F%8C%88=&%EF%AC%83='
str(url) # → 'http://example.org/?%F0%9F%8C%88=&%EF%AC%83='
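The reordering produced by sort() follows the URL Standard, which sorts name-value pairs by comparing names as sequences of UTF-16 code units rather than code points. The following standard-library sketch shows why the rainbow (U+1F308) ends up before the ligature ﬃ (U+FB03); it is an illustration of the ordering rule only, not urlstd's implementation, and utf16_sort_key is a hypothetical helper name.

```python
from urllib.parse import quote


def utf16_sort_key(pair):
    """Order a (name, value) pair by the UTF-16 code units of its name."""
    name, _value = pair
    # Big-endian UTF-16 bytes compare in the same order as the code units.
    return name.encode("utf-16-be")


# The ligature U+FB03 and the rainbow U+1F308, both with empty values.
pairs = [("\ufb03", ""), ("\U0001f308", "")]

by_code_point = sorted(pairs, key=lambda pair: pair[0])
by_code_unit = sorted(pairs, key=utf16_sort_key)  # sorted() is stable

# U+1F308 is the surrogate pair D83C DF08 in UTF-16, and 0xD83C < 0xFB03,
# so it sorts first by code unit even though it sorts last by code point.
print([quote(name) for name, _ in by_code_point])
# ['%EF%AC%83', '%F0%9F%8C%88']
print([quote(name) for name, _ in by_code_unit])
# ['%F0%9F%8C%88', '%EF%AC%83']
```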
To validate a URL string:
from urlstd.parse import URL, URLValidator, ValidityState
URL.can_parse('https://user:password@example.org/') # → True
URLValidator.is_valid('https://user:password@example.org/') # → False
validity = ValidityState()
URLValidator.is_valid('https://user:password@example.org/', validity=validity) # → False
validity.valid # → False
validity.validation_errors # → 1
validity.descriptions[0] # → "invalid-credentials: input includes credentials: 'https://user:password@example.org/' at position 21"
URL.can_parse('file:///C|/demo') # → True
URLValidator.is_valid('file:///C|/demo') # → False
validity = ValidityState()
URLValidator.is_valid('file:///C|/demo', validity=validity) # → False
validity.valid # → False
validity.validation_errors # → 1
validity.descriptions[0] # → "invalid-URL-unit: code point is found that is not a URL unit: U+007C (|) in 'file:///C|/demo' at position 9"
To parse a string into a urllib.parse.ParseResult using a base URL:
import html
from urllib.parse import unquote
from urlstd.parse import urlparse
pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='utf-8')
pr # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%C3%BFb', fragment='')
unquote(pr.query) # → 'aÿb'
pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='windows-1251')
pr # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%26%23255%3Bb', fragment='')
unquote(pr.query, encoding='windows-1251') # → 'a&#255;b'
html.unescape('a&#255;b') # → 'aÿb'
pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='windows-1252')
pr # → ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%FFb', fragment='')
unquote(pr.query, encoding='windows-1252') # → 'aÿb'
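The three query strings differ because a character that the chosen encoding cannot represent is replaced by a numeric character reference before percent-encoding, which is the behavior the URL Standard describes for non-UTF-8 encodings. The sketch below approximates this with the standard library's xmlcharrefreplace error handler; encode_query is a hypothetical helper, not a urlstd function, and quote(safe="") escapes more characters than the standard's query percent-encode set, so the results only coincide for simple input like this.

```python
from urllib.parse import quote


def encode_query(text, encoding):
    """Approximate query percent-encoding for a given character encoding."""
    # Characters missing from the encoding become '&#NNN;' references.
    raw = text.encode(encoding, errors="xmlcharrefreplace")
    # Percent-encode every byte that is not an unreserved ASCII character.
    return quote(raw, safe="")


for encoding in ("utf-8", "windows-1251", "windows-1252"):
    print(encoding, encode_query("a\u00ffb", encoding))
# utf-8 a%C3%BFb
# windows-1251 a%26%23255%3Bb
# windows-1252 a%FFb
```

U+00FF (ÿ) has no mapping in windows-1251, so it survives only as the escaped reference, which is why html.unescape() is needed to recover it.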
Logging
urlstd uses the standard library logging module to report validation errors.
Change the log level of the urlstd logger if needed:
import logging
logging.getLogger('urlstd').setLevel(logging.ERROR)
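Since this is ordinary logging configuration, the effect of the level change can be checked with the standard library alone. The sketch below assumes only the logger name 'urlstd' given above; the level at which urlstd emits validation errors is not stated here, so the WARNING check is illustrative.

```python
import logging

logger = logging.getLogger("urlstd")
logger.setLevel(logging.ERROR)

# Records below ERROR are now dropped by this logger before reaching handlers.
print(logger.isEnabledFor(logging.WARNING))  # False
print(logger.isEnabledFor(logging.ERROR))    # True
```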
Dependencies
- icupy >= 0.11.0 (pre-built packages are available)
  - icupy requirements:
    - ICU4C (ICU - International Components for Unicode) - latest version recommended
    - C++17 compatible compiler (see supported compilers)
    - CMake >= 3.7
Installation
- Configuring environment variables for icupy (ICU):
  - Windows:
    - Set the ICU_ROOT environment variable to the root of the ICU installation (default is C:\icu). For example, if the ICU is located in C:\icu4c:
      set ICU_ROOT=C:\icu4c
      or in PowerShell:
      $env:ICU_ROOT = "C:\icu4c"
    - To verify settings using icuinfo (64 bit):
      %ICU_ROOT%\bin64\icuinfo
      or in PowerShell:
      & $env:ICU_ROOT\bin64\icuinfo
  - Linux/POSIX:
    - If the ICU is located in a non-standard location, set the PKG_CONFIG_PATH and LD_LIBRARY_PATH environment variables. For example, if the ICU is located in /usr/local:
      export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
      export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
    - To verify settings using pkg-config:
      $ pkg-config --cflags --libs icu-uc
      -I/usr/local/include -L/usr/local/lib -licuuc -licudata
- Installing from PyPI:
  pip install urlstd
Running Tests
Install dependencies:
pipx install tox
# or
pip install --user tox
To run tests and generate a report:
git clone https://github.com/miute/urlstd.git
cd urlstd
tox -e wpt
See result: tests/wpt/report.html