The urllib.parse module in Python follows neither the legacy RFC 3986 standard nor the newer WHATWG URL specification. It is also relatively slow.
This is ada_url, a fast standard-compliant Python library for working with URLs based on the Ada URL parser.
Install from PyPI with pip install ada_url.
The URL class is intended to match the one described in the WHATWG URL spec.
The parse_url function returns a dictionary of all URL elements:
>>> from ada_url import parse_url
>>> parse_url('https://user:pass@example.org:80/api?q=1#2')
{
    'href': 'https://user:pass@example.org:80/api?q=1#2',
    'username': 'user',
    'password': 'pass',
    'protocol': 'https:',
    'port': '80',
    'hostname': 'example.org',
    'host': 'example.org:80',
    'pathname': '/api',
    'search': '?q=1',
    'hash': '#2',
    'origin': 'https://example.org:80',
    'host_type': <HostType.DEFAULT: 0>,
    'scheme_type': <SchemeType.HTTPS: 2>
}
Replacing URL components with the URL class:
Replacing URL components with the replace_url function:
>>> from ada_url import replace_url
>>> replace_url('https://example.org/path/../file.txt', host='example.com')
'https://example.com/file.txt'
The URLSearchParams class is intended to match the one described in the WHATWG URL spec.
The parse_search_params function returns a dictionary of search keys mapped to value lists:
The idna class can encode and decode internationalized domain names (IDNs):
This library is compliant with the WHATWG URL spec. This means, among other things, that it properly encodes IDNs and resolves paths:
Contrast that with the Python standard library's urllib.parse module:
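A short sketch using urlparse on an illustrative URL with a mixed-case, non-ASCII hostname and dot segments in the path:

```python
from urllib.parse import urlparse

parsed_url = urlparse('https://www.GOoglé.com/./path/../path2/')

# The hostname is lowercased but not IDNA-encoded.
print(parsed_url.hostname)  # www.googlé.com

# Dot segments in the path are left in place.
print(parsed_url.path)  # /./path/../path2/
```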
This package uses CFFI to call the Ada library's functions, which adds some per-call overhead. The alternative can_ada (Canadian Ada) package uses pybind11 to generate a Python extension module, which is more performant.