Some functions to parse and normalize URLs.

pip install urltools==0.4.0



Note and credit

This is based on the original work that used to be here but is now gone.



>>> urltools.normalize("Http://")

Rules that are applied to normalize a URL:

  • tolower scheme
  • tolower host (also works with IDNs)
  • remove default port
  • remove ':' without port
  • remove DNS root label
  • unquote path, query, fragment
  • collapse path (remove '//', '/./', '/../')
  • sort query params and remove params without value
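A few of these rules can be sketched with the standard library alone. The following is a minimal illustration, not the urltools implementation: normalize_sketch and DEFAULT_PORTS are hypothetical names, the sample URL is made up, and only the scheme, host, port and query rules are covered.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Default ports for a few common schemes (an illustrative subset).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def normalize_sketch(url):
    """Hypothetical helper applying some of the rules listed above."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # .hostname is already lowercased; rstrip drops the DNS root label.
    # Bracketed IPv6 hosts are not handled in this sketch.
    netloc = (parts.hostname or "").rstrip(".")
    # .port is None for a bare ':' so that case disappears as well.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc += ":%d" % parts.port
    userinfo = parts.netloc.rpartition("@")[0]
    if userinfo:
        netloc = userinfo + "@" + netloc
    # parse_qsl drops parameters without a value by default.
    # Note that urlencode quotes values again, unlike the unquote rule.
    query = urlencode(sorted(parse_qsl(parts.query)))
    return urlunsplit((scheme, netloc, parts.path, query, parts.fragment))


print(normalize_sketch("HTTP://WWW.Example.COM.:80/a?z=3&y=&x=1"))
# http://www.example.com/a?x=1&z=3
```

Unquoting and path collapsing are left out here to keep the sketch short.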


Both parse and extract return a ParseResult named tuple with the fields scheme, username, password, subdomain, domain, tld, port, path, query and fragment.

>>> urltools.parse("")
ParseResult(scheme='http', username='', password='', subdomain='', domain='example', tld='', port='', path='/foo/bar', query='x=1', fragment='abc')
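For readers unfamiliar with named tuples, the shape of such a result can be sketched as follows. ParseResultSketch is a hypothetical stand-in built from the field names listed above; the class shipped with urltools may differ in detail.

```python
from collections import namedtuple

# Field names copied from the description above (assumed order).
ParseResultSketch = namedtuple(
    "ParseResultSketch",
    "scheme username password subdomain domain tld port path query fragment",
)

result = ParseResultSketch(
    scheme="http", username="", password="", subdomain="", domain="example",
    tld="", port="", path="/foo/bar", query="x=1", fragment="abc",
)

print(result.domain)  # access by field name
print(result[7])      # or by position (path is the eighth field)

# Named tuples are immutable, so _replace returns a modified copy.
secure = result._replace(scheme="https")
print(secure.scheme)
```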

If the scheme is missing, parse interprets the URL as relative.

>>> urltools.parse("")
ParseResult(scheme='', username='', password='', subdomain='', domain='', tld='', port='', path='', query='', fragment='')


extract does not care whether a URL is relative and always tries to extract as much information as possible.

>>> urltools.extract("")
ParseResult(scheme='', username='', password='', subdomain='www', domain='example', tld='', port='', path='/abc', query='', fragment='')
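The difference between the two behaviours can be illustrated with the standard library. This is only a sketch of the idea, not how urltools does it: host_strict and host_best_effort are hypothetical helpers, and the input string is an assumed example.

```python
from urllib.parse import urlsplit


def host_strict(url):
    """Return the host only if the URL has a scheme or starts with '//'."""
    return urlsplit(url).hostname or ""


def host_best_effort(url):
    """Hypothetical helper: treat a scheme-less string as if it began with '//'."""
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    return urlsplit(url).hostname or ""


url = "www.example.com/abc"  # assumed sample input without a scheme
print(repr(host_strict(url)))       # the whole string is seen as a relative path
print(repr(host_best_effort(url)))  # the host is recovered anyway
```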

Additional functions

Besides the main functions described above, urltools has some more functions for manipulating segments of a URL.

  • encode (IDNA, see RFC 3490)

      >>> urltools.encode("http://mü")
  • assemble a new URL from a ParseResult

  • normalize_host

  • normalize_port

  • normalize_path

      >>> normalize_path("/a/b/../../c")
  • normalize_query

      >>> normalize_query("x=1&y=&z=3")
  • normalize_fragment

  • unquote

  • split (basically the same as urlparse.urlsplit, which is urllib.parse.urlsplit in Python 3)

      >>> split("")
      SplitResult(scheme='http', netloc='', path='/abc', query='x=1&y=2', fragment='foo')
  • split_netloc

      >>> split_netloc("")
      ('foo', 'bar', '', '8080')
  • split_host

      >>> split_host("")
      ('www', 'example', '')
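To give a feel for what two of these helpers do, here is a simplified sketch in plain Python. The function names carry a _sketch suffix because they are hypothetical re-implementations, not the urltools code; IPv6 hosts and some path edge cases are deliberately ignored, and the sample inputs are assumptions.

```python
def split_netloc_sketch(netloc):
    """Split 'user:password@host:port' into (username, password, host, port)."""
    userinfo, _, hostport = netloc.rpartition("@")
    username, _, password = userinfo.partition(":")
    host, _, port = hostport.partition(":")
    return username, password, host, port


def collapse_path_sketch(path):
    """Drop empty, '.' and '..' segments from an absolute path."""
    kept = []
    for segment in path.split("/"):
        if segment in ("", "."):
            continue  # '//' and '/./' add nothing
        if segment == "..":
            if kept:
                kept.pop()  # step back one level, never above the root
            continue
        kept.append(segment)
    collapsed = "/" + "/".join(kept)
    # Preserve a trailing slash when something precedes it.
    if kept and path.endswith("/"):
        collapsed += "/"
    return collapsed


print(split_netloc_sketch("foo:bar@www.example.com:8080"))
print(collapse_path_sketch("/a/b/../../c"))
```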


Note: pip is not working yet. You can install urltools from the Python Package Index (PyPI):

pip install urltools

... or get the newest version directly from GitHub:

pip install -e git://

Public Suffix List

urltools uses the Public Suffix List to split domain names correctly. For example, for a host under a two-label public suffix ending in .uk, the TLD is the whole suffix and not just .uk.

I recommend using a local copy of this list. Otherwise it will be downloaded on each import of urltools.

export PUBLIC_SUFFIX_LIST="/path/to/effective_tld_names.dat"
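How a suffix list drives the split can be sketched roughly as below. This is an assumption-laden illustration rather than the urltools code: load_suffixes, suffix_lines and split_host_sketch are hypothetical names, SAMPLE is a made-up handful of rules, and wildcard and exception rules from the real list are skipped for brevity.

```python
import os

# A tiny made-up sample in the list's format (not the real list).
SAMPLE = """\
// comment lines start with two slashes
com
uk
co.uk
*.ck
!www.ck
"""


def load_suffixes(lines):
    """Collect plain rules; skip comments, blanks, wildcard and exception rules."""
    suffixes = set()
    for line in lines:
        rule = line.strip()
        if not rule or rule.startswith("//") or rule[0] in "*!":
            continue
        suffixes.add(rule)
    return suffixes


def suffix_lines():
    """Read the file named by PUBLIC_SUFFIX_LIST if present, else use SAMPLE."""
    path = os.environ.get("PUBLIC_SUFFIX_LIST", "")
    if os.path.isfile(path):
        with open(path, encoding="utf-8") as fh:
            return fh.read().splitlines()
    return SAMPLE.splitlines()


def split_host_sketch(host, suffixes):
    """Return (subdomain, domain, tld) using the longest matching suffix."""
    labels = host.lower().split(".")
    for i in range(len(labels)):  # longest candidate is tried first
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            domain = labels[i - 1] if i > 0 else ""
            subdomain = ".".join(labels[: i - 1]) if i > 1 else ""
            return subdomain, domain, candidate
    # No known suffix: treat the last label as the domain.
    return ".".join(labels[:-1]), labels[-1], ""


sample_suffixes = load_suffixes(SAMPLE.splitlines())
print(split_host_sketch("www.example.co.uk", sample_suffixes))
# ('www', 'example', 'co.uk')
```

Reading the file once from a local path, as suffix_lines does, avoids any network access at import time.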

For more information see


To run the tests, I use pytest:

py.test -vrxs