Base64 encoded uuid v4 slugs


License
MPL-2.0
Install
pip install slugid==2.0.0

Documentation

slugid.py - Compressed UUIDs for python


A module, compatible with Python 2.7 and Python 3.5, for generating v4 UUIDs and encoding them into a 22-character URL-safe base64 slug representation (see RFC 4648 sec. 5).

Slugs are url-safe base64 encoded v4 uuids, stripped of base64 = padding.
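
As a rough illustration of this encoding, the sketch below uses only the Python standard library. The helper names uuid_to_slug and slug_to_uuid are hypothetical and are not part of the slugid API; the real library may differ in detail.

```python
import base64
import uuid


def uuid_to_slug(u):
    """Encode a uuid.UUID as a 22-character URL-safe base64 slug (hypothetical helper)."""
    # 16 bytes encode to 24 base64 characters, the last two being '=' padding
    encoded = base64.urlsafe_b64encode(u.bytes)
    return encoded[:-2].decode("ascii")


def slug_to_uuid(slug):
    """Decode a 22-character slug back into a uuid.UUID (hypothetical helper)."""
    # Restore the two stripped '=' padding characters before decoding
    return uuid.UUID(bytes=base64.urlsafe_b64decode(slug + "=="))


# A v4-shaped UUID with every random bit set to zero
example = uuid.UUID("00000000-0000-4000-8000-000000000000")
example_slug = uuid_to_slug(example)
```

Since 16 bytes always produce exactly two padding characters, stripping and restoring "==" is lossless.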

There are two methods for generating slugs - slugid.v4() and slugid.nice().

  • The slugid.v4() method returns a slug from a randomly generated v4 uuid.
  • The slugid.nice() method returns a v4 slug which conforms to a set of "nice" properties. At the moment the only "nice" property is that the slug starts with [A-Za-f], which in turn implies that the first (most significant) bit of its associated uuid is set to 0.

The purpose of the slugid.nice() method is to support having slugids which can be used in more contexts safely. Regular slugids can safely be used in urls, and for example in AMQP routing keys. However, slugs beginning with - may cause problems when used as command line parameters.

In contrast, slugids generated by the slugid.nice() method can safely be used as command line parameters. This comes at a cost to entropy (121 bits vs 122 bits for regular v4 slugs).
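
One way to obtain the "nice" property is to clear the most significant bit of a random v4 uuid before encoding it. The following is a minimal sketch under that assumption; nice_slug_sketch is a hypothetical name, and this is not necessarily how slugid.nice() is implemented.

```python
import base64
import uuid


def nice_slug_sketch():
    """Return a v4 slug whose first character is in [A-Za-f] (hypothetical sketch)."""
    raw = bytearray(uuid.uuid4().bytes)
    # Clearing the top bit of the first byte forces the first base64
    # character into the range 0..31, i.e. [A-Za-f], so it is never '-'
    raw[0] &= 0x7F
    return base64.urlsafe_b64encode(bytes(raw))[:-2].decode("ascii")
```

Fixing this one extra bit is what reduces the entropy from 122 to 121 bits.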

Slug consumers should consider carefully which of these two slug generation methods to call. Is it more important to have maximum entropy, or to have slugids that do not need special treatment when used as command line parameters? This is especially important if you are providing a service which supplies slugs to unsuspecting tool developers downstream, who may not realise the risks of using your regular v4 slugs as command line parameters, especially since the problem would arise only intermittently (one time in 64, when the slug happens to begin with -).

Generated slugs take the form [A-Za-z0-9_-]{22}, or more precisely:

  • slugid.v4() slugs conform to [A-Za-z0-9_-]{8}[Q-T][A-Za-z0-9_-][CGKOSWaeimquy26-][A-Za-z0-9_-]{10}[AQgw]
  • slugid.nice() slugs conform to [A-Za-f][A-Za-z0-9_-]{7}[Q-T][A-Za-z0-9_-][CGKOSWaeimquy26-][A-Za-z0-9_-]{10}[AQgw]
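
These patterns can be used to validate slugs received from elsewhere. The sketch below compiles both patterns with the standard re module (Python 3, for fullmatch); the names V4_SLUG, NICE_SLUG, is_v4_slug and is_nice_slug are hypothetical and not part of slugid.

```python
import re

# Patterns copied from the two bullet points above
V4_SLUG = re.compile(
    r"[A-Za-z0-9_-]{8}[Q-T][A-Za-z0-9_-][CGKOSWaeimquy26-][A-Za-z0-9_-]{10}[AQgw]"
)
NICE_SLUG = re.compile(
    r"[A-Za-f][A-Za-z0-9_-]{7}[Q-T][A-Za-z0-9_-][CGKOSWaeimquy26-][A-Za-z0-9_-]{10}[AQgw]"
)


def is_v4_slug(candidate):
    """True if the whole string has the shape of a v4 slug."""
    return V4_SLUG.fullmatch(candidate) is not None


def is_nice_slug(candidate):
    """True if the whole string has the shape of a "nice" slug."""
    return NICE_SLUG.fullmatch(candidate) is not None
```

Every nice slug is also a valid v4 slug, but not the other way round.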

Slugs are generated with the interpreter's default string type. On Python 2, these are byte strings. On Python 3, these are unicode strings.

RFC 4122 fixes six bits of a v4 UUID, so v4 slugs provide 128 - 6 = 122 bits of entropy. Because "nice" slugs additionally clear the first bit, they provide 121 bits of entropy.

These are the six fixed bits:

  • bit 48: 0
  • bit 49: 1
  • bit 50: 0
  • bit 51: 0
  • bit 64: 1
  • bit 65: 0
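
To check these fixed bits against a real uuid, the sketch below numbers bits from 0 at the most significant end, matching the list above. FIXED_BITS, bit and has_v4_fixed_bits are hypothetical names used only for illustration.

```python
import uuid

# Bit position (0 = most significant) -> required value, as listed above
FIXED_BITS = {48: 0, 49: 1, 50: 0, 51: 0, 64: 1, 65: 0}


def bit(u, position):
    """Return the bit of uuid.UUID u at the given position, counting from the MSB."""
    return (u.int >> (127 - position)) & 1


def has_v4_fixed_bits(u):
    """True if all six fixed bits of u hold the values a v4 uuid requires."""
    return all(bit(u, pos) == value for pos, value in FIXED_BITS.items())


# Bits left free for randomness in a regular v4 slug
entropy_bits = 128 - len(FIXED_BITS)
```

Bits 48 to 51 carry the version number (0100 = 4) and bits 64 to 65 carry the variant (10).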

Splitting the 128 bits into groups of six to see the base64 character boundaries, we get:

position:                                                                                                                 11 111111 111111 111111 111111 11
                 11 111111 112222 222222 333333 333344 444444 445555 555555 666666 666677 777777 778888 888888 999999 999900 000000 001111 111111 222222 22
      012345 678901 234567 890123 456789 012345 678901 234567 890123 456789 012345 678901 234567 890123 456789 012345 678901 234567 890123 456789 012345 67
bin: |......|......|......|......|......|......|......|......|0100..|......|....10|......|......|......|......|......|......|......|......|......|......|..0000|
b64: |   α  |   α  |   α  |   α  |   α  |   α  |   α  |   α  |   β  |   α  |   γ  |   α  |   α  |   α  |   α  |   α  |   α  |   α  |   α  |   α  |   α  |   δ  |

Using the base64url encoding scheme, we can see which characters are allowed at each of the 22 positions.

  • α = 0b...... ∈ {

    000000 A
    000001 B
    000010 C
    000011 D
    000100 E
    000101 F
    000110 G
    000111 H
    001000 I
    001001 J
    001010 K
    001011 L
    001100 M
    001101 N
    001110 O
    001111 P
    010000 Q
    010001 R
    010010 S
    010011 T
    010100 U
    010101 V
    010110 W
    010111 X
    011000 Y
    011001 Z
    011010 a
    011011 b
    011100 c
    011101 d
    011110 e
    011111 f
    100000 g
    100001 h
    100010 i
    100011 j
    100100 k
    100101 l
    100110 m
    100111 n
    101000 o
    101001 p
    101010 q
    101011 r
    101100 s
    101101 t
    101110 u
    101111 v
    110000 w
    110001 x
    110010 y
    110011 z
    110100 0
    110101 1
    110110 2
    110111 3
    111000 4
    111001 5
    111010 6
    111011 7
    111100 8
    111101 9
    111110 -
    111111 _
    

    }

  • β = 0b0100.. ∈ {

    010000 Q
    010001 R
    010010 S
    010011 T
    

    }

  • γ = 0b....10 ∈ {

    000010 C
    000110 G
    001010 K
    001110 O
    010010 S
    010110 W
    011010 a
    011110 e
    100010 i
    100110 m
    101010 q
    101110 u
    110010 y
    110110 2
    111010 6
    111110 -
    

    }

  • δ = 0b..0000 ∈ {

    000000 A
    010000 Q
    100000 g
    110000 w
    

    }

Thus we reach a 22 character encoding of:

  • α{8}βαγα{10}δ

which, written out in full, becomes:

  • ^[A-Za-z0-9_-]{8}[Q-T][A-Za-z0-9_-][CGKOSWaeimquy26-][A-Za-z0-9_-]{10}[AQgw]$
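
The character sets above can also be derived mechanically from the bit patterns. In the sketch below, '.' marks a free bit, and ALPHABET is the base64url alphabet from RFC 4648 sec. 5; the function name allowed is hypothetical.

```python
import string

# base64url alphabet: index 0..63 maps to A-Z, a-z, 0-9, '-', '_'
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"


def allowed(pattern):
    """Return the base64url characters whose 6-bit value matches the pattern.

    The pattern is six characters long; each is '0', '1', or '.' for a free bit.
    """
    return "".join(
        char
        for value, char in enumerate(ALPHABET)
        if all(p in (".", b) for p, b in zip(pattern, format(value, "06b")))
    )


beta = allowed("0100..")
gamma = allowed("....10")
delta = allowed("..0000")
```

The same function gives the first character set of nice slugs via the pattern 0....., which yields the characters A-Z followed by a-f.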

Usage

import slugid

# Generate "nice" URL-safe base64 encoded UUID version 4 (random)
slug = slugid.nice()  # a8_YezW8T7e1jLxG7evy-A

# Alternative, if slugs will not be used as command line parameters
slug = slugid.v4()    # -9OpXaCORAaFh4sJRk7PUA

# Get python uuid.UUID object (named uuid_obj to avoid shadowing the uuid module)
uuid_obj = slugid.decode(slug)

# Compress to slug again
assert slug == slugid.encode(uuid_obj)

RNG Characteristics

UUID generation is performed by the built-in python uuid library, which does not document its randomness but uses system uuid-generation libraries where available, falling back to urandom and then random. Generated slugids therefore share these rng characteristics.

License

The slugid library is released under the MPL 2.0 license; see the LICENSE file for the complete license text.

Testing

pip install -r requirements.txt
tox

Publishing

To republish this library to pypi.python.org, update the version number in slugid/__init__.py, commit it, push to github, and then run:

pip install -U twine setuptools wheel

# delete stale versions
rm -rf dist/ build/

# build source package and wheel
python setup.py sdist bdist_wheel

# publish it
twine upload -s dist/*