yelp-bytes

Utilities for dealing with byte strings, invented and maintained by Yelp.


License: Unlicense
Install: pip install yelp-bytes==0.4.4

Documentation

yelp_bytes


yelp_bytes contains several utility functions that ensure the data you're working with is always either Unicode text or a byte string. They take care of the edge cases so that you don't have to. Ambiguous byte strings are handled with our "internet" encoding. This lets you write functions that need Unicode but can accept arbitrary values without crashing.
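
The idea behind a forgiving decoder can be sketched in a few lines of plain Python. This is a hypothetical helper (`decode_with_fallback` is an illustrative name, not part of yelp_bytes) built on an assumed strategy: try UTF-8 first, and if that fails, fall back to cp1252, mapping the five byte values that cp1252 leaves undefined through latin-1. It is not the actual "internet" codec.

```python
def decode_with_fallback(data: bytes) -> str:
    """Hypothetical helper: decode bytes as UTF-8, falling back to cp1252.

    Illustrates the general idea of a forgiving decoder; this is an
    assumption for illustration, not the yelp_bytes "internet" codec.
    """
    try:
        return data.decode('utf-8')
    except UnicodeDecodeError:
        pass
    # cp1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so decode
    # byte by byte and map those values through latin-1 instead of failing.
    chars = []
    for byte in data:
        single = bytes([byte])
        try:
            chars.append(single.decode('cp1252'))
        except UnicodeDecodeError:
            chars.append(single.decode('latin-1'))
    return ''.join(chars)
```

Because the fallback applies to the whole input, a byte string that mixes valid UTF-8 with cp1252 bytes is decoded entirely as cp1252 in this sketch; a more careful codec would need to work sequence by sequence.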

Installation

For a primer on pip and virtualenv, see the Python Packaging User Guide.

TL;DR: pip install yelp_bytes

Usage

The from_bytes function is the most interesting one: it takes any object and returns its Unicode representation. It should never fail, except in extremely rare edge cases that we have not encountered ourselves. from_utf8 is similar, but it decodes with 'UTF-8' rather than the 'internet' encoding, so it fails when given bytes that are not valid UTF-8. to_bytes and to_utf8 both take an object and return its UTF-8 byte-string representation.
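
The object-in, text-out contract described above might look roughly like the following. `to_text` and `to_utf8_bytes` are hypothetical stand-ins written for illustration, not the library's implementations: text passes through, bytes are decoded as UTF-8 (strictly by default, so malformed input fails the way from_utf8 is described as failing), and any other object is converted with str().

```python
def to_text(obj, errors='strict'):
    """Hypothetical helper: return a text representation of any object.

    - str is returned unchanged.
    - bytes/bytearray are decoded as UTF-8; with the default errors='strict',
      malformed input raises UnicodeDecodeError.
    - anything else is converted with str().
    """
    if isinstance(obj, str):
        return obj
    if isinstance(obj, (bytes, bytearray)):
        return bytes(obj).decode('utf-8', errors)
    return str(obj)


def to_utf8_bytes(obj):
    """Hypothetical helper: return a UTF-8 byte string for any object.

    Assumption: byte strings are passed through unchanged, not re-validated.
    """
    if isinstance(obj, (bytes, bytearray)):
        return bytes(obj)
    return to_text(obj).encode('utf-8')
```

Passing errors='replace' to `to_text` trades the strict failure for lossy U+FFFD replacement characters, which is a simpler (and less faithful) alternative to a fallback encoding.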

```python
>>> import yelp_bytes

>>> euro = u'€'

>>> print(yelp_bytes.from_bytes(euro.encode('UTF-8')))
€
>>> print(yelp_bytes.from_bytes(euro.encode('cp1252')))
€
>>> print(yelp_bytes.from_bytes(euro))
€
```
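
The two encoded forms in that example are different byte strings, which is why a strict UTF-8 decoder can handle only one of them. Plain Python, without yelp_bytes, shows the difference:

```python
euro = '\u20ac'  # the euro sign

utf8_bytes = euro.encode('utf-8')      # three bytes
cp1252_bytes = euro.encode('cp1252')   # one byte

print(utf8_bytes)    # b'\xe2\x82\xac'
print(cp1252_bytes)  # b'\x80'

try:
    cp1252_bytes.decode('utf-8')
except UnicodeDecodeError as exc:
    # 0x80 is a UTF-8 continuation byte, so it cannot start a sequence.
    print('strict UTF-8 decoding failed:', exc.reason)  # invalid start byte
```

This is the case where from_utf8, as described above, is expected to fail while from_bytes still returns the euro sign.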

We also handle objects with certain common classes of encoding issues, along with the other edge cases we have encountered. One of the most common, on Python 2, is putting non-ASCII Unicode into an error message:

```python
>>> error = Exception(euro)
>>> print(error)  # doctest: +SKIP
Traceback (most recent call last):
    ...
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)

>>> print(yelp_bytes.from_utf8(error))
€
>>> yelp_bytes.to_utf8(error) == euro.encode('UTF-8')
True
```
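
Python 3 no longer raises when printing an exception with a non-ASCII text message, but a bytes argument still renders unhelpfully: `str(Exception(b'\xe2\x82\xac'))` returns the bytes repr `b'\xe2\x82\xac'` rather than the euro sign. A hypothetical helper (`exception_text` is an illustrative name, not part of yelp_bytes) that decodes bytes arguments before joining them might look like this; it assumes UTF-8 input and replaces malformed bytes rather than falling back to another encoding.

```python
def exception_text(exc):
    """Hypothetical helper: render an exception's arguments as text.

    Bytes arguments are decoded as UTF-8 (malformed bytes become U+FFFD);
    every other argument goes through str(). Arguments are joined with ', '.
    """
    parts = []
    for arg in exc.args:
        if isinstance(arg, (bytes, bytearray)):
            parts.append(bytes(arg).decode('utf-8', 'replace'))
        else:
            parts.append(str(arg))
    return ', '.join(parts)
```

Joining `exc.args` differs from `str(exc)` for multi-argument exceptions, which Python renders as a tuple; the sketch favors readable text over matching that format.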

Check out the source to learn more about the input parameters and return values.