ushlex

A replacement for shlex that works with Unicode, for Python 2.x.


License

MIT

Install

pip install ushlex==0.98

Documentation

Inspired by the ordereddict package, this package provides an improved shlex module for Python 2 that handles Unicode properly.

The shlex module describes itself as "A lexical analyzer class for simple shell-like syntaxes."
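If you have only used shlex.split(), here is a minimal sketch of the underlying lexer class it wraps. It runs as-is against the standard shlex on Python 3. On Python 2 it assumes ushlex exposes the same shlex class (the release notes further down suggest it is a modified copy of the standard module), which is not verified here. The helper name tokens is hypothetical.

```python
# Runs as-is on Python 3, whose standard shlex handles Unicode.
# On Python 2, use "import ushlex as shlex" instead (assuming ushlex
# mirrors the standard shlex class interface).
import shlex


def tokens(text):
    """Hypothetical helper: tokenize shell-like text with the shlex class."""
    lex = shlex.shlex(text, posix=True)  # POSIX quoting and escaping rules
    lex.whitespace_split = True          # split on whitespace only, like split()
    # The lexer is iterable; iteration stops at lex.eof (None in POSIX mode).
    return list(lex)


if __name__ == "__main__":
    print(tokens(u'echo "caf\xe9 au lait"  # a trailing comment'))
```

Unlike split(), which disables comments by default, the class keeps "#" in its commenters attribute, so the trailing comment above is dropped and only the two tokens echo and "café au lait" remain.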

If you've found your way here, you probably already know that the standard shlex doesn't handle Unicode prior to Python 3 (see bug 1170 for details). Since Python 2.7.3, however, it accepts unicode objects. Sadly, it still does not handle non-ASCII characters:

>>> import sys, shlex
>>> sys.version
'2.7.5+ ...'
>>> shlex.split(u'Hello world')
['Hello', 'world']

>>> shlex.split(u'café')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/lib/python2.7/shlex.py", line 275, in split
    lex = shlex(s, posix=posix)
  File "/usr/lib/python2.7/shlex.py", line 25, in __init__
    instream = StringIO(instream)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in
                     position 3: ordinal not in range(128)

This module does handle unicode objects and byte strings under Python 2.x:

>>> import ushlex as shlex
>>> shlex.split(u'café')
[u'caf\xe9']

>>> shlex.split(u'echo "☺ ☕ ♫"')
[u'echo', u'\u263a \u2615 \u266b']

>>> from ushlex import split as shplit
>>> shplit('echo "hello there"')
['echo', 'hello there']
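Code that must run on both Python 2 and Python 3 can choose the module at import time. This is a minimal sketch, assuming ushlex is installed wherever the code runs under Python 2; on Python 3 the standard shlex already handles non-ASCII text. split_command is a hypothetical helper, not part of either module.

```python
import sys

# Pick a Unicode-aware shlex at import time: on Python 2 this assumes
# ushlex is installed (pip install ushlex==0.98); on Python 3 the
# standard module already handles non-ASCII text.
if sys.version_info[0] < 3:
    import ushlex as shlex
else:
    import shlex


def split_command(line):
    """Hypothetical helper: split a shell-like command line into tokens."""
    return shlex.split(line)


if __name__ == "__main__":
    print(split_command(u'echo "\u263a \u2615 \u266b"'))
```

Aliasing the import as shlex, as in the session above, keeps the calling code identical on both interpreters.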

The module source contains these release notes:

# Module and documentation by Eric S. Raymond, 21 Dec 1998
# Input stacking and error message cleanup added by ESR, March 2000
# push_source() and pop_source() made explicit by ESR, January 2001.
# Posix compliance, split(), string arguments, and
# iterator interface by Gustavo Niemeyer, April 2003.
# Modified to support Unicode by Colin Walters, Dec 2007

Bugs

Packaging-only bugs may be submitted to Bitbucket. Please do not report bugs in the ushlex code itself there, as the packager is not its author.