A replacement for shlex that works with Unicode, for Python 2.x.
pip install ushlex==0.98
Inspired by ordereddict, this is a packaging of an improved shlex module for Python 2 that handles Unicode properly.
Shlex is "A lexical analyzer class for simple shell-like syntaxes."
If you've found your way here, you probably already know that the standard shlex doesn't handle Unicode prior to Python 3 (see bug 1170 for details). Since Python 2.7.3, however, it accepts unicode objects. Sadly, it still fails on non-ASCII characters:
>>> import sys, shlex
>>> sys.version
'2.7.5+ ...'
>>> shlex.split(u'Hello world')
['Hello', 'world']
>>> shlex.split(u'café')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/lib/python2.7/shlex.py", line 275, in split
    lex = shlex(s, posix=posix)
  File "/usr/lib/python2.7/shlex.py", line 25, in __init__
    instream = StringIO(instream)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)
This module does handle unicode objects and byte strings under Python 2.x:
>>> import ushlex as shlex
>>> shlex.split(u'café')
[u'caf\xe9']
>>> shlex.split(u'echo "☺ ☕ ♫"')
[u'echo', u'\u263a \u2615 \u266b']
>>> from ushlex import split as shplit
>>> shplit('echo "hello there"')
['echo', 'hello there']
I found these release notes inside:
# Module and documentation by Eric S. Raymond, 21 Dec 1998
# Input stacking and error message cleanup added by ESR, March 2000
# push_source() and pop_source() made explicit by ESR, January 2001.
# Posix compliance, split(), string arguments, and
# iterator interface by Gustavo Niemeyer, April 2003.
# Modified to support Unicode by Colin Walters, Dec 2007
Packaging-only bugs may be submitted on Bitbucket. Do not file bugs for ushlex itself, as the packager is not the author.