pollywog


License: Other

Install

pip install pollywog==0.1.1

Documentation

Syntactic sugar for working with regular expressions in Python. Based on a blog post.

Usage

In the following examples we will use these regular expressions to capture URLs:

from pollywog import R

# Raw strings keep backslash sequences such as \S intact for the regex engine.
simple_url_re = r'(https?://)([^/]+)(/[\S]*)?'
url_re = r'(?P<scheme>https?://)(?P<host>[^/]+)(?P<path>/[\S]*)?'

Checking if a match exists

url = raw_input('Enter a URL: ')
if R/simple_url_re/url:
    print 'You entered a valid URL'
else:
    print 'That URL appears to be invalid.'
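
As a rough illustration of how the slash syntax can work, here is a minimal sketch built on operator overloading. The class MiniR is hypothetical and is not pollywog's implementation; it assumes that "a match exists" means re.search finds the pattern anywhere in the string.

```python
import re


class MiniR(object):
    """Hypothetical stand-in for R, for illustration only (not pollywog's code)."""

    def __init__(self, pattern=None, text=None):
        self.pattern = pattern
        self.text = text

    def __truediv__(self, other):
        # The first slash binds the pattern, the second binds the text.
        if self.pattern is None:
            return MiniR(other)
        return MiniR(self.pattern, other)

    __div__ = __truediv__  # Python 2 uses __div__ for the "/" operator.

    def __bool__(self):
        # Assumption: truthiness means the pattern occurs somewhere in the text.
        return re.search(self.pattern, self.text) is not None

    __nonzero__ = __bool__  # Python 2 name for __bool__.


M = MiniR()
simple_url_re = r'(https?://)([^/]+)(/[\S]*)?'

is_url = bool(M / simple_url_re / 'http://example.com/blog/')
not_url = bool(M / simple_url_re / 'not a url')
print(is_url)
print(not_url)
```

Because each slash returns a new object, the expression can be used directly in an if statement, which is what the example above relies on.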

Extracting data from a match

The right-shift operator (>>) populates a dictionary or list with the search results. A dictionary receives the named groups; a list receives the group values in order:

# Store search results in the `result` dictionary.
result = {}
R/url_re/'http://charlesleifer.com/blog/'>>result
print result
# {'scheme': 'http://', 'host': 'charlesleifer.com', 'path': '/blog/'}

# Store search results in the `url_parts` list.
url_parts = []
R/url_re/'https://github.com/coleifer/'>>url_parts
print url_parts
# ['https://', 'github.com', '/coleifer/']
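
For comparison, here is a sketch of roughly the same extraction using only the standard re module. It assumes that >> fills a dictionary from the named groups and a list from the groups in order, as the output above suggests; it is not pollywog's own code.

```python
import re

url_re = r'(?P<scheme>https?://)(?P<host>[^/]+)(?P<path>/[\S]*)?'

# Dictionary target: copy the named groups into an existing dict.
result = {}
match = re.search(url_re, 'http://charlesleifer.com/blog/')
if match:
    result.update(match.groupdict())

# List target: append the group values in pattern order.
url_parts = []
match = re.search(url_re, 'https://github.com/coleifer/')
if match:
    url_parts.extend(match.groups())

print(result)
print(url_parts)
```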

For less magic, you can also use the search() method. By default, the search() method will return a tuple.

url = raw_input('Enter a URL: ')
result = (R/simple_url_re/url).search()
if result:
    scheme, host, path = result
    print 'Scheme:', scheme
    print 'Host:', host
    print 'Path:', path

If the pattern uses named groups, passing as_dict=True makes the search() method return a dict instead.

url = raw_input('Enter a URL: ')
result = (R/url_re/url).search(as_dict=True)
if result:
    print 'Scheme:', result['scheme']
    print 'Host:', result['host']
    print 'Path:', result['path']
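
The "if result:" checks above work when a failed search returns something falsy. A minimal sketch of that behaviour with the standard re module follows; the helper name search and the choice to return None on failure are assumptions made for illustration, not pollywog's documented behaviour.

```python
import re

url_re = r'(?P<scheme>https?://)(?P<host>[^/]+)(?P<path>/[\S]*)?'


def search(pattern, text, as_dict=False):
    """Hypothetical helper: return the groups of the first match, or None."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return match.groupdict() if as_dict else match.groups()


as_tuple = search(url_re, 'https://github.com/coleifer/')
as_dict = search(url_re, 'https://github.com/coleifer/', as_dict=True)
no_match = search(url_re, 'not a url')
no_path = search(url_re, 'http://example.com')

print(as_tuple)
print(as_dict)
print(no_match)
print(no_path)
```

Note that the path group is optional in url_re, so a URL without a path yields None in that position when using re directly.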

Iterating over matches

The default iterator will return tuples:

sample = """
    This is a test. Visit http://charlesleifer.com/ for more examples.
    Also check out my GitHub at https://github.com/coleifer/
"""
for scheme, host, path in R/url_re/sample:
    print host + path

To iterate over dictionaries instead, use the iter_dicts() method:

sample = """
    This is a test. Visit http://charlesleifer.com/ for more examples.
    Also check out my GitHub at https://github.com/coleifer/
"""
result = R/url_re/sample
for url_dict in result.iter_dicts():
    print url_dict['host'] + url_dict['path']
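
For readers who want to see what the two iteration styles correspond to, here is a sketch using re.finditer from the standard library. It shows plain-re equivalents under the assumption that each iteration step yields the groups of one match; it is not how pollywog is necessarily implemented.

```python
import re

url_re = r'(?P<scheme>https?://)(?P<host>[^/]+)(?P<path>/[\S]*)?'

sample = """
    This is a test. Visit http://charlesleifer.com/ for more examples.
    Also check out my GitHub at https://github.com/coleifer/
"""

# One tuple of groups per match, in pattern order.
tuples = [m.groups() for m in re.finditer(url_re, sample)]

# One dict of named groups per match.
dicts = [m.groupdict() for m in re.finditer(url_re, sample)]

for scheme, host, path in tuples:
    print(host + path)
```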

Search and Replace

To perform a replacement, just tack on another slash followed by the replacement expression:

print R/'(person)'/'hello person!'/'charlie'
# Prints: "hello charlie!"

print R/'(person)'/'I love you person!'/'baby huey'
# Prints: "I love you baby huey!"

Another example using references to capture-groups:

# US phone number with area code, e.g. (555) 123-4567
phone_re = r'\((\d{3})\)[-\s](\d{3})-(\d{4})'

# Normalize phone number to use dots.
replacement = r'\1.\2.\3'

print R/phone_re/'(555) 123-4567'/replacement
# Prints: 555.123.4567
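
The same replacements can be written with re.sub from the standard library, which may help when reading the slash form. This is a plain-re comparison, not pollywog code; note that re.sub takes the replacement before the input string, whereas the slash form puts it last.

```python
import re

phone_re = r'\((\d{3})\)[-\s](\d{3})-(\d{4})'

# \1, \2 and \3 refer to the three captured digit groups.
normalized = re.sub(phone_re, r'\1.\2.\3', '(555) 123-4567')

# A replacement without backreferences is inserted literally.
greeting = re.sub('(person)', 'charlie', 'hello person!')

print(normalized)
print(greeting)
```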

Splitting strings

To split strings, use the subtraction operator:

# Split on whitespace and non-alphanumeric.
rgx = r'[\s\W]+'

print R/rgx-'hey! "testing 123"'
# Prints: ['hey', 'testing', '123', '']

For slightly less magic, you can also use the split() method:

rgx = r'[\s\W]+'

print (R/rgx/'hey! "testing 123"').split()
# Prints: ['hey', 'testing', '123', '']
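
The trailing empty string in the output appears because the input ends with a separator (the closing quote). A short sketch with re.split from the standard library shows the same result and one way to drop empty items; filtering is a suggestion here, not something pollywog is documented to do.

```python
import re

rgx = r'[\s\W]+'

# The final '"' matches the separator, leaving an empty string at the end.
parts = re.split(rgx, 'hey! "testing 123"')

# Keep only the non-empty pieces.
words = [p for p in parts if p]

print(parts)
print(words)
```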