tregex-tobiasli

Wrapper that gets more functionality out of regex parse results.


Keywords: parsing, python3, regex, regular-expressions, text-processing
License: MIT
Install
pip install tregex-tobiasli==1.0

Documentation

tregex


tregex is a wrapper around Python regular expressions that makes them smoother and more user-friendly to use: match results come back as lists of tuples, dictionaries or objects.

Install

pip install tregex-tobiasli

Usage

import tregex as tr
emails = 'john.smith@somewhere.co.uk; hackzor@coolface.com'
pattern = r'([^;]+?)@(.+?)\.([^;]+)'  # raw string so "\." is passed to the regex engine unchanged
t = tr.to_tuple(pattern=pattern, string=emails)
assert t[0][1] == 'somewhere'
assert t[1][2] == 'com'
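For comparison, a similar list of tuples can be built with the standard library alone. This is a sketch assuming `to_tuple` returns what `re.findall` returns for a pattern with several groups; that equivalence is an assumption, not something taken from tregex's source. It also exposes a quirk of this particular pattern: the second match keeps the space that follows the `;` separator.

```python
import re

emails = 'john.smith@somewhere.co.uk; hackzor@coolface.com'
# Raw string so that "\." reaches the regex engine as an escaped dot.
pattern = r'([^;]+?)@(.+?)\.([^;]+)'

# With more than one group, re.findall returns one tuple of groups per match.
t = re.findall(pattern, emails)

print(t[0])  # ('john.smith', 'somewhere', 'co.uk')
print(t[1])  # (' hackzor', 'coolface', 'com'): note the leading space
```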

named_pattern = r'(?P<name>[^;]+?)@(?P<address>.+?)\.(?P<domain>[^;]+)'
t = tr.to_dict(pattern=named_pattern, string=emails)
assert t[0]['name'] == 'john.smith'
assert t[1]['address'] == 'coolface'
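A rough standard-library equivalent of `to_dict` uses `re.finditer` and `Match.groupdict()`. It assumes `to_dict` returns one dictionary of named groups per match, which is an assumption based on the example above rather than on tregex's source.

```python
import re

emails = 'john.smith@somewhere.co.uk; hackzor@coolface.com'
named_pattern = r'(?P<name>[^;]+?)@(?P<address>.+?)\.(?P<domain>[^;]+)'

# One dict per match, keyed by the names given in (?P<...>) groups.
t = [m.groupdict() for m in re.finditer(named_pattern, emails)]

print(t[0])  # {'name': 'john.smith', 'address': 'somewhere', 'domain': 'co.uk'}
```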

t = tr.to_object(pattern=named_pattern, string=emails)
assert t[0].name == 'john.smith'
assert t[1].address == 'coolface'
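Attribute access such as `t[0].name` can be approximated in plain Python by wrapping each match's `groupdict()` in a `types.SimpleNamespace`. This is only a sketch of the idea; the actual object type returned by `to_object` is not described here.

```python
import re
from types import SimpleNamespace

emails = 'john.smith@somewhere.co.uk; hackzor@coolface.com'
named_pattern = r'(?P<name>[^;]+?)@(?P<address>.+?)\.(?P<domain>[^;]+)'

# Each named group becomes an attribute on a lightweight namespace object.
t = [SimpleNamespace(**m.groupdict()) for m in re.finditer(named_pattern, emails)]

print(t[0].name, t[1].domain)  # john.smith com
```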

The pattern passed to any of the above methods can be either a string or a compiled regular expression. TregexCompiled is a class that simply holds the compiled regex so that the above methods can be called on it directly. If patterns are long, or the same pattern is applied many times, this usage can speed things up considerably.

from tregex import TregexCompiled

pattern = r'(?P<name>[^;]+?)@(?P<address>.+?)\.(?P<domain>[^;]+)'
trc = TregexCompiled(pattern)

t = trc.to_object('john.smith@somewhere.co.uk; hackzor@coolface.com')

assert t[0].name == 'john.smith'
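The idea behind `TregexCompiled` (compile once, then reuse the compiled pattern on every call) can be sketched with a small class. `CompiledSketch` is a hypothetical name, not part of tregex, and only `to_dict` and `to_object` are modelled.

```python
import re
from types import SimpleNamespace


class CompiledSketch:
    """Hypothetical stand-in for TregexCompiled: compile once, reuse many times."""

    def __init__(self, pattern):
        # re.compile accepts a pattern string, or returns an already
        # compiled re.Pattern unchanged (when no extra flags are given).
        self.regex = re.compile(pattern)

    def to_dict(self, string):
        # One dict of named groups per match.
        return [m.groupdict() for m in self.regex.finditer(string)]

    def to_object(self, string):
        # Same data, exposed as attributes instead of keys.
        return [SimpleNamespace(**d) for d in self.to_dict(string)]


trc = CompiledSketch(r'(?P<name>[^;]+?)@(?P<address>.+?)\.(?P<domain>[^;]+)')
first = trc.to_object('john.smith@somewhere.co.uk; hackzor@coolface.com')
second = trc.to_object('jane@example.org')
```

Python's `re` module also caches recently compiled patterns internally, so the gain from precompiling tends to be largest when many distinct patterns are in use or one pattern is reused in a tight loop.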

tregex also contains some methods for simple fuzzy text matching using difflib.SequenceMatcher:

from tregex import find_best

places_in_wales = ['Llanaber', 'Llanaelhaearn', 'Llanbedr', 'Llandbedrgoch', 'Llanbedrog', 'Llanberis', 'Llandanwg', 'Llanegryn', 'Llandegwning', 'Llandeiniolen', 'Llandwrog']

best = find_best('Llanberris', places_in_wales)
assert best == 'Llanberis'
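When only the single closest candidate is needed, the standard library's `difflib.get_close_matches` gives a similar result. It ranks candidates by `SequenceMatcher.ratio()`, but unlike `find_best` it returns a list, and it drops candidates scoring below `cutoff` (0.6 by default). Whether `find_best` applies any cutoff is not stated here.

```python
import difflib

places_in_wales = ['Llanaber', 'Llanaelhaearn', 'Llanbedr', 'Llandbedrgoch', 'Llanbedrog',
                   'Llanberis', 'Llandanwg', 'Llanegryn', 'Llandegwning', 'Llandeiniolen',
                   'Llandwrog']

# n=1 keeps only the highest-scoring candidate that clears the cutoff.
best = difflib.get_close_matches('Llanberris', places_in_wales, n=1)
print(best)  # ['Llanberis']
```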

The other fuzzy-matching methods are find, find_scores (which returns each candidate together with its match score) and similarity (which returns the score for a single pair of strings).
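A minimal sketch of what `similarity` and `find_scores` might compute, assuming both are based on `SequenceMatcher.ratio()` as the mention of difflib.SequenceMatcher suggests. The names `similarity_sketch` and `find_scores_sketch` are hypothetical, and the (candidate, score) tuple order and the best-first sort are assumptions about the output shape.

```python
from difflib import SequenceMatcher


def similarity_sketch(a, b):
    """Hypothetical stand-in for similarity: a ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def find_scores_sketch(needle, candidates):
    """Hypothetical stand-in for find_scores: (candidate, score) pairs, best first."""
    scored = [(c, similarity_sketch(needle, c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


scores = find_scores_sketch('Llanberris', ['Llanbedr', 'Llanberis', 'Llandwrog'])
for candidate, score in scores:
    print(f'{candidate}: {score:.3f}')
```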