If you ever tried to parse message quotations or signatures you know that absence of any formatting standards in this area could make this task a nightmare. Hopefully this library will make your life much easier.
pip install claw
Here’s how you initialize the library and extract a reply from a text message:
import claw from claw import quotations claw.init() text = """Reply -----Original Message----- Quote""" reply = quotations.extract_from(text, 'text/plain') reply = quotations.extract_from_plain(text) # reply == "Reply"
To extract a reply from html:
html = """Reply <blockquote> <div> On 11-Apr-2011, at 6:54 PM, Bob <email@example.com> wrote: </div> <div> Quote </div> </blockquote>""" reply = quotations.extract_from(html, 'text/html') reply = quotations.extract_from_html(html) # reply == "<html><body><p>Reply</p></body></html>"
Often the best way is the easiest one. Here’s how you can extract signature from email message without any machine learning fancy stuff:
from claw.signature import extract_signature message = """Wow. Awesome! -- Bob Smith""" text, signature = extract_signature(message) # text == "Wow. Awesome!" # signature == "--\nBob Smith"
Quick and works like a charm 90% of the time. For other 10% you can use the power of machine learning algorithms. See the original talon implementation.
virtualenv venv source venv/bin/activate make install make test
Release new version:
Bump the version in setup.py and update CHANGELOG.md, and then: