twitter-monitor

A Twitter streaming library built on tweepy that enables dynamic term tracking


Keywords
twitter, streaming, tweepy, filter
License
MIT
Install
pip install twitter-monitor==0.2.3

Documentation

Twitter Monitor


A Twitter streaming library built on Tweepy that enables dynamic tracking of the filtered Twitter Streaming API.

This library provides a framework that you can use to build your own dynamic Twitter term tracking system. You will want to do three things:

  1. Create a subclass of TermChecker that knows how to look for tracked terms (e.g. in a database or a file). There is a FileTermChecker provided as an example.
  2. Create a subclass of JsonStreamListener that does something interesting with the tweets, such as writing them to a file or a database.
  3. Start an instance of the DynamicTwitterStream class, which ties it all together.

There is also a stream_tweets script you can use to get started streaming tweets more quickly. More information is below.

Installation

This package is available on PyPI.

$ pip install twitter-monitor

Simple Streaming Script

This package includes a stream_tweets script that connects to Twitter using your API key, reads a list of filter terms from a file, and streams tweets to stdout.

To use stream_tweets, you will need to create a file containing your filter terms, one per line. The script will look for track.txt in the current directory, but you can override this. You also need to provide your Twitter API key info.
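As a minimal sketch of producing such a file from Python: the helper below writes one term per line. The name write_track_file and the write-then-rename approach are illustrative assumptions, not part of this package. The rename is meant to keep a process that re-reads the file from seeing a half-written list.

```python
import os
import tempfile


def write_track_file(terms, path="track.txt"):
    """Write one filter term per line, replacing the file in a single step."""
    # Drop blanks and duplicates while keeping the original order.
    cleaned = []
    for term in terms:
        term = term.strip()
        if term and term not in cleaned:
            cleaned.append(term)

    # Write to a temporary file in the same directory, then rename it over
    # the target so a reader never sees a partially written file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    with os.fdopen(fd, "w") as tmp:
        for term in cleaned:
            tmp.write(term + "\n")
    os.replace(tmp_path, path)
    return cleaned
```

Calling write_track_file(["python", "tweepy"]) would create or replace track.txt in the current directory.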

By default, an empty tracking file will result in no tweets being captured. If you want to instead capture unfiltered tweets using the sample API endpoint, you can use the "unfiltered" options (details below).

When you run stream_tweets, informational messages will be printed out to stderr, while tweets will be printed to stdout, one tweet per line, in JSON format. This makes it convenient to redirect the output into a file or another program:

$ stream_tweets > tweets.json
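Because the output is one JSON document per line, the resulting file can be read back line by line. The following is a small sketch of that. The helper name iter_tweets and the sample records are made up for illustration, and the only tweet field it relies on is "text".

```python
import json


def iter_tweets(lines):
    """Yield one parsed tweet (a dict) per non-empty line of JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Made-up sample input standing in for the contents of tweets.json.
sample = [
    '{"id": 1, "text": "hello world"}\n',
    "\n",
    '{"id": 2, "text": "streaming with tweepy"}\n',
]
texts = [tweet["text"] for tweet in iter_tweets(sample)]
print(texts)
```

With a real capture, pass an open file instead of the sample list, for example iter_tweets(open("tweets.json")).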

The required settings can be provided via environment variables, a .ini file, or command-line arguments. Command-line arguments take precedence:

$ stream_tweets --api-key XXXX --api-secret XXXX \
                --access-token XXXX --access-token-secret XXXX \
                --track-file my/track/file.txt \
                --poll-interval 15

The --poll-interval option defines how often to check the track file for updated terms. You can also use the option --unfiltered TRUE to enable capturing tweets without terms.

Alternatively, one or more of the options may be defined in a .ini file. The script will search in the current directory for twitter_monitor.ini, but this can be overridden using the --ini-file argument. Below is an example twitter_monitor.ini:

[twitter]
api_key=XXXX
api_secret=XXXX
access_token=XXXX
access_token_secret=XXXX
track_file=my/track/file.txt
poll_interval=15
unfiltered=TRUE

If options are not defined on the command line or in an ini file, environment variables are checked. Below are the names of the corresponding environment variables:

TWITTER_API_KEY=XXXX
TWITTER_API_SECRET=XXXX
TWITTER_ACCESS_TOKEN=XXXX
TWITTER_ACCESS_TOKEN_SECRET=XXXX
TWITTER_TRACK_FILE=my/track/file.txt
TWITTER_POLL_INTERVAL=15
TWITTER_UNFILTERED=TRUE
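The lookup order described above (command line first, then the .ini file, then the environment) can be illustrated with a short sketch. This is not the script's actual implementation. The function resolve_setting and the sample values are hypothetical, and the TWITTER_ prefix simply follows the variable names listed above.

```python
import configparser


def resolve_setting(name, cli_args, ini_values, environ):
    """Return the first value found: command line, then .ini, then environment."""
    if cli_args.get(name) is not None:
        return cli_args[name]
    if name in ini_values:
        return ini_values[name]
    return environ.get("TWITTER_" + name.upper())


# Parse an in-memory .ini with the same [twitter] section layout.
parser = configparser.ConfigParser()
parser.read_string("[twitter]\napi_key=from-ini\npoll_interval=15\n")
ini_values = dict(parser["twitter"])

# Stand-ins for parsed command-line arguments and os.environ.
cli_args = {"api_key": None, "poll_interval": "30"}
environ = {"TWITTER_API_KEY": "from-env", "TWITTER_TRACK_FILE": "env/track.txt"}

poll_interval = resolve_setting("poll_interval", cli_args, ini_values, environ)
api_key = resolve_setting("api_key", cli_args, ini_values, environ)
track_file = resolve_setting("track_file", cli_args, ini_values, environ)
```

Here poll_interval comes from the command line, api_key from the .ini file, and track_file from the environment.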

Custom Usage

Below is a simple example of how to set up and initialize a dynamic Twitter stream. This example uses the FileTermChecker and default JsonStreamListener implementations. There is a working example in the twitter_monitor/basic_stream.py file.

import time

import tweepy
import twitter_monitor

# The file containing terms to track
terms_filename = "tracking_terms.txt"

# How often to check the file for new terms
poll_interval = 15

# Your twitter API credentials
api_key = 'YOUR API KEY'
api_secret = 'YOUR API SECRET'
access_token = 'YOUR ACCESS TOKEN'
access_token_secret = 'YOUR ACCESS TOKEN SECRET'

auth = tweepy.OAuthHandler(api_key, api_secret)
auth.set_access_token(access_token, access_token_secret)

# Construct your own subclasses here instead
listener = twitter_monitor.listener.JsonStreamListener()
checker = twitter_monitor.checker.FileTermChecker(filename=terms_filename)

# Start and maintain the streaming connection...
stream = twitter_monitor.DynamicTwitterStream(auth, listener, checker)
while True:
    try:
        # Loop and keep reconnecting in case something goes wrong
        # Note: You may annoy Twitter if you reconnect too often under some conditions.
        stream.start_polling(poll_interval)
    except Exception as e:
        print(e)
        time.sleep(1)  # to avoid craziness with Twitter
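The comment in the loop above warns against reconnecting too often. One common way to pace reconnects is exponential backoff, sketched below. The helper run_with_backoff and its default delays are illustrative assumptions; they are not part of this package and do not reproduce Twitter's own reconnection guidance.

```python
import time


def run_with_backoff(start, max_attempts=5, base_delay=1.0, max_delay=60.0,
                     sleep=time.sleep):
    """Call start() until it returns, doubling the wait after each failure."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return start()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # give up and let the caller see the last error
            print("attempt %d failed (%s); retrying in %.0fs"
                  % (attempt, exc, delay))
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

With the objects from the example above, this could be called as run_with_backoff(lambda: stream.start_polling(poll_interval)). As a simplification, the delay is not reset after a long period of successful streaming.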

Checking for Terms

To create a custom TermChecker, you need to override the update_tracking_terms(self) method. This method must return a set of terms. update_tracking_terms() will be called on your checker periodically to refresh the term list.

The twitter_monitor.checker.FileTermChecker class is included as an example.
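For a database-backed checker, a minimal sketch might look like the following. It assumes that TermChecker can be imported from twitter_monitor.checker and that its constructor takes no arguments; the terms table and the SqliteTermChecker name are hypothetical. A stand-in base class is defined so the sketch also runs without the package installed.

```python
import sqlite3

try:
    from twitter_monitor.checker import TermChecker  # assumed import path
except ImportError:
    # Stand-in so the sketch runs without the package installed.
    class TermChecker(object):
        pass


class SqliteTermChecker(TermChecker):
    """Reads tracked terms from a hypothetical `terms` table."""

    def __init__(self, connection):
        super(SqliteTermChecker, self).__init__()
        self.connection = connection

    def update_tracking_terms(self):
        rows = self.connection.execute("SELECT term FROM terms")
        # Return a set, skipping empty or whitespace-only values.
        return set(row[0].strip() for row in rows if row[0] and row[0].strip())


# Demo with an in-memory database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE terms (term TEXT)")
connection.executemany("INSERT INTO terms VALUES (?)",
                       [("python",), ("tweepy",), ("python",), ("  ",)])
checker = SqliteTermChecker(connection)
terms = checker.update_tracking_terms()
```

Duplicates collapse because the method returns a set, which is the return type the method is required to produce.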

If you are not using filter terms, construct your DynamicTwitterStream object with the unfiltered keyword argument set to True.

Handling Tweets

The Twitter streaming API emits various types of messages. The JsonStreamListener class includes stub methods for handling each of these. Please refer to Twitter's streaming API documentation for more information about what these messages mean.

Create a subclass of JsonStreamListener, overriding the handler methods for any message types you wish to respond to. Here is a simple Listener that just prints out tweets:

import twitter_monitor
import json

class PrintingListener(twitter_monitor.JsonStreamListener):

    def on_status(self, status):
        print(json.dumps(status, indent=3))

    def on_limit(self, track):
        print("Horrors, we lost %d tweets!" % track)

Note that the on_exception() handler is a bit different. It is called when there is some exception from within the tweepy streaming thread. By default the exception will be stored in the stream_exception field on your listener object.

More info about how listeners are used may be gleaned from the Tweepy source code.
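As a second illustration, here is a sketch of a listener that writes tweets to a file-like object, one JSON document per line. It assumes that JsonStreamListener can be constructed without arguments and that on_status receives a plain dict, as in the PrintingListener example. The JsonLinesListener name and its count attribute are hypothetical. A stand-in base class lets the sketch run without the package installed.

```python
import io
import json

try:
    from twitter_monitor.listener import JsonStreamListener
except ImportError:
    # Stand-in so the sketch runs without the package installed.
    class JsonStreamListener(object):
        pass


class JsonLinesListener(JsonStreamListener):
    """Appends each tweet to a file-like object as one line of JSON."""

    def __init__(self, output):
        super(JsonLinesListener, self).__init__()
        self.output = output
        self.count = 0

    def on_status(self, status):
        # status is treated as a plain dict here.
        self.output.write(json.dumps(status) + "\n")
        self.count += 1


# Demo with an in-memory buffer and made-up tweets.
buffer = io.StringIO()
listener = JsonLinesListener(buffer)
listener.on_status({"id": 1, "text": "hello"})
listener.on_status({"id": 2, "text": "world"})
```

In real use, output could be a file opened in append mode, and the listener would be passed to DynamicTwitterStream in place of the default one.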

Questions and Contributing

Feel free to post questions and problems on the issue tracker. Pull requests welcome!

Use python setup.py test to run tests.

Creating a release

  1. Increment the version number in setup.py. Commit and push.
  2. Create a new Release in GitHub with the appropriate version tag.
  3. Run python setup.py sdist bdist to build the distribution for PyPI.
  4. Run twine upload -u USERNAME -p PASSWORD dist/* to upload to PyPI. You must have twine installed.