tsm

Twitter Subgraph Manipulator (TSM)


Keywords
twitter, network, analysis, data, science, big
License
Other
Install
pip install tsm==8.0.3


TSM

Twitter Subgraph Manipulator by Deen Freelon

In short, TSM is a Python module that contains a few functions for analyzing Twitter and Twitter-like network data (i.e., directed, very sparse, and with community structure). I wrote it for my own research purposes but thought someone out there might find it useful.

Installation instructions

Now available on PyPI! Just use: pip3 install tsm

Alternatively, you can install TSM's dependencies manually and move tsm.py into a directory on your PYTHONPATH.

Features

Here are some of the things TSM can do:

  • Support very large communities (millions of nodes/edges); the only limit is your computer's memory
  • Extract retweets and @-mentions into edgelist format for network analysis and visualization
  • Partition networks into communities, isolate the N largest communities, and identify the most-connected users in each community
  • Measure the insularity of network communities (using EI indices) to determine the extent to which each looks like an echo chamber
  • Measure the overlap between network communities to determine which ones interact more and less often
  • Get the top retweets in a Twitter dataset and rank them by number of retweets and by community
  • Track Twitter (or other) communities over time: compute similarity scores (weighted or unweighted Jaccard coefficients) for partitioned network communities drawn from the same dataset at two different time slices
  • Identify which nodes act as intermediaries between which communities
  • Find the most-used hashtags in each community (or dataset)
  • Find the most-used hyperlinks or web domains in each community (or dataset)
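
For readers unfamiliar with the measures behind the insularity and over-time bullets above, here is a minimal, stdlib-only sketch of an E-I index and of unweighted and weighted Jaccard coefficients. It is not TSM's implementation: the function names, the input formats (a list of (source, target) edges and a node-to-community dict), the choice to count a cross-community edge as external for both communities, and the use of the generalized (Ruzicka) form for the weighted Jaccard are all assumptions made for illustration. The E-I index is (E - I) / (E + I), where E and I are a community's external and internal edge counts; it runs from -1 (every edge stays inside, echo-chamber-like) to +1 (every edge crosses out).

```python
from collections import defaultdict

def ei_indices(edges, partition):
    """E-I index per community: (E - I) / (E + I).

    edges: iterable of (source, target) pairs, e.g. retweeter -> retweeted
    partition: dict mapping node -> community label (hypothetical input format)
    """
    internal = defaultdict(int)
    external = defaultdict(int)
    for src, tgt in edges:
        c_src, c_tgt = partition.get(src), partition.get(tgt)
        if c_src is None or c_tgt is None:
            continue  # ignore edges touching nodes outside the partition
        if c_src == c_tgt:
            internal[c_src] += 1
        else:
            # assumption: a boundary-crossing edge is external for both sides
            external[c_src] += 1
            external[c_tgt] += 1
    scores = {}
    for c in set(partition.values()):
        e, i = external[c], internal[c]
        scores[c] = (e - i) / (e + i) if e + i else None  # None: no edges
    return scores

def jaccard(a, b):
    """Unweighted Jaccard: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def weighted_jaccard(wa, wb):
    """Weighted (Ruzicka) Jaccard: sum of minima over sum of maxima.

    wa, wb: dicts mapping node -> non-negative weight (hypothetical weighting)
    """
    keys = set(wa) | set(wb)
    num = sum(min(wa.get(k, 0), wb.get(k, 0)) for k in keys)
    den = sum(max(wa.get(k, 0), wb.get(k, 0)) for k in keys)
    return num / den if den else 0.0

# Toy example: two communities, one edge crossing between them
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
partition = {"a": 1, "b": 1, "c": 2, "d": 2}
scores = ei_indices(edges, partition)
print(scores[1], scores[2])  # -0.3333333333333333 0.0
print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
print(weighted_jaccard({"a": 2, "b": 1}, {"a": 1, "b": 3}))  # 0.4
```

Community 1 scores -1/3 because two of its three edges stay inside it. To compare a community across two time slices, the sets passed to `jaccard` would be its member lists at each slice, and `weighted_jaccard` could take any per-node count (for example, tweets sent); which weight TSM itself uses is not stated here.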

See tsm.py for a full description of TSM's functions and how to use them. The module should work as long as NetworkX and python-louvain are installed.
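
If the module fails to import, a quick way to see which of those two dependencies is missing is the sketch below. It relies only on `importlib.util.find_spec` and on the fact that python-louvain is imported under the module name `community`; the helper name and the printed messages are illustrative, not part of TSM.

```python
import importlib.util

# pip package name -> module name used at import time
DEPENDENCIES = {"networkx": "networkx", "python-louvain": "community"}

def missing_dependencies(deps=DEPENDENCIES):
    """Return the pip names of dependencies whose module cannot be found."""
    return [pkg for pkg, module in deps.items()
            if importlib.util.find_spec(module) is None]

missing = missing_dependencies()
if missing:
    print("Missing; install with: pip3 install " + " ".join(missing))
else:
    print("NetworkX and python-louvain are both importable.")
```

Note that `find_spec` only checks that some module named `community` exists, so an unrelated package installed under that name would also pass this check.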

Requirements

Here's what you need to use TSM:

  • NetworkX
  • python-louvain

Documentation

The TSM demo files.zip file contains two IPython notebooks and a Twitter ID file that can be used to demo many of TSM's functions. Code and instructions are provided to hydrate the Twitter ID file. For testing purposes, here is a very brief (fabricated) sample demonstrating how input data for the t2e function should be formatted in a plain text file. This sample was created by Devin Gaffney (@DGaffney):

dgaff,"Twitter is pretty fun, isn't it, @dfreelon?"
dfreelon,Yes indeed @dgaff - @some_other_user weigh in?
some_other_user,Of course twitter is grand. Mostly because of @dril.
dril,Some weird tweet no one understands but everyone favorites @some_other_user
cnnbrk,Looks like @dril just tweeted
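
As a rough illustration of what turning a sample like this into an edgelist involves, the following sketch reads `username,tweet text` rows with Python's csv module (which handles the quoted first row) and emits one author-to-mentioned-user edge per @-mention. It is not the t2e function: the function names, the lowercasing of handles, the `source,target` header, and the treatment of an "RT @user" prefix as just another mention are all assumptions; see tsm.py for t2e's actual parameters and output.

```python
import csv
import io
import re
import sys

# Handles are letters, digits and underscores; trailing "." or "?" is not captured
MENTION_RE = re.compile(r"@([A-Za-z0-9_]+)")

def tweets_to_edges(lines):
    """Yield (author, mentioned_user) pairs from 'username,tweet text' rows.

    Hypothetical helper: handles are lowercased because Twitter treats them
    case-insensitively, and an "RT @user" prefix counts like any other mention.
    """
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        author = row[0].strip().lower()
        text = ",".join(row[1:])  # re-join text split on unquoted commas
        for handle in MENTION_RE.findall(text):
            yield author, handle.lower()

def write_edgelist(edges, out):
    """Write edges as a two-column CSV edgelist (hypothetical format)."""
    writer = csv.writer(out)
    writer.writerow(["source", "target"])
    writer.writerows(edges)

SAMPLE = """\
dgaff,"Twitter is pretty fun, isn't it, @dfreelon?"
dfreelon,Yes indeed @dgaff - @some_other_user weigh in?
some_other_user,Of course twitter is grand. Mostly because of @dril.
dril,Some weird tweet no one understands but everyone favorites @some_other_user
cnnbrk,Looks like @dril just tweeted
"""

edges = list(tweets_to_edges(io.StringIO(SAMPLE)))
# To write a file instead (hypothetical filename):
# with open("edges.csv", "w", newline="") as f: write_edgelist(edges, f)
write_edgelist(edges, sys.stdout)
```

Run on the sample above, this yields six edges, including the reciprocal pair dgaff to dfreelon and dfreelon to dgaff, which is the kind of directed edgelist the partitioning and EI functions then work on.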

Acknowledgments

I gratefully acknowledge funding support from the US Institute of Peace in creating this module.