birdspotter

A package to measure the influence and botness of Twitter users from Twitter dumps


Keywords
twitter, influence, botness, birdspotter, dumps, bot-detection, influence-score
License
MIT
Install
pip install birdspotter==0.1.1a1

birdspotter: A tool to measure social attributes of Twitter users

birdspotter is a Python package providing a toolkit to measure the social influence and botness of Twitter users. It takes a Twitter dump in json or jsonl format as input and produces measures for:

  • Social Influence: The relative amount that one user can cause another user to adopt a behaviour, such as retweeting.
  • Botness: The amount that a user appears automated.
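
As a quick sanity check before running the package, the sketch below counts the records in a jsonl dump using only the standard library. It assumes the usual JSON Lines layout (one JSON object per non-empty line); the helper name count_jsonl_records is hypothetical and is not part of birdspotter.

```python
import json


def count_jsonl_records(lines):
    """Count JSON records in an iterable of JSON Lines text (hypothetical helper)."""
    count = 0
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines between records
        json.loads(line)  # raises ValueError if a line is not valid JSON
        count += 1
    return count


# Usage with an in-memory stand-in for an open dump file.
sample_lines = ['{"id": 1}\n', '\n', '{"id": 2}\n']
print(count_jsonl_records(sample_lines))
```

For a real dump, an open file handle can be passed in place of sample_lines, since iterating over a text file yields its lines.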

References:

Rizoiu, M.A., Graham, T., Zhang, R., Zhang, Y., Ackland, R. and Xie, L. #DebateNight: The Role and Influence of Socialbots on Twitter During the 1st 2016 US Presidential Debate. In Twelfth International AAAI Conference on Web and Social Media (ICWSM'18), 2018. https://arxiv.org/abs/1802.09808
Ram, R., & Rizoiu, M.-A. A social science-grounded approach for quantifying online social influence. In Australian Social Network Analysis Conference (ASNAC'19) (p. 2). Adelaide, Australia, 2019.

Installation

pip3 install birdspotter

birdspotter requires Python 3 or later.

Usage

To use birdspotter on your own Twitter dump, replace './example.jsonl' with the path to your dump, for example './path/to/tweet/dump.json'. The example below uses a bespoke dataset that is available for download from this repository.

from birdspotter import BirdSpotter
bs = BirdSpotter('./example.jsonl')
# This may take a few minutes, go grab a coffee...
labeledUsers = bs.getLabeledUsers(out='./output.csv')

After extracting the tweets, getLabeledUsers() returns a pandas dataframe with the influence and botness labels of users. If a path is specified (here ./output.csv), it also writes the dataframe to a csv file.
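
The written csv file can be post-processed without birdspotter installed. The sketch below ranks users who fall under a botness threshold by influence, using only the standard library. The column names user_id, botness and influence, the sample values, and the helper top_influencers are all hypothetical; check the header of your own output file before reusing it.

```python
import csv
import io

# Stand-in for the contents of ./output.csv. The column names and values
# are hypothetical and chosen only for illustration.
SAMPLE_CSV = """user_id,botness,influence
u1,0.10,12.5
u2,0.85,3.0
u3,0.40,7.25
"""


def top_influencers(csv_text, max_botness=0.5, limit=10):
    """Return user ids with botness <= max_botness, by descending influence."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    kept = [r for r in rows if float(r["botness"]) <= max_botness]
    kept.sort(key=lambda r: float(r["influence"]), reverse=True)
    return [r["user_id"] for r in kept[:limit]]


print(top_influencers(SAMPLE_CSV))  # prints ['u1', 'u3']
```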

birdspotter relies on the fastText word embeddings wiki-news-300d-1M.vec, which are downloaded automatically if they are not available in the current directory (./) or a relative data folder (./data/).
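
To see in advance whether a download will be triggered, the two locations named above can be checked by hand. This is a minimal sketch of that lookup, not birdspotter's own code; the helper find_embeddings and its parameters are hypothetical.

```python
from pathlib import Path

EMBEDDINGS_FILE = "wiki-news-300d-1M.vec"


def find_embeddings(filename=EMBEDDINGS_FILE, base="."):
    """Return the first existing path among <base>/<filename> and <base>/data/<filename>."""
    base = Path(base)
    for candidate in (base / filename, base / "data" / filename):
        if candidate.is_file():
            return candidate
    return None  # not found in either location


location = find_embeddings()
print(location if location is not None else "embeddings not found locally")
```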

Get Cascades Data

After extracting the tweets, the retweet cascades are accessible by using:

cascades = bs.getCascadesDataFrame()

This dataframe includes the expected structure of each retweet cascade, as given by Rizoiu et al. (2018), in the column expected_parent.
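
One possible use of expected_parent is to count how many retweets are attributed to each parent. The sketch below does this over plain dictionaries rather than a dataframe. Only the column name expected_parent comes from the text above; the user_id field, the sample rows, the assumption that expected_parent holds an identifier of the parent, and the helper count_expected_children are all hypothetical.

```python
from collections import Counter

# Hypothetical rows standing in for the cascades dataframe.
rows = [
    {"user_id": "a", "expected_parent": None},  # original tweet, no parent
    {"user_id": "b", "expected_parent": "a"},
    {"user_id": "c", "expected_parent": "a"},
    {"user_id": "d", "expected_parent": "b"},
]


def count_expected_children(rows):
    """Count how many rows name each value as their expected parent."""
    return Counter(
        r["expected_parent"] for r in rows if r["expected_parent"] is not None
    )


print(count_expected_children(rows).most_common(1))  # prints [('a', 2)]
```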

Advanced Usage

Adding more influence metrics

birdspotter computes DebateNight influence by default when getLabeledUsers is run. To generate spatial-decay influence, run:

bs.getInfluenceScores(time_decay = -0.000068, alpha = 0.15, beta = 1.0)

This returns the updated featureDataframe with the influence scores appended under the column influence (<alpha>,<time_decay>,<beta>).

Training with your own botness data

birdspotter provides functionality for training the botness detector with your own training data. To generate a csv file to be annotated, run:

bs.getBotAnnotationTemplate('./annotation_file.csv')

Once the file is annotated, the botness detector can be trained with:

bs.trainClassifierModel('./annotation_file.csv')

Defining your own word embeddings

birdspotter provides functionality for defining your own word embeddings. For example:

# customEmbedding is assumed to be defined already: a mapping, such as a dict(), representing word embeddings
bs = BirdSpotter('./example.jsonl', embeddings=customEmbedding)

Embeddings can be set through several methods; refer to setWord2VecEmbeddings.

Note that the default bot training data uses wiki-news-300d-1M.vec, so the bot detector needs to be retrained if alternative word embeddings are used.
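
One way to build such a mapping is to parse a file in the fastText .vec text format, where the first line holds the vocabulary size and the dimension, and each following line holds a word followed by its vector components. The sketch below is a minimal parser under that assumption. The helper parse_vec is hypothetical, and whether BirdSpotter expects plain lists or array values in the mapping is not stated here, so treat the value type as an assumption.

```python
def parse_vec(lines):
    """Parse fastText .vec text lines into a dict mapping word -> list of floats."""
    lines = iter(lines)
    # The header line holds "<vocabulary size> <dimension>".
    _, dim = (int(x) for x in next(lines).split())
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings


# Tiny in-memory stand-in for a .vec file with two 3-dimensional vectors.
sample = ["2 3", "hello 0.1 0.2 0.3", "world -0.4 0.5 0.6"]
customEmbedding = parse_vec(sample)
print(sorted(customEmbedding))  # prints ['hello', 'world']
```

For a real file, an open text handle can be passed in place of sample. Full embedding files are large, so loading one this way can take a noticeable amount of memory.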

Alternatives to Python

Command-line usage

birdspotter can be accessed through the command-line to return a csv, with the recipe below:

birdspotter ./path/to/twitter/dump.json ./path/to/output/directory/

R usage

birdspotter functionality can be accessed in R via the reticulate package. reticulate still requires a Python installation on your system with birdspotter installed. The following produces the same results as the standard usage.

install.packages("reticulate")
library(reticulate)
use_python(Sys.which("python3"))
birdspotter <- import("birdspotter")
bs <- birdspotter$BirdSpotter("./example.jsonl")
bs$getLabeledUsers(out = './output.csv')

Acknowledgements

The development of this package was partially supported through a UTS Data Science Institute seed grant.