reveal-user-classification

Performs user classification into labels using a set of seed Twitter users with known labels andthe structure of the interaction network between them.


Keywords
online-social-network, user-classification, Reveal-FP7
License
Apache-2.0
Install
pip install reveal-user-classification==0.2.8

Documentation

reveal-user-classification

Performs user classification into labels using a set of seed Twitter users with known labels and the structure of the interaction network between them.

Features

Install

Required packages

Installation

To install for all users on Unix/Linux:

python3.4 setup.py build
sudo python3.4 setup.py install

Reveal-FP7 Integration

The name of the entry point script is user_network_profile_classifier.

user_network_profile_classifier -uri $MONGO_DB_URI -id $MONGO_ASSESSMENT_ID
-tak $TWITTER_APP_KEY -tas $TWITTER_APP_SECRET
-rmquri $AMQP_URI -rmqq $AMQP_QUEUE_NAME -rmqe $AMQP_EXCHANGE -rmqrk $AMQP_ROUTING_KEY
-ln $LATEST_N -lts $LOWER_TIMESTAMP -uts $UPPER_TIMESTAMP
-nt $NUMBER_OF_PARALLEL_TASKS -nua $NUMBER_OF_USERS_TO_ANNOTATE
-unpcdb $USER_NETWORK_PROFILE_CLASSIFIER_MONGO_DB

The following two arguments are for establishing a connection to a Mongo database and accessing the documents in a collection.

  • $MONGO_DB_URI example: "mongodb://admin:123456@127.0.0.1:27017"

  • $MONGO_ASSESSMENT_ID example: "new_tweets_database_name.new_tweets_collection_name", separated by a "." as shown.

The following two arguments are for using a Twitter app in order to fetch data from Twitter.

  • $TWITTER_APP_KEY and $TWITTER_APP_SECRET: Both are taken from one's created app in the Twitter development site.

The following four arguments are for publishing messages to a RabbitMQ queue. The queue is used both for publishing a "SUCCESS" message at completion, but also for publishing the results of the module.

  • $AMQP_URI example: amqp://guest:guest@localhost:5672//
  • One must also supply: $AMQP_QUEUE_NAME, $AMQP_EXCHANGE and $AMQP_ROUTING_KEY

There are some optional arguments that can be considered. The following three can be used either together or apart; otherwise all of the tweets in the collection will be read.

  • $LATEST_N: The N latest chronologically documents will be read from the defined collection. In order for this to work properly, the "created_at" field of the tweets must be in the proper time format as defined by MongoDB.
  • $LOWER_TIMESTAMP: A UNIX timestamp; based on the created_at tweet field. Only tweets after this timestamp will be used for the analysis.
  • $UPPER_TIMESTAMP: Similarly, for an upper limit.

The following four arguments set various parameters for the execution of the module.

  • $NUMBER_OF_PARALLEL_TASKS: Number of parallel tasks initiated for each assessment analysis launch. If not specified, tries to set as number of cores.
  • $NUMBER_OF_USERS_TO_ANNOTATE: Number of users to annotate automatically, using Twitter data. Each user requires approximately at least an additional minute. Default value is 90. For faster testing, try a smaller number.

Some intermediate data and the resulting user-to-topic association will be written in a Mongo database on the same Mongo client used for the input.

  • $USER_NETWORK_PROFILE_CLASSIFIER_MONGO_DB: A distinctive name should be chosen so as not to interfere with the databases reserved for input data. The collection in which the results are written is: "user_topics_collection".

The entry point script can be viewed on /reveal_user_classification/entry_points/user_network_profile_classifier.py where the argument usage can be read in greater detail.