GotchaTwitter

A Python Twitter crawler.

Supports crawling the timeline of a target user, optionally restricted to a date range.
Support for crawling threads is under development.
Warning: using the Twitter API to get user information by uid/screen_name is much faster and safer than web scraping.
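The date-range option can be pictured as a filter over a timeline that is read newest first. The sketch below is only an illustration under assumptions: the `(tweet_id, created_at)` record shape and the `take_date_range` helper are hypothetical and not part of GotchaTwitter. It keeps tweets with `since <= created_at < until` and stops reading at the first tweet older than `since`, because everything after it in a newest-first timeline is older still.

```python
from datetime import datetime
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical record shape: (tweet_id, created_at).
Tweet = Tuple[str, datetime]


def take_date_range(timeline: Iterable[Tweet],
                    since: Optional[datetime] = None,
                    until: Optional[datetime] = None) -> Iterator[Tweet]:
    """Yield tweets from a newest-first timeline with since <= created_at < until.

    A None bound is treated as open. Assumes strict newest-first ordering.
    """
    for tweet_id, created_at in timeline:
        if until is not None and created_at >= until:
            continue  # too new: keep scrolling back in time
        if since is not None and created_at < since:
            break  # older than the range; the rest of the timeline is older still
        yield tweet_id, created_at
```

Stopping early matters for a scraper because each additional page of the timeline costs a request. A pinned tweet at the top of a profile would break the newest-first assumption, so a real crawler would need to skip it before applying this logic.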

Dependencies

bs4: BeautifulSoup, for parsing HTML
lxml: HTML parser used by BeautifulSoup (requires a special installation procedure on Amazon EC2; see below)
tqdm: progress bar in the terminal
requestsplus: a modified requests package that adds a maximum number of retries and a sleep interval between requests
pushbullet.py (optional): sends a notification when crawling finishes
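The retry-and-sleep behaviour described for requestsplus can be sketched with the standard library alone. This is not requestsplus itself, whose API is not documented here; the `PoliteFetcher` class, its parameter names, and the injectable `fetch` callable are hypothetical. It waits `sleep_time` seconds before every request except the first and retries a failed request up to `max_retries` times.

```python
import time
import urllib.request
from typing import Callable, Optional


class PoliteFetcher:
    """Hypothetical stand-in for requestsplus: retries failed requests and
    sleeps sleep_time seconds before every request except the very first."""

    def __init__(self, max_retries: int = 3, sleep_time: float = 1.0,
                 fetch: Optional[Callable[[str], bytes]] = None):
        if max_retries < 0:
            raise ValueError('max_retries must be >= 0')
        self.max_retries = max_retries
        self.sleep_time = sleep_time
        # Default transport: a plain urllib GET returning the response body.
        self.fetch = fetch or self._urllib_get
        self._has_requested = False

    @staticmethod
    def _urllib_get(url: str) -> bytes:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read()

    def get(self, url: str) -> bytes:
        last_error: Optional[OSError] = None
        # One initial attempt plus up to max_retries retries.
        for _ in range(self.max_retries + 1):
            if self._has_requested:
                time.sleep(self.sleep_time)  # throttle consecutive requests
            self._has_requested = True
            try:
                return self.fetch(url)
            except OSError as exc:  # URLError, HTTPError, timeouts, connection errors
                last_error = exc
        assert last_error is not None
        raise last_error
```

Injecting `fetch` keeps the retry policy separate from the transport, so the policy can be exercised with a fake fetcher and no network access.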

Example

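from gotchatwitter import GotchaTwitter  # import path assumed from the package name

screen_names = ['<screen_name_1>', '<screen_name_2>']  # target users
access_token = '<pushbullet_token>'  # only needed for the PushBullet notifier
output_fp = '<filepath you want to save>'  # where crawled results are written

with GotchaTwitter('timeline', screen_names, output_fp) as gt:
    # set_output() and set_notifier() return the crawler, so the calls can be chained
    gt = (gt.set_output(save_mode='w', has_header=True)
            .set_notifier('pushbullet', access_token=access_token))
    gt.crawl()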
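The example relies on two patterns: a context manager (`with ... as gt`) and chainable setters that return the object itself. The skeleton below is a hypothetical illustration of that call shape only; `TimelineCrawlerSketch`, its attributes, the CSV header columns, and the placeholder rows are assumptions, not GotchaTwitter's actual implementation, and no Twitter data is fetched.

```python
import csv
from typing import Any, Dict, List, Optional


class TimelineCrawlerSketch:
    """Hypothetical skeleton mirroring GotchaTwitter's call shape; not the real class."""

    def __init__(self, mode: str, targets: List[str], output_fp: str):
        self.mode = mode
        self.targets = targets
        self.output_fp = output_fp
        self.save_mode = 'w'
        self.has_header = True
        self.notifier: Optional[str] = None
        self.notifier_options: Dict[str, Any] = {}
        self._file = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Close the output file even if crawling raised an exception.
        if self._file is not None:
            self._file.close()
        return False  # do not swallow exceptions

    def set_output(self, save_mode: str = 'w', has_header: bool = True):
        self.save_mode, self.has_header = save_mode, has_header
        return self  # returning self is what makes chaining work

    def set_notifier(self, name: str, **options: Any):
        self.notifier, self.notifier_options = name, options
        return self

    def crawl(self) -> None:
        self._file = open(self.output_fp, self.save_mode, newline='')
        writer = csv.writer(self._file)
        if self.has_header:
            writer.writerow(['screen_name', 'tweet_id'])  # hypothetical columns
        for name in self.targets:
            # Placeholder row: a real crawler would write one row per tweet.
            writer.writerow([name, ''])
```

Because each setter returns `self`, reassigning `gt` inside the `with` block still refers to the same object that `__exit__` cleans up.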

Notifier Setting

Pushbullet

Register a Pushbullet account and create an access token in your account settings.
Download and install the Pushbullet app on your device (tested on iOS).
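To see what the access token is used for, here is a minimal sketch that sends a "note" push through Pushbullet's HTTP API (`POST https://api.pushbullet.com/v2/pushes` with an `Access-Token` header) using only the standard library. It is an alternative to the pushbullet.py package, not necessarily how GotchaTwitter's notifier works; the function names and message text are hypothetical.

```python
import json
import urllib.request

PUSHES_URL = 'https://api.pushbullet.com/v2/pushes'


def build_note_request(access_token: str, title: str, body: str) -> urllib.request.Request:
    """Build (but do not send) a Pushbullet 'note' push request."""
    payload = json.dumps({'type': 'note', 'title': title, 'body': body}).encode('utf-8')
    return urllib.request.Request(
        PUSHES_URL,
        data=payload,
        headers={'Access-Token': access_token, 'Content-Type': 'application/json'},
        method='POST',
    )


def notify_done(access_token: str, n_users: int) -> int:
    """Hypothetical helper: send a 'crawl finished' note and return the HTTP status."""
    req = build_note_request(access_token, 'Crawl finished',
                             f'Finished crawling {n_users} user(s).')
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Separating request construction from sending lets the payload be checked without a network call or a real token.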

Install lxml on Amazon Linux AMI (2016.03.3)

sudo yum install libxml2-devel libxslt-devel python-devel gcc
sudo pip install --upgrade setuptools
sudo /usr/local/bin/easy_install lxml
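After installing, one way to confirm that BeautifulSoup can actually use lxml is to parse a tiny document with the `'lxml'` parser. This check script is a suggestion, not part of GotchaTwitter; the function name and status strings are hypothetical. BeautifulSoup raises `FeatureNotFound` when the requested parser is unavailable.

```python
def lxml_parser_status() -> str:
    """Report whether BeautifulSoup can use the lxml parser in this environment."""
    try:
        from bs4 import BeautifulSoup, FeatureNotFound
    except ImportError:
        return 'bs4 not installed'
    try:
        # Raises FeatureNotFound if no installed parser provides the 'lxml' feature.
        soup = BeautifulSoup('<p>hello</p>', 'lxml')
    except FeatureNotFound:
        return 'lxml not available to BeautifulSoup'
    return 'ok: parsed ' + repr(soup.p.get_text())


if __name__ == '__main__':
    print(lxml_parser_status())
```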

gotchatwitter
Release 0.1.13