tech-parser

Parses articles from 34 sites and outputs them as HTML. It also works as a simple RSS reader.

You can see it in action here. And here's a template repository for deploying on Heroku.

Current list of sites
One awesome feature
Installation
- Requirements
- How to install
How to use
Configuring

One awesome feature

New in 1.4.0
Before you scroll away, I want you to know about one awesome feature that TechParser has.
I'm talking about ranking.

Every time you click the like button below an article, TechParser adds that article to the database.
The next time it parses articles, it sorts them according to the articles in that database.

Installation

Requirements

Mako
Bottle
Grab
Daemo

All these modules can be installed with pip or easy_install.

How to install

TechParser works on both Python 2.X and 3.X, although I still recommend using Python 3.X.

You can install TechParser by running
pip install TechParser
or
python setup.py install

How to use

Run python -m TechParser start to start the server.
Then open localhost:8080 in your browser.
python -m TechParser stop to stop the server.
python -m TechParser update to manually update the list of articles.
python -m TechParser run HOST:PORT to run the server without starting the daemon.
python -m TechParser lock to disallow updating articles.
python -m TechParser unlock to allow updating articles (run this if you can't update articles).
python -m TechParser locked? to check whether updating articles is allowed.
python -m TechParser train to train the classifier (useful after changing ngrams).
python -m TechParser rerank to rank articles again.
python -m TechParser -h to show help.
python -m TechParser <action> --config <path to configuration file> to set the path to the configuration file.

Run python -m TechParser --help for more info.

To make usage easier, I recommend creating an alias like this:
alias tech-parser="python -m TechParser" on *nix-based systems or
doskey tech-parser=python -m TechParser $* on Windows.
After that you will be able to run tech-parser instead of python -m TechParser.

Configuring

Don't forget to check out TechParser/parser_config.py after updating.

Changing configuration in browser

New in 1.8.3
By default you have json_config=True in ~/.tech-parser/user_parser_config.py. That allows you to edit the configuration right in your browser (click the Edit config link). Note that when you save your configuration in the browser, you update ~/.tech-parser/user_parser_config.json, not ~/.tech-parser/user_parser_config.py. To disable this, set json_config=False in ~/.tech-parser/user_parser_config.py and restart the parser.

Enabling/disabling parsers

To enable or disable site parsers, edit ~/.tech-parser/user_parser_config.py.
If you can't find the file, run python -m TechParser, then look again.
Find the line with sites_to_parse there and locate the sites whose articles you don't want to see.

For example, if you don't want to see articles from Habrahabr (it's in Russian only), find this fragment of code:

		"Habrahabr": { # habrahabr.ru
			"module": habrahabr,
			"kwargs": {},
			"enabled": True
		},

and make it look like this:

		"Habrahabr": { # habrahabr.ru
			"module": habrahabr,
			"kwargs": {},
			"enabled": False
		},

All you need to do is set enabled to False.
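As a rough illustration of what the enabled flag implies, the sketch below filters a sites_to_parse-style dictionary down to its enabled entries. The enabled_sites helper, the "Example site" entry, and the string stand-ins for the module objects are assumptions made for this example; they are not taken from TechParser's code.

```python
# Hypothetical stand-in for the sites_to_parse dictionary in the configuration.
# In the real file "module" refers to an imported parser module; plain strings
# are used here only to keep the sketch self-contained.
sites_to_parse = {
    "Habrahabr": {"module": "habrahabr", "kwargs": {}, "enabled": False},
    "Example site": {"module": "example", "kwargs": {}, "enabled": True},
}


def enabled_sites(sites):
    """Return the names of sites whose "enabled" flag is true, sorted by name."""
    return sorted(name for name, options in sites.items() if options.get("enabled"))


print(enabled_sites(sites_to_parse))
```

Running a check like this after editing the file is a quick way to confirm that a site was actually switched off.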

Setting password

New in 1.8.2

You can set a password inside your configuration in either of two ways:

password = 'your password'

or

password = os.environ.get('TechParser_PASSWORD', '')

In the latter case you need to set the environment variable TechParser_PASSWORD to your password.
After that, when you open TechParser in your browser, it will ask you to enter the password.
The session expires after a year.
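The environment-variable form only works if os is imported in the configuration file. A minimal sketch of that lookup follows; the read_password helper is a hypothetical name introduced here so the behaviour can be checked without touching the real environment.

```python
import os


def read_password(environ=os.environ):
    """Return TechParser_PASSWORD from the given mapping, or '' when it is unset."""
    return environ.get('TechParser_PASSWORD', '')


# In the configuration file this would simply be the assignment below.
password = read_password()
```

Keeping the password in the environment means it does not have to be stored in the configuration file itself.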

Adding RSS feeds

New in 1.7.0
Find the following line in your configuration:

rss_feeds = {}

Each RSS feed entry should contain its name, URL, short name (without spaces and the like), icon URL, and title color. Example feeds:

rss_feeds = {'CSS-tricks': {
		'short-name': 'css-tricks',
		'url': 'http://feeds.feedburner.com/CssTricks?format=xml',
		'icon': 'http://css-tricks.com/favicon.ico',
		'color': '#DA8817'
	},
	
	'The Next Web':	{
		'url': 'http://feeds2.feedburner.com/thenextweb',
		'short-name': 'nextweb',
		'icon': 'http://thenextweb.com/favicon.ico',
		'color': '#F15A2F'
	}
}
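Since every feed needs the same four keys, a small check can catch a forgotten one before the server starts. The sketch below is only an illustration: missing_feed_keys and the "Broken feed" entry are hypothetical, and TechParser may handle incomplete entries differently.

```python
# Keys used by the example feeds above.
REQUIRED_KEYS = {'short-name', 'url', 'icon', 'color'}


def missing_feed_keys(feeds):
    """Map each feed name to the sorted list of required keys it lacks."""
    problems = {}
    for name, feed in feeds.items():
        missing = sorted(REQUIRED_KEYS - set(feed))
        if missing:
            problems[name] = missing
    return problems


rss_feeds = {
    'CSS-tricks': {
        'short-name': 'css-tricks',
        'url': 'http://feeds.feedburner.com/CssTricks?format=xml',
        'icon': 'http://css-tricks.com/favicon.ico',
        'color': '#DA8817',
    },
    # Deliberately incomplete entry, made up for this example.
    'Broken feed': {'url': 'http://example.com/feed'},
}

print(missing_feed_keys(rss_feeds))
```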

Asynchronous parsing

New in 1.7.0
You can set the number of threads available for parsing.
To do that, set num_threads in your configuration.
Example:

num_threads = 4
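To give an idea of what a thread count like this controls, here is a minimal sketch that parses several sites through a pool of num_threads workers. It does not show TechParser's own implementation: parse_site and parse_all are hypothetical placeholders, and the example uses concurrent.futures, which is part of the standard library on Python 3 only.

```python
from concurrent.futures import ThreadPoolExecutor

num_threads = 4


def parse_site(name):
    """Placeholder parser: returns the site name and two made-up article titles."""
    return name, ['{0}: article {1}'.format(name, i) for i in range(2)]


def parse_all(site_names, workers):
    """Parse every site using at most `workers` threads at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as site_names.
        return dict(pool.map(parse_site, site_names))


articles = parse_all(['a', 'b', 'c'], num_threads)
print(sorted(articles))
```

Because fetching pages is mostly waiting on the network, more threads usually help up to a point, after which the remote sites become the bottleneck.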

Word lists

New in 1.7.5
Articles can also be sorted by words you find interesting or boring. To do that, set the variables interesting_words and boring_words. Example:

interesting_words = {'word1', 'word2', 'word3'}
boring_words = {'word4', 'word5', 'word6'}

You can also set priority for each word:

interesting_words = [['python', 5.0], ['fortran', 3.0], 'css', 'html', ['google', 1.5]]
boring_words = [['pascal', 10.0], 'delphi']

The default priority for each word is 1.
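How TechParser combines these priorities internally is not described here, so the following is only one plausible reading: each interesting word found in a title adds its priority to the score and each boring word subtracts its priority. The normalize and score functions are hypothetical names for this sketch, which targets Python 3.

```python
def normalize(words):
    """Turn a mix of 'word' and ['word', priority] entries into a dict of weights."""
    weights = {}
    for entry in words:
        if isinstance(entry, str):
            weights[entry] = 1.0  # default priority
        else:
            word, priority = entry
            weights[word] = float(priority)
    return weights


def score(title, interesting, boring):
    """Assumed scoring rule: add interesting priorities, subtract boring ones."""
    good = normalize(interesting)
    bad = normalize(boring)
    total = 0.0
    for token in title.lower().split():
        total += good.get(token, 0.0) - bad.get(token, 0.0)
    return total


interesting_words = [['python', 5.0], ['fortran', 3.0], 'css', 'html', ['google', 1.5]]
boring_words = [['pascal', 10.0], 'delphi']

print(score('Python and CSS tricks', interesting_words, boring_words))
print(score('Pascal vs Delphi', interesting_words, boring_words))
```

Under this assumed rule the first title scores 6.0 and the second scores -11.0, so the first would be ranked higher.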

Update interval

Find a line like this in user_parser_config.py:

update_interval = 1800

and set update_interval to the number of seconds you want.

For example, if update_interval is set to 3600, TechParser updates the data every hour.
Note that this hour is not counted from server start.
Instead, TechParser updates articles whenever epoch time is divisible by 3600. With this interval, articles are updated at:
00:00
01:00
02:00
...
13:00
14:00
...and so on.
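The schedule above follows from simple arithmetic on epoch time. The sketch below computes the next epoch second that is divisible by update_interval; next_update is a hypothetical helper written for this example, not a function from TechParser.

```python
import time


def next_update(update_interval, now=None):
    """Return the first epoch second after `now` that is divisible by update_interval."""
    if now is None:
        now = time.time()
    return (int(now) // update_interval + 1) * update_interval


# 5400 seconds after the epoch is 01:30 UTC; the next whole hour is 7200 (02:00 UTC).
print(next_update(3600, now=5400))
```

This also shows why restarting the server does not shift the schedule: the result depends only on the clock and the interval.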

Custom host and port

In ~/.tech-parser/user_parser_config.py, find the two variables host and port and set them to the host and port you want.
Example:

host="0.0.0.0"
port="8081"

TechParser
Release 1.9.0
