SCluster

an implementation of spectral clustering for documents

Homepage: http://github.com/whym/scluster

Contact: http://whym.org

Spectral clustering is a modern clustering technique considered effective for image clustering, among other applications. [1] [2]

This software finds clusters among documents based on the bag-of-words representation [3] and TF-IDF weighting [4].
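As a rough illustration of the bag-of-words representation with TF-IDF weighting, here is a minimal sketch using only the Python standard library. The function names `tfidf_vectors` and `cosine`, the whitespace tokenization, and the `tf * log(N / df)` weighting are assumptions made for this example; SCluster's own tokenizer and weighting variant may differ.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    Assumes whitespace tokenization and the weighting tf * log(N / df);
    this is an illustrative choice, not necessarily what SCluster does.
    """
    # Bag of words: per-document term counts, word order discarded.
    bags = [Counter(doc.lower().split()) for doc in docs]
    n_docs = len(bags)
    # Document frequency: number of documents containing each term.
    df = Counter(term for bag in bags for term in bag)
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in bag.items()}
        for bag in bags
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "palm oil prices rise",
    "palm oil exports fall",
    "ship sinks near port",
]
vectors = tfidf_vectors(docs)
# The two palm-oil documents share terms; the shipping one shares none.
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Pairwise similarities of this kind are a common input to spectral clustering, which operates on a similarity graph over the documents [1].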

[1]	Ulrike von Luxburg, A Tutorial on Spectral Clustering, 2006. http://arxiv.org/abs/0711.0189

[2]	Chris H. Q. Ding, Spectral Clustering, 2004. http://ranger.uta.edu/~chqding/Spectral/

[3]	http://en.wikipedia.org/wiki/Bag_of_words_model

[4]	http://en.wikipedia.org/wiki/Tf%E2%80%93idf

The following software is required.

Clone this repository.
Prepare documents as raw-text files, and put them in a directory, for example, 'reuters'.
Prepare a category file. For example, 'cats.txt' may contain:
```
14833 palm-oil veg-oil
14839 ship
```
This means that the file '14833' has 'palm-oil' and 'veg-oil' as its categories, and '14839' has 'ship' as its category.
Run: python -m scluster.clusterer cats.txt reuters/ -m kmeans
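
To check a category file before running the clusterer, the format described above can be read with a few lines of Python. This is a sketch only: `parse_categories` is a hypothetical helper, not part of SCluster, and it assumes whitespace-separated fields with the file name first.

```python
def parse_categories(lines):
    """Map each file name to its list of categories.

    Assumes whitespace-separated fields with the file name first, as in
    the 'cats.txt' example; this is not SCluster's own parser.
    """
    categories = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name, *cats = fields
        categories[name] = cats
    return categories


sample = """\
14833 palm-oil veg-oil
14839 ship
"""
cats = parse_categories(sample.splitlines())
print(cats)
```

For a real file, pass the open file object instead of the sample, for example `parse_categories(open('cats.txt', encoding='utf-8'))`.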

When you use the Reuters set [5], note that No. 17980 might contain a non-Unicode character at line 10. It should probably read: "world economic growth-side measures ..."
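
One way to locate such characters is to decode a file line by line and report the lines that fail. The sketch below assumes the documents are expected to be UTF-8; `find_bad_lines` is a hypothetical helper, and the sample bytes are made up for illustration rather than taken from the Reuters file.

```python
def find_bad_lines(data, encoding="utf-8"):
    """Return (line_number, raw_line) pairs for lines that fail to decode."""
    bad = []
    for number, raw in enumerate(data.split(b"\n"), start=1):
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            bad.append((number, raw))
    return bad


# Made-up content with one invalid byte (0xff) on line 2.
sample = b"plain ascii line\nbad byte \xff here\nanother line\n"
bad_lines = find_bad_lines(sample)
print(bad_lines)
```

For a file on disk, read it in binary mode first, for example `find_bad_lines(open(path, 'rb').read())`, where `path` points at the document to check.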

[5]	http://www.daviddlewis.com/resources/testcollections/reuters21578/