Topic Modeling Tool for Persian Short Texts
The topic modeling tool from the Data Science Innovation Center extracts topics from digitized Persian texts and compares how a variety of topic modeling techniques perform on short texts.
Visit the website to view the description in Persian.
Installation
We recommend Python 3.6 or higher and gensim 4.2 or higher.
Install from sources
Clone the latest version from the repository and install its dependencies:
git clone https://github.com/DSInCenter/topicmodel.git
cd topicmodel
pip install -r requirements.txt
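After installation, it can be useful to confirm that the environment meets the recommended versions. The snippet below is a minimal sketch, not part of the repository: the helpers parse_version and meets_minimum are hypothetical names introduced here for illustration, and the check assumes gensim exposes its version as gensim.__version__.

```python
import re
import sys


def parse_version(version):
    """Turn a dotted version string into a tuple of ints, e.g. '4.3.0rc1' -> (4, 3, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)  # keep only the leading digits of each piece
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version, minimum):
    """Return True if the version string is at least `minimum` (a tuple of ints)."""
    return parse_version(version) >= tuple(minimum)


# Recommended minimums taken from the Installation section.
python_ok = sys.version_info[:2] >= (3, 6)

try:
    import gensim
    gensim_ok = meets_minimum(gensim.__version__, (4, 2))
except ImportError:
    gensim_ok = False  # gensim is not installed in this environment

print("Python 3.6+:", python_ok)
print("gensim 4.2+:", gensim_ok)
```

If either line prints False, upgrade the corresponding component before running the demos.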
Getting Started
To get started, see the demo of the GSDMM algorithm at this link:
These examples demonstrate how to clone and execute a model on Google Colab:
LDA demonstration:
First, import the Dataset class from Dataset.py and the LDA model from LDA.py:
from tools.Dataset import Dataset
from LDA import LDA
Create objects from the Dataset and LDA classes and train the model:
lda = LDA(num_topics=11, iterations=5)
dataset = Dataset('Dataset', 'utf-8')
lda_result = lda.train_model(dataset, hyperparams=None, top_words=10)
print(lda_result)
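For readers unfamiliar with GSDMM, the algorithm mentioned at the start of this section, the following is a minimal pure-Python sketch of collapsed Gibbs sampling for the Dirichlet multinomial mixture model, which assigns each short text to exactly one cluster. It is an illustration only and not the repository's implementation: the function name gsdmm, its parameters, and the toy documents are assumptions made for this example.

```python
import random
from collections import Counter


def gsdmm(docs, num_clusters=8, alpha=0.1, beta=0.1, iterations=15, seed=0):
    """Cluster tokenized short texts; returns one cluster label per document."""
    rng = random.Random(seed)
    num_docs = len(docs)
    vocab_size = len({w for doc in docs for w in doc})

    doc_count = [0] * num_clusters                         # documents per cluster
    word_total = [0] * num_clusters                        # tokens per cluster
    word_count = [Counter() for _ in range(num_clusters)]  # per-word counts per cluster

    def move(doc, z, sign):
        """Add (sign=+1) or remove (sign=-1) a document from cluster z."""
        doc_count[z] += sign
        word_total[z] += sign * len(doc)
        for w in doc:
            word_count[z][w] += sign

    # Start from a random assignment of documents to clusters.
    labels = []
    for doc in docs:
        z = rng.randrange(num_clusters)
        labels.append(z)
        move(doc, z, +1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            move(doc, labels[d], -1)  # take the document out before resampling
            weights = []
            for z in range(num_clusters):
                # Prior term: larger clusters are more attractive.
                p = (doc_count[z] + alpha) / (num_docs - 1 + num_clusters * alpha)
                # Likelihood term: clusters sharing the document's words score higher.
                seen = Counter()
                for i, w in enumerate(doc):
                    p *= (word_count[z][w] + beta + seen[w]) / (
                        word_total[z] + vocab_size * beta + i
                    )
                    seen[w] += 1
                weights.append(p)
            labels[d] = rng.choices(range(num_clusters), weights=weights)[0]
            move(doc, labels[d], +1)
    return labels


# Toy usage with already-tokenized documents (hypothetical data).
docs = [
    ["topic", "model", "text"],
    ["topic", "model", "cluster"],
    ["football", "match", "goal"],
    ["football", "goal", "team"],
]
labels = gsdmm(docs, num_clusters=4, iterations=10)
print(labels)
```

Because every document belongs to a single cluster, this model tends to suit short texts better than LDA, which treats each document as a mixture of topics. Clusters that end up empty are simply unused, so num_clusters acts as an upper bound on the number of topics found.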
Citing & Authors
If you find this repository helpful, feel free to cite this work:
@inproceedings{
}
Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.