topic-cohesion

A cohesion measurement for evaluating partitions


License
MIT
Install
pip install topic-cohesion==0.1.1

Documentation

Topic Cohesion

The topic-detection field deals mainly with naming given divisions of documents. It lacks a quality measurement that rates a division in a way that reflects a human-subjective score.

Given a division, topic_cohesion calculates the human-subjective score and the topic name associated with each label in the division.

A proof of concept (POC) for this approach can be found in the colab-notebook, or in the "Topic Cohesion Project- Full report".

A usage example can also be found in the colab-notebook-usage-example.

Installation

pip install topic-cohesion

Usage Example

The input to the topic cohesion process must be a csv, txt, or tsv file with a tab ('\t') separator and must have 'label' and 'text' columns. The 'text' column holds the strings that make up the corpus sentences, while the 'label' column holds the integers that define the corpus division. In the following example, sentences 1-3 belong to group 1 and sentences 4 and 5 belong to group 2.

import pandas as pd
from cohesion import topic_cohesion

data = {'text':
            ["we like to play football",
             "I'm playing football better than neymar and cristano ronaldo",
             "I like Fifa more than I like football, My Fav team is #RealMadrid Hala Madrid",
             "Hamburger or Pizza? what would i choose? I will eat both of them, it so tasty!",
             "banana pancakes with syrup maple, thats my favorite meal"],
        'label':
            [1, 1, 1, 2, 2]}
df = pd.DataFrame(data)
score, topic_names = topic_cohesion.run_df(df)
print("Cohesion Final score is: ", score)
print("Cohesion Topics are: ", topic_names)

Expected output

Cohesion Final score is: 0.99
Cohesion Topics are: ['like football play ronaldo playing', 'tasty pizza hamburger eat choose']
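
Preparing an input file

The tab-separated format described under Usage Example can be written and checked with the Python standard library alone. The following is a minimal sketch, not part of the topic_cohesion package: the helper names write_tsv and read_tsv are hypothetical, and the only details taken from this README are the tab separator and the required 'text' and 'label' columns. An in-memory buffer stands in for a real file.

```python
import csv
import io

# Column names required by the README; everything else here is illustrative.
REQUIRED_COLUMNS = {"text", "label"}


def write_tsv(rows, handle):
    """Write rows (dicts with 'text' and 'label') as tab-separated text."""
    writer = csv.DictWriter(handle, fieldnames=["text", "label"], delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)


def read_tsv(handle):
    """Read tab-separated text and check that the required columns exist."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    # Labels are integers in the README example, so convert them back.
    return [{"text": row["text"], "label": int(row["label"])} for row in reader]


rows = [
    {"text": "we like to play football", "label": 1},
    {"text": "banana pancakes with maple syrup", "label": 2},
]

buffer = io.StringIO(newline="")  # stands in for a file opened with newline=""
write_tsv(rows, buffer)
buffer.seek(0)
loaded = read_tsv(buffer)
print(loaded)
```

To work with a file on disk, the same helpers can be passed a handle from open(path, "w", newline="") or open(path, newline=""). The rows read back this way can be turned into a DataFrame with pd.DataFrame(loaded) and passed to topic_cohesion.run_df as in the usage example.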