qs-kpa

Quantitative Summarization – Key Point Analysis


Keywords
contrastive-learning, emnlp, matching, paraphrase-identification, text-similarity
License
Apache-2.0
Install
pip install qs-kpa==0.0.1

Documentation

Keypoint Analysis

This library is based on the Transformers library by HuggingFace. Keypoint Analysis embeds each statement together with the topic it addresses and its stance toward that topic.

What's New

July 1, 2021

Installation

Install with pip (stable version)

pip install qs-kpa

Install from sources (latest version)

git clone https://github.com/VietHoang1512/KPA
cd KPA
pip install -e .

Quick example

# Import needed libraries
from qs_kpa import KeyPointAnalysis

# Create a KeyPointAnalysis model
encoder = KeyPointAnalysis()

# Model configuration
print(encoder)

# Prepare data: a single (topic, statement, stance) tuple or a list of such tuples
inputs = [
    (
        "Assisted suicide should be a criminal offence",
        "a cure or treatment may be discovered shortly after having ended someone's life unnecessarily.",
        1,
    ),
    (
        "Assisted suicide should be a criminal offence",
        "Assisted suicide should not be allowed because many times people can still get better",
        1,
    ),
    ("Assisted suicide should be a criminal offence", "Assisted suicide is akin to killing someone", 1),
]

# Embed all inputs
output = encoder.encode(inputs, convert_to_numpy=True)

Detailed training

Each training example is a pair of a key point and an argument (along with their topic and stance) plus a matching score. Pairs with label 1 are pulled together in the embedding space; all other pairs are pushed apart.
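
As a rough illustration of the pull/push objective described above, the sketch below computes a standard contrastive loss for a single embedding pair. The helper names, the `margin` value, and the choice of Euclidean distance are assumptions made for this example; they are not taken from the library's code.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, label, margin=1.0):
    """Standard contrastive loss for one pair (hypothetical helper).

    label == 1: matching pair, any distance between the two is penalized.
    label == 0: non-matching pair, penalized only if closer than `margin`.
    """
    d = euclidean(u, v)
    if label == 1:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


# Toy 2-D "embeddings" standing in for encoder outputs.
key_point = [0.0, 0.0]
argument = [0.6, 0.8]  # roughly distance 1.0 from key_point

matching_loss = contrastive_loss(key_point, argument, label=1)
non_matching_loss = contrastive_loss(key_point, argument, label=0)
```

With these toy vectors the matching pair is penalized for being far apart, while the non-matching pair already sits at the margin and contributes almost nothing.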

Model

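| Model | BERT/ConvBERT | DistilBERT | ALBERT | XLNet | RoBERTa | ELECTRA | BART |
|---|---|---|---|---|---|---|---|
| Siamese Baseline | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| Siamese Question Answering-like | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| Custom loss Baseline | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |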

Loss

  • Contrastive
  • Online Contrastive
  • Triplet
  • Online Triplet (Hard negative/positive mining)
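
To make the triplet variants in the list above more concrete, here is a minimal sketch of a triplet loss combined with online hard-negative mining (choosing the negative closest to the anchor). The function names, the `margin` value, and the use of Euclidean distance are assumptions for illustration, not the library's implementation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


def hardest_negative(anchor, negatives):
    """Hard-negative mining: pick the negative closest to the anchor."""
    return min(negatives, key=lambda n: euclidean(anchor, n))


# Toy 2-D "embeddings" standing in for encoder outputs.
anchor = [0.0, 0.0]
positive = [1.0, 0.0]
negatives = [[3.0, 0.0], [0.0, 1.2], [5.0, 5.0]]

hard = hardest_negative(anchor, negatives)
hard_loss = triplet_loss(anchor, positive, hard)
easy_loss = triplet_loss(anchor, positive, negatives[2])
```

The far-away negative already satisfies the margin and yields zero loss, which is why mining the hardest negative gives a more useful training signal.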

Distance

  • Euclidean
  • Cosine
  • Manhattan
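
For reference, the three distances listed above can be written as follows. This is a plain-Python sketch with hypothetical function names; the library itself may compute them differently (for example, on batched tensors).

```python
import math


def euclidean_distance(u, v):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    """Sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Two orthogonal unit vectors as a simple test case.
u, v = [1.0, 0.0], [0.0, 1.0]
distances = {
    "euclidean": euclidean_distance(u, v),
    "manhattan": manhattan_distance(u, v),
    "cosine": cosine_distance(u, v),
}
```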

Utils

  • K-folds
  • Full-flow
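
The K-folds utility is not described further here. As a generic sketch of what K-fold splitting does, the hypothetical helper below partitions sample indices into consecutive folds; how the library actually forms its folds (for example, whether it splits by topic or shuffles) is not stated in this README.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, valid_indices) per fold, without shuffling."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        stop = start + size
        valid = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, valid
        start = stop


folds = list(k_fold_indices(n_samples=7, n_folds=3))
```

Each sample appears in exactly one validation fold, so every example is used for both training and validation across the folds.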

Pseudo-label

Group the arguments by key point and use the position of that key point within its topic as the label (see pseudo_label). This makes it possible to plug the distances, losses, miners, and reducers from the open-source PyTorch Metric Learning library into the main training workflow. This is our best single-model approach so far.
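
The labeling step described above can be sketched as follows. The record layout, the function name `assign_pseudo_labels`, and the use of first-seen order as the key point's position within a topic are all assumptions made for this illustration.

```python
from collections import defaultdict

# Hypothetical records: (topic, key_point, argument).
records = [
    ("topic A", "kp 1", "arg 1"),
    ("topic A", "kp 2", "arg 2"),
    ("topic A", "kp 1", "arg 3"),
    ("topic B", "kp 3", "arg 4"),
]


def assign_pseudo_labels(records):
    """Label each argument with the index of its key point within its topic."""
    key_point_order = defaultdict(dict)  # topic -> {key_point: index}
    labeled = []
    for topic, key_point, argument in records:
        order = key_point_order[topic]
        if key_point not in order:
            # Assumption: position = first-seen order within the topic.
            order[key_point] = len(order)
        labeled.append((topic, argument, order[key_point]))
    return labeled


labeled = assign_pseudo_labels(records)
```

Arguments that share a key point receive the same integer label, which is the form that class-label-based metric learning losses and miners typically expect.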

Model architecture

Training data

ArgKP dataset (Bar-Haim et al., ACL-2020)

Contributors

  • Phan Viet Hoang
  • Nguyen Duc Long

BibTeX

@misc{hoang2021qskpa,
  author = {Phan, V.H. and Nguyen, D.L.},
  title = {Keypoint Analysis},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/VietHoang1512/KPA}}
}