mcqa

Answering multiple choice questions with Language Models.


Keywords
artificial-intelligence, bert, deep-learning, machine-learning, natural-language-processing, nlp, pytorch, question-answering
License
Apache-2.0
Install
pip install mcqa==0.1.1

Documentation

mcQA : Multiple Choice Questions Answering

Answering multiple choice questions with Language Models.

CircleCI PyPI Version GitHub codecov PRs Welcome

Installation

With pip

pip install mcqa

From source

git clone https://github.com/mcqa-suite/mcqa.git
cd mcQA
pip install -e .

Getting started

Data preparation

To train a mcQA model, you need to create a csv file with n+2 columns, n being the number of choices for each question. The first column should be the context sentence, the n following columns should be the choices for that question and the last column is the selected answer.

Below is an example of a 3 choice question (taken from the CoS-E dataset) :

Context sentence Choice 1 Choice 2 Choice 3 Label
People do what during their time off from work? take trips brow shorter become hysterical take trips

If you have a trained mcQA model and want to infer on a dataset, it should have the same format as the train data, but the label column.

See example data preparation below:

from mcqa.data import MCQAData

mcqa_data = MCQAData(bert_model="bert-base-uncased", 
                     lower_case=True, 
                     max_seq_length=256) 
                     
train_dataset = mcqa_data.read(data_file='swagaf/data/train.csv', is_training=True)
test_dataset = mcqa_data.read(data_file='swagaf/data/test.csv', is_training=False)

Model training

from mcqa.models import Model

mdl = Model(bert_model="bert-base-uncased",
            device="cuda") 
            
mdl.fit(train_dataset, 
        train_batch_size=32, 
        num_train_epochs=20)

Prediction

preds = mdl.predict(test_dataset, 
                    eval_batch_size=32)

Evaluation

from sklearn.metrics import accuracy_score
from mcqa.data import get_labels

print(accuracy_score(preds, get_labels(train_dataset)))

References

Type Title Author Year
📰 Paper Explain Yourself! Leveraging Language Models for Commonsense Reasoning Nazneen Fatema Rajani, Bryan McCann, Caiming Xiong and Richard Socher ACL 2019
📰 Paper SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference Rowan Zellers, Yonatan Bisk, Roy Schwartz and Yejin Choi 2018