Project Demo Slides

Package: new release coming soon.

OnPoint: A Question Answering Service leveraging user reviews

OnPoint is a question answering service that leverages product user reviews. When you are looking for a product detail, OnPoint saves time by returning a short answer in seconds.

This repository explores the application of XLNet to a question answering service based on user reviews. The base model and algorithm are inspired by and based upon the paper "XLNet: Generalized Autoregressive Pretraining for Language Understanding" and the renatoviolin/xlnet repo.

The directory structure of this repo is the following:

onpoint : contains all the source code
tst : contains all the unit tests
- data : contains data for the unit tests
configs : contains config files for hyperparameters during finetuning and evaluation

Setup

Installation

git clone https://github.com/hairong-wang/OnPoint.git
cd OnPoint

Requisites

tensorflow-gpu==1.15.0-rc1
absl-py==0.8.0
Flask==1.1.1
pip

Environment setup

export CUDA_VISIBLE_DEVICES=0
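
If you launch the code from Python rather than from a shell, the same variable can be set in-process. This is a minimal sketch and not part of the repository; the helper name select_gpu is hypothetical.

```python
import os


def select_gpu(device_ids="0"):
    """Expose only the listed GPU ids to CUDA for this process.

    Call this before importing TensorFlow (or any other CUDA library),
    because the variable is read when CUDA initializes.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device_ids)
    return os.environ["CUDA_VISIBLE_DEVICES"]


select_gpu("0")  # same effect as: export CUDA_VISIBLE_DEVICES=0
# import tensorflow as tf  # import only after the variable is set
```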

Steps to run

Step1: Configuration

All scripts in OnPoint/onpoint/bin need to be configured before use. For example, the data paths are set like this:

SQUAD_DATA_S3_BUCKET='squad-data'
SQUAD_DATA_TRAIN_S3_KEY='squad2.0/train-v2.0.json'
SQUAD_DATA_DEV_S3_KEY='squad2.0/dev-v2.0.json'
LOCAL_SQUAD_DATA_TRAIN_PATH=./squad2.0_train.json
LOCAL_SQUAD_DATA_DEV_PATH=./squad2.0_dev.json
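
The variable names suggest that the scripts copy the SQuAD files from an S3 bucket to a local path. Assuming that is the case, the step could be sketched in Python as follows. The function fetch_from_s3 is hypothetical and not part of the repository; the real download needs boto3 and AWS credentials.

```python
import os

# Values copied from the example above; adjust them to your own bucket.
SQUAD_DATA_S3_BUCKET = "squad-data"
SQUAD_DATA_TRAIN_S3_KEY = "squad2.0/train-v2.0.json"
LOCAL_SQUAD_DATA_TRAIN_PATH = "./squad2.0_train.json"


def fetch_from_s3(bucket, key, local_path, client=None):
    """Download s3://bucket/key to local_path unless the file already exists.

    `client` is any object with a boto3-style download_file(bucket, key, path)
    method. When omitted, a real boto3 S3 client is created.
    Returns True if a download happened, False if the file was already there.
    """
    if os.path.exists(local_path):
        return False
    if client is None:
        import boto3  # imported lazily so the sketch loads without boto3
        client = boto3.client("s3")
    client.download_file(bucket, key, local_path)
    return True


if __name__ == "__main__":
    fetch_from_s3(
        SQUAD_DATA_S3_BUCKET,
        SQUAD_DATA_TRAIN_S3_KEY,
        LOCAL_SQUAD_DATA_TRAIN_PATH,
    )
```

Skipping files that already exist keeps repeated runs cheap, which helps when the same machine is reused for several experiments.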

Step2: Prepare and Preprocess

- Download dataset

Download the dataset you want to use for finetuning. The datasets used in this project are:

The SQuAD 2.0 dataset.
The manually sampled and labeled AmazonQA dataset, already preprocessed, available at Google Cloud Storage Buckets/xlnet_squad2/data/amazon.

- Download model checkpoints

The model checkpoints are available at Google Cloud Storage Buckets/xlnet_squad2/experiment/squad_and_amazon_8000steps_1000warmup. So far, the best-performing checkpoint is 'model.ckpt-4000'.

- Convert dataset to SQuAD format (Optional)

If you want to try another dataset, convert it to SQuAD format first.

# Change the INFILE and OUTFILE path
python3 squad_converter.py
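
For reference, the target layout is the public SQuAD 2.0 JSON structure. The sketch below is not the repository's squad_converter.py; it only illustrates the format, assuming flat input records with the hypothetical keys 'id', 'question', 'context' and 'answer'.

```python
import json


def to_squad(records, title="reviews", version="v2.0"):
    """Wrap flat QA records into the SQuAD 2.0 JSON layout.

    A record whose answer is missing, empty, or not found verbatim in its
    context is marked as unanswerable (is_impossible=True).
    """
    paragraphs = []
    for rec in records:
        answer = rec.get("answer") or ""
        start = rec["context"].find(answer) if answer else -1
        impossible = start < 0
        answers = [] if impossible else [
            # answer_start is the character offset of the answer in the context
            {"text": answer, "answer_start": start}
        ]
        paragraphs.append({
            "context": rec["context"],
            "qas": [{
                "id": str(rec["id"]),
                "question": rec["question"],
                "is_impossible": impossible,
                "answers": answers,
            }],
        })
    return {"version": version, "data": [{"title": title, "paragraphs": paragraphs}]}


def save_squad(squad, path):
    """Write the converted dataset as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(squad, f, ensure_ascii=False)
```

Treating an answer that cannot be located in the context as unanswerable is a design choice of this sketch; depending on your data, you may prefer to drop such records or fix them by hand.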

- Preprocess data

Multiprocessing is available: set 'NUM_PROC=' to the number of cores you want to use.

cd onpoint
bash bin/data_processing
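
How the script distributes work across processes is not described here. As a rough illustration of a common approach, the sketch below picks a process count that fits the machine and splits the examples into one contiguous shard per process. The names pick_num_proc and shard are hypothetical.

```python
import os


def pick_num_proc(requested=None):
    """Clamp the requested process count to the available cores (at least 1)."""
    cores = os.cpu_count() or 1
    if requested is None:
        return cores
    return max(1, min(requested, cores))


def shard(items, num_proc):
    """Split items into num_proc contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(len(items), num_proc)
    chunks, start = [], 0
    for i in range(num_proc):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks
```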

Step3: Train model

bash bin/model_building

Step4: Evaluate model

bash bin/model_analysis

Step5: Run inference

bash bin/model_inference

Step6: Run the Flask app on your local machine

python3 app.py

Analysis

Final result:

Model	Finetune Dataset	Validation Dataset	AmazonQA Sample Coverage	F1
BERT-Large	SQuAD 2.0	Augmented AmazonQA	30%	67.34
XLNet-Large	SQuAD 2.0	Augmented AmazonQA	40%	66.20
XLNet-Large	Augmented AmazonQA	Augmented AmazonQA	0%	66.67
XLNet-Large	SQuAD 2.0 + Augmented AmazonQA	Augmented AmazonQA	50%	69.27
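
How the F1 column was computed is not stated here. Assuming it is the usual SQuAD-style token-level F1 (scaled to a percentage and averaged over questions), the per-answer score can be sketched as below. The function names are illustrative, not taken from this repository.

```python
import collections
import re
import string

_PUNCT = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(gold, pred):
    """Token-overlap F1 between a gold answer and a predicted answer."""
    gold_toks = normalize_answer(gold).split()
    pred_toks = normalize_answer(pred).split()
    if not gold_toks or not pred_toks:
        # No-answer case: full credit only when both sides are empty.
        return float(gold_toks == pred_toks)
    common = collections.Counter(gold_toks) & collections.Counter(pred_toks)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```

For example, a prediction of "up to 1 m" against the gold answer "waterproof up to 1 m" has precision 1.0 and recall 0.8, so the score lands between the two rather than counting as a plain miss.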

onpoint
Release 0.6
