BERTSimilarity

A BERT embedding library for sentence semantic similarity measurement.


Keywords
Semantic, Similarity, BERT, Embeddings, Transformer, Cosine, Distance, Pytorch, bert-model, bert-tokenizer, cosine-distance, scipy, transformers-library
License
MIT
Install
pip install BERTSimilarity==0.1

Documentation

BERTSimilarity

A BERT Embedding library for sentence semantic similarity measurement 🤖

This library is a sentence semantic similarity measurement tool based on BERT embeddings. It uses the forward pass of the BERT (bert-base-uncased) model to compute the embedding vectors and then applies the standard cosine formulation for distance measurement. The distance metric can be changed, and the intermediate sentence and word embedding vectors can be retrieved as well. The model has been abstracted from Google Research's BERT implementation. The PyTorch wrapper over BERT is credited to Chris McCormick.
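
As a rough illustration of this pipeline (and not the library's internal code), the sketch below runs a forward pass of bert-base-uncased through the Transformers library, mean-pools the last hidden state into a sentence vector (the pooling strategy here is an assumption chosen for illustration), and measures cosine distance with SciPy:

# Minimal sketch of the approach described above: a BERT forward pass,
# a pooled sentence embedding, and a cosine-distance comparison.
# This illustrates the general technique only, not the library's internals.
import torch
from transformers import BertTokenizer, BertModel
from scipy.spatial.distance import cosine

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def sentence_embedding(sentence):
    # Tokenize and run the forward pass without gradient tracking.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean-pool the last hidden state over tokens to obtain a sentence vector
    # (an illustrative pooling choice; other strategies are possible).
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

emb_a = sentence_embedding("The cat sat on the mat.")
emb_b = sentence_embedding("A cat was resting on a rug.")

# scipy's cosine() returns a distance; smaller values mean more similar sentences.
print("Cosine distance:", cosine(emb_a.numpy(), emb_b.numpy()))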

Dependencies

PyTorch

Transformers

SciPy

Usage

Installation is carried out using the pip command as follows:

pip install BERTSimilarity==0.1

For use inside a Jupyter Notebook or a Python IDE:

import BERTSimilarity.BERTSimilarity as bertsimilarity

The 'Similarity_Test.py' file contains an example of using the library in this context.

Samples

A sample of semantic similarity measurement with 4 different sentences, 2 of which are vaguely similar, is provided below:
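
The snippet below is a hedged sketch of such a comparison; the class name BERTSimilarity() and the method calculate_distance() are illustrative assumptions rather than the confirmed API, so refer to 'Similarity_Test.py' for the exact calls:

# Hypothetical usage sketch with 4 sentences, 2 of which are vaguely similar.
# BERTSimilarity() and calculate_distance() are assumed names for illustration;
# see 'Similarity_Test.py' for the library's actual interface.
import BERTSimilarity.BERTSimilarity as bertsimilarity

bs = bertsimilarity.BERTSimilarity()

sentences = [
    "The weather in the mountains was cold and windy.",
    "It was chilly and breezy up on the mountain trail.",  # vaguely similar to the first
    "The stock market closed higher on Friday.",
    "She enjoys painting landscapes in her free time.",
]

# Compare every pair of sentences; a smaller cosine distance means more similar.
for i in range(len(sentences)):
    for j in range(i + 1, len(sentences)):
        distance = bs.calculate_distance(sentences[i], sentences[j])
        print(f"({i}, {j}) distance = {distance:.4f}")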

This Colab Notebook can also be used for experimentation.

A Kaggle Kernel for Question Pair Similarity detection, which uses this library, is also provided.

The notebook is featured on QuantumStat.com.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT