okapibm25

A simple, easy-to-use implementation of the Okapi BM25 algorithm.


License
MIT
Install
npm install okapibm25@1.4.0


@furkantoprak/bm25


A strongly typed, well-tested implementation of the Okapi BM25 algorithm. Just provide the documents to search, the query keywords, and, optionally, the tuning parameters b and k1.
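For reference, here is the textbook form of the BM25 score, which shows where k1 and b enter. This is the standard formulation, not necessarily the exact IDF variant this package uses. Here f(q_i, D) is the number of times query term q_i occurs in document D, |D| is the length of D in terms, avgdl is the average document length in the collection, N is the number of documents, and n(q_i) is the number of documents containing q_i.

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}

\mathrm{IDF}(q_i) = \ln\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)
```

k1 controls how quickly repeated occurrences of a term stop adding to the score, and b (between 0 and 1) controls how strongly long documents are penalized; b = 0 turns length normalization off.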

Installation

Check out the NPM package.

npm install okapibm25 --save

Usage

import { BM25 } from "okapibm25";

const documents = [
  "place",
  "documents",
  "here",
  "Each test document will be searched with the keywords specified below.",
];
const query = ["keywords", "of", "your", "query."];
// Returns one numeric score per document, in the same order as `documents`.
const result = BM25(documents, query, { k1: 1.3, b: 0.9 }) as number[];
console.log(result);
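If you stick with the plain number[] result, you can still rank documents yourself. The sketch below assumes scores[i] is the score for documents[i]; rankByScore is a hypothetical helper, not part of okapibm25, and it uses only the standard library.

```typescript
// Hypothetical helper (not part of okapibm25): pairs each document with its
// score, assuming scores[i] belongs to documents[i], then sorts best-first.
function rankByScore(
  documents: string[],
  scores: number[],
): { document: string; score: number }[] {
  if (documents.length !== scores.length) {
    throw new Error("documents and scores must have the same length");
  }
  return documents
    .map((document, i) => ({ document, score: scores[i] }))
    .sort((a, b) => b.score - a.score); // descending by score
}

// Usage with made-up scores, purely for illustration.
const ranked = rankByScore(["a", "b", "c"], [0.2, 1.5, 0.7]);
console.log(ranked.map((r) => r.document)); // order: b, c, a
```

Passing a sorter to BM25 does the same job in one call; this helper only matters if you already have the number[] form.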

Sorting

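BM25 can also sort the scored documents for you: pass a comparator as the fourth argument. It works very much like the compare function of JavaScript's Array.prototype.sort(): return a negative number to put firstEl before secondEl, a positive number to put it after, and 0 if they are equivalent.

Here is how to sort in descending order of score: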

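import { BM25 } from "okapibm25";

const corpuses = [
  "the relevant document",
  "an unrelated document",
  "another relevant document",
];
const results = BM25(
  corpuses,
  ["relevant"],
  undefined, // no custom constants: fall back to the default k1 and b
  (firstEl, secondEl) => {
    return secondEl.score - firstEl.score; // highest score first
  }
) as { document: string; score: number }[];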

I've purposely designed the sorter around whole documents, not just scores: the comparator receives { document, score } objects, so you could also sort alphabetically (or by how many times the word 'unicorn' is mentioned, for all I care!). You can even ignore the scores entirely while sorting.
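As a sketch of that flexibility, here are two comparators written against the { document, score } shape the sorter receives. The names ScoredDocument, byTextAsc, byUnicornCount, and countOccurrences are hypothetical, and the demo calls Array.prototype.sort on hand-made objects rather than calling BM25, so it runs without the package.

```typescript
// Local stand-in for the objects BM25's sorter compares.
type ScoredDocument = { document: string; score: number };

// Alphabetical by document text, ignoring scores entirely.
const byTextAsc = (a: ScoredDocument, b: ScoredDocument): number =>
  a.document.localeCompare(b.document);

// Counts case-insensitive occurrences of `word` as a whole token.
function countOccurrences(text: string, word: string): number {
  const target = word.toLowerCase();
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token === target).length;
}

// Most "unicorn" mentions first; ties broken by the higher BM25 score.
const byUnicornCount = (a: ScoredDocument, b: ScoredDocument): number =>
  countOccurrences(b.document, "unicorn") -
    countOccurrences(a.document, "unicorn") || b.score - a.score;

const sample: ScoredDocument[] = [
  { document: "A unicorn met another unicorn.", score: 0.4 },
  { document: "No mythical creatures here.", score: 1.2 },
  { document: "One unicorn.", score: 0.9 },
];

const byUnicorns = sample.slice().sort(byUnicornCount).map((d) => d.document);
const alphabetical = sample.slice().sort(byTextAsc).map((d) => d.document);
console.log(byUnicorns);
console.log(alphabetical);
```

Either comparator could be passed as the fourth argument to BM25 in place of the score comparator.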

Important: enabling sorting (passing a comparator) changes the return type from number[] to { document: string; score: number; }[].
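If you'd rather not rely on `as` casts, a user-defined type guard can narrow the result at runtime. This sketch assumes the declared return type is the union number[] | { document: string; score: number }[], which the casts in the examples suggest but this README doesn't state. isScoredDocuments and describe are hypothetical helpers. Checking only the first element assumes the array is homogeneous; an empty array is treated as number[], which is harmless since there's nothing to read.

```typescript
type ScoredDocument = { document: string; score: number };

// Hypothetical type guard: true when the result is the sorted
// { document, score }[] form rather than plain number[] scores.
function isScoredDocuments(
  result: number[] | ScoredDocument[],
): result is ScoredDocument[] {
  return result.length > 0 && typeof result[0] === "object";
}

// Example consumer that handles both shapes without casts.
function describe(result: number[] | ScoredDocument[]): string {
  if (isScoredDocuments(result)) {
    return result.map((d) => `${d.document}: ${d.score}`).join("\n");
  }
  return result.join(", "); // narrowed to number[] here
}
```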

What's this?

An implementation of Okapi BM25 (also known as BM25), a bag-of-words information retrieval algorithm. Read up on it here.
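To show what "bag of words" means in practice, here is a minimal from-scratch sketch of BM25 scoring. It is not this package's source: the tokenizer (lowercase, split on non-alphanumeric characters), the IDF variant (the non-negative ln(1 + ...) form), the defaults k1 = 1.2 and b = 0.75, and the names tokenize and scoreBM25 are all assumptions chosen for illustration. Word order is ignored; only term counts and document lengths matter.

```typescript
// Minimal BM25 sketch (hypothetical; not the okapibm25 source).
function tokenize(text: string): string[] {
  // Assumed tokenizer: lowercase, split on anything that isn't a letter or digit.
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

function scoreBM25(
  documents: string[],
  query: string[],
  k1 = 1.2,
  b = 0.75,
): number[] {
  const docs = documents.map(tokenize);
  const N = docs.length;
  const avgdl = docs.reduce((sum, d) => sum + d.length, 0) / Math.max(N, 1);
  const terms = query.map(tokenize).reduce((acc, t) => acc.concat(t), [] as string[]);

  // Document frequency n(q): how many documents contain each query term.
  const df = new Map<string, number>();
  for (const term of terms) {
    if (!df.has(term)) {
      df.set(term, docs.filter((d) => d.indexOf(term) !== -1).length);
    }
  }

  return docs.map((doc) => {
    // Term frequencies for this document: the "bag of words".
    const tf = new Map<string, number>();
    for (const token of doc) tf.set(token, (tf.get(token) ?? 0) + 1);

    let score = 0;
    for (const term of terms) {
      const f = tf.get(term) ?? 0;
      if (f === 0) continue; // absent terms contribute nothing
      const n = df.get(term) ?? 0;
      const idf = Math.log((N - n + 0.5) / (n + 0.5) + 1);
      const lengthRatio = doc.length / avgdl; // > 0 because doc contains term
      score += (idf * (f * (k1 + 1))) / (f + k1 * (1 - b + b * lengthRatio));
    }
    return score;
  });
}

const scores = scoreBM25(["the cat sat", "the dog ran", "cat cat cat"], ["cat"]);
console.log(scores); // the "dog" document scores 0; "cat cat cat" scores highest
```

Because the term-frequency part saturates as f grows, "cat cat cat" scores higher than "the cat sat" but well under three times as much.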

License

MIT. See license.md.

Contributing

Submit a Pull Request if you have a useful feature that you'd like to add. If you're too lazy or this isn't your area of expertise, open an issue and I'll get to it.