orolo

An aggressive natural language dates parser for JavaScript


Keywords
dates, date, parser, natural language, understanding, nlp
License
MIT
Install
npm install orolo@0.1.1

Documentation

Orolo

Yet another natural-language date parser for JavaScript, with a focus on understanding dates and date ranges in a sentence. Orolo is an aggressive parser designed to work on sentences that are known to contain dates.


Why

There are very good NLP date parsers for JavaScript.

There are also strong general-purpose NLP frameworks, and some come with pre-built named entity recognition for dates.

Unfortunately, I couldn't find one that would properly understand these sentences:

  • Will be in New York on May 5th, 6th, and 7th
  • Will be in London from june third till june sixth
  • I was in SFO this past Tue and Wed

and that's how orolo was born. Take a look at the benchmark to see how it compares with a few other libraries.

Install

npm install --save orolo

Orolo is written in TypeScript and compiled down to ES5 for distribution. I later realized, though, that it currently uses Array.prototype.includes(), Array.prototype.find(), and Object.assign(), so it requires an ES6/ES7 (ES2015/ES2016) runtime (Node 6+) or a polyfill.
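
If you are unsure whether your runtime provides these built-ins, a quick check like the one below can tell you which polyfills to load. This is only a sketch: missingFeatures is a hypothetical helper, not part of orolo, and it is written in ES5 syntax so that it can run on the older runtimes it is meant to detect.

```javascript
// Hypothetical helper (not part of orolo): lists the built-ins named above
// that are missing from the current runtime.
function missingFeatures() {
  var required = {
    'Array.prototype.includes': typeof Array.prototype.includes === 'function',
    'Array.prototype.find': typeof Array.prototype.find === 'function',
    'Object.assign': typeof Object.assign === 'function'
  };
  return Object.keys(required).filter(function (name) {
    return !required[name];
  });
}

var missing = missingFeatures();
if (missing.length > 0) {
  console.warn('Load a polyfill for: ' + missing.join(', '));
}
```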

Usage

const { Orolo } = require('orolo');

const orolo = new Orolo();
const dates = orolo.recognize(
  'I will be in New York next week Tuesday and also Friday'
);

If you know that the dates in the sentence are not relative to today, pass a reference date as the second argument:

const context = new Date('2018-05-01');
const dates = orolo.recognize(
  'I am not available next week Mon - Wed',
  context
);

How It Works

Orolo works in two stages. First, it parses the sentence looking for date tokens. It traverses the list of word tokens with a simple back-off algorithm and defers to locale-specific token detectors to recognize date tokens:

[Animation: the parsing process]
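
To make the back-off traversal more concrete, here is a minimal sketch of the idea. It is not orolo's actual implementation: the detectors, the scan function, and the maxSpan parameter are assumptions made for illustration, and only the token type names come from this README. At each position it tries the longest span of words first and backs off to shorter spans until a detector matches; if nothing matches, it skips one word.

```javascript
// Toy detectors standing in for locale-specific ones (hypothetical shapes).
// Each returns a value when the text is a date token, or null otherwise.
const MONTHS = ['january', 'february', 'march', 'april', 'may', 'june',
  'july', 'august', 'september', 'october', 'november', 'december'];
const DIRECTIONS = { last: -1, this: 0, next: 1 };

const detectors = [
  {
    type: 'DIRECTION_MONTH',
    detect: (text) => {
      const m = /^(last|this|next) month$/.exec(text);
      return m ? DIRECTIONS[m[1]] : null;
    },
  },
  {
    type: 'MONTH',
    detect: (text) => (MONTHS.includes(text) ? MONTHS.indexOf(text) : null),
  },
  {
    type: 'DAY_OF_MONTH',
    detect: (text) => {
      const m = /^(\d{1,2})(?:st|nd|rd|th)?$/.exec(text);
      const day = m ? Number(m[1]) : 0;
      return day >= 1 && day <= 31 ? day : null;
    },
  },
  {
    type: 'RANGE',
    detect: (text) => (/^(till|until|through)$/.test(text) ? text : null),
  },
];

// maxSpan: the largest number of words a single date token may cover.
function scan(sentence, maxSpan = 2) {
  const words = sentence.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  const tokens = [];
  let i = 0;
  while (i < words.length) {
    let matched = false;
    // Try the longest span first, then back off to shorter ones.
    for (let span = Math.min(maxSpan, words.length - i); span >= 1 && !matched; span--) {
      const text = words.slice(i, i + span).join(' ');
      for (const { type, detect } of detectors) {
        const value = detect(text);
        if (value !== null) {
          tokens.push({ type, value, text });
          i += span;
          matched = true;
          break;
        }
      }
    }
    if (!matched) i += 1; // not a date token: skip this word
  }
  return tokens;
}

console.log(scan('Will be in London from next month 3rd till 5th').map((t) => t.type));
// [ 'DIRECTION_MONTH', 'DAY_OF_MONTH', 'RANGE', 'DAY_OF_MONTH' ]
```

Note that "next month" is only recognized because the two-word span is tried before the single word "next".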

Second, it computes dates from the tokens by arranging them into a computable sequence. The DAY_OF_MONTH token, for example, can't be computed unless it is attached to either a MONTH token or a DIRECTION_MONTH token. The RANGE token makes orolo return all dates between its two boundaries, the left-hand side and the right-hand side. In the example above, the left-hand side of the range is complete and can be computed. The right-hand side, however, is missing a month or a relative month direction (e.g. next month, last month), so orolo will "borrow" the DIRECTION_MONTH from the left-hand side. The same logic applies, for example, when computing 'I was in New York on May 6th, 7th, and 8th': 7th and 8th are computed as May 7th and May 8th respectively.
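
The borrowing step can be sketched roughly as follows. Again, this is an illustration under stated assumptions rather than orolo's real code: computeDates and the token objects are hypothetical, each day borrows the closest preceding month anchor (falling back to the next following one), and a range is assumed to include both of its boundaries.

```javascript
// Hypothetical sketch (not orolo's API): turn a token sequence into dates.
// Token shape assumed here: { type, value }, where MONTH.value is 0-11,
// DIRECTION_MONTH.value is an offset in months, DAY_OF_MONTH.value is 1-31.
function computeDates(tokens, context) {
  // Month anchors: the first day of the month each MONTH or
  // DIRECTION_MONTH token refers to, or null for other tokens.
  const anchors = tokens.map((token) => {
    if (token.type === 'MONTH') {
      return new Date(context.getFullYear(), token.value, 1);
    }
    if (token.type === 'DIRECTION_MONTH') {
      return new Date(context.getFullYear(), context.getMonth() + token.value, 1);
    }
    return null;
  });

  // "Borrow" a month: closest preceding anchor, else the next following one.
  const findAnchor = (i) => {
    for (let j = i - 1; j >= 0; j--) if (anchors[j]) return anchors[j];
    for (let j = i + 1; j < tokens.length; j++) if (anchors[j]) return anchors[j];
    return null;
  };

  const dates = [];
  let rangePending = false;
  tokens.forEach((token, i) => {
    if (token.type === 'RANGE') {
      rangePending = dates.length > 0; // a range needs a left-hand side
      return;
    }
    if (token.type !== 'DAY_OF_MONTH') return;
    const anchor = findAnchor(i);
    if (!anchor) return; // a day without any month cannot be computed
    const date = new Date(anchor.getFullYear(), anchor.getMonth(), token.value);
    if (rangePending) {
      // Fill in the days strictly between the two boundaries.
      const cursor = new Date(dates[dates.length - 1]);
      cursor.setDate(cursor.getDate() + 1);
      while (cursor < date) {
        dates.push(new Date(cursor));
        cursor.setDate(cursor.getDate() + 1);
      }
      rangePending = false;
    }
    dates.push(date);
  });
  return dates;
}

const context = new Date(2018, 4, 1); // May 1st, 2018 (local time)
const format = (d) => `${d.getMonth() + 1}/${d.getDate()}`;

// "May 6th, 7th, and 8th": 7th and 8th borrow MONTH from the left.
console.log(computeDates([
  { type: 'MONTH', value: 4 },
  { type: 'DAY_OF_MONTH', value: 6 },
  { type: 'DAY_OF_MONTH', value: 7 },
  { type: 'DAY_OF_MONTH', value: 8 },
], context).map(format));
// [ '5/6', '5/7', '5/8' ]

// "next month 3rd till 5th": the right-hand side borrows DIRECTION_MONTH.
console.log(computeDates([
  { type: 'DIRECTION_MONTH', value: 1 },
  { type: 'DAY_OF_MONTH', value: 3 },
  { type: 'RANGE', value: 'till' },
  { type: 'DAY_OF_MONTH', value: 5 },
], context).map(format));
// [ '6/3', '6/4', '6/5' ]
```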

Locales

Right now orolo supports only English, but it is designed to support other languages as well. To add one, you would need to implement the ILocale interface. The best way to do that at the moment is to look at how EnglishLocale is implemented. In short, you specify how to tokenize a sentence and how many language tokens (words) a date token can span, and then you implement token detectors for each token type that orolo supports.

Aggressive Parsing

Orolo takes everything that looks like a valid date token and then attempts to compute a date. The following two sentences both compute down to May 5th:

It happened on May 5
I bought 5 shirts in May

There is currently no logic to look at either the dependency graph or the proximity of tokens to each other.

Parsed Sequence

If you would like to take advantage of the parsing logic but don't want or need orolo to compute the dates, call parse directly:

const { parse, EnglishLocale } = require('orolo');
const sequence = parse(
  'I will be in New York next week Tuesday and also Friday',
  new EnglishLocale()
);

// sequence.tokens is an array of the parsed tokens