sofia-tree

A fast in-memory dictionary for autocomplete as you type features based on a trie data structure.


Keywords
auto-complete, autocomplete, complete, trie, radix, typeahead, sofia, patricia, tree, type, memory, dictionary
License
MIT
Install
npm install sofia-tree@0.0.9

Documentation

Sofia Tree

NPM Build Status Coverage Status

A fast-recovery data structure to support auto-complete as you type features.

A kind of Radix Tree or Trie data structure baptized 'Sofia Tree' in honour to my beloved daughter. A multinode tree with a level for each letter with fast lookup operations. See Anottated Source

Features

  • Non-recursive tree traverse implementation to keep memory usage low and speed up on lookup operations.
  • Efficient in-memory data structure, no redis, no dependencies
  • Optional built-in cache for even faster lookups.
  • Optionally case sensitive
  • Optionally limit number of results to get matches even faster
  • Optionally exclude the given prefix if it's a valid word
  • Batteries included example to benchmark its performance

Install

    npm install sofia-tree

Quick and simple usage example

var SofiaTree=require('sofia-tree');

var dictionary=["f","foo","foobar","b","bar","barbar"];

var sofiaTree= new SofiaTree({
    useCache: true,
    caseSensitive: false
});

dictionary.forEach(function(word){
    sofiaTree.insert(word);
},server);

//Retrieve all the completions including the prefix
console.log(sofiaTree.getCompletions("foo"));

["foo","foobar"]

//Retrieve all the completions without the prefix
console.log(sofiaTree.getCompletions("foo",true));

["foobar"]

//Limit the number of results without excluding the prefix (defaults to false)
console.log(sofiaTree.getCompletions("f",2)))
["f","foobar"]

//Prints the whole dictionary   
console.log(sofiaTree.getCompletions(""));  
["f","foo","foobar","b","bar","barbar"]

See tests for more complete use cases

A real world example

Clone the whole repo and run npm run example to load a micro search engine, in another console run cd example && ./test.sh to load a 350,000 words dictionary and launch 1,500 searchs with 100 threads in parallel. See Anottated Source and shell scripts

Known limitations

Actually when limiting the maximum number or matches to be retorned it won't necessarely return the first ones in alphabetical order. When returning all the results they won't be alphabetically sorted either. This is due the nature of javascript objects properties which are unordered by definition. Perhaps I'll switch properties for an array in the future.

To be done

  • Define nice jasmine tests and setup Travis
  • Comment the example API usage
  • Reduce prefixes stack memory comsuption
  • Partial cache invalidation mechanism when inserting new words (it actually resets the whole cache upon insertion of new words.)
  • Implement remove method
  • Return results as stream
  • Case-sensitive lookups? Not sure if it's a valuable adition
  • Add weight on nodes to enable sort based on some business logic
  • Any ideas? :)

Made with ❤ in Spain