DoubleArrayTrie(DAT) support prefix search & exact search & multiple pattern match for python implemented by c++


Keywords
Double, Array, Trie, DAT
License
MIT
Install
pip install pydat==0.7

Documentation

DoubleArrayTrie

DoubleArrayTrie(DAT) support prefix search & exact search & multiple pattern match for python implemented by c++

What's different

Now greedy search is supported:
function search only returns the longest subsequence(greedy search) function maximum_search returns all subsequences

word: she sher shers he
to_search: shers

search: 
{'entities': [{'begin': 0, 'end': 5, 'entity': 'shers'}]}

maximum_search: 
{'entities': [{'begin': 0, 'end': 3, 'entity': 'she'}, {'begin': 0, 'end': 4, 'entity': 'sher'}, {'begin': 0, 'end': 5, 'entity': 'shers'}, {'begin': 1, 'end': 3, 'entity': 'he'}]}

C++

#include "double_array_trie.h"

DoubleArrayTrie dat;
// load from file
dat.load_file("test.txt")
// or add word
dat.add_word("he")
dat.add_word("she")
// make base & check
dat.make()
// greedy_search
vector<string> index_s = dat.search("ushers")
// maximum_search
vector<string> index_s = dat.maximum_search("ushers")

Python

# install 
pip install pydat

# use
dat = pydat.Dat()
dat.load_file("test.txt")
dat.add_word("he")
dat.make()
# search
dat.search("ushers")
# greedy_search
dat.maximum_search("ushers")

About DAT

Definition
check[base[s] + c] = s
base[s] + c = t

Reference: An implementation of Double-Array Trie

Update Log

2018.09.19

  • optimize the algorithm

2018.08.29

  • support greedy_search

2018.08.08

  • support python3