github.com/kerinin/hammer

Hamming search


Install
go get github.com/kerinin/hammer

Documentation

Hammer

Build Status

Hamming distance search index

Use

All requests to the API should be POST's. Three endpoints are exposed; /add, /query and /delete, which do what they say. Databases are identified by the size of the values they index, their tolerance (the maximum hamming distance of returned values from a query) and an arbitrary namespace value. Supported value sizes:

  • b/64/:tolerance/:namespace: Substitution-variant DB indexing 64-bit binary values
  • b/128/:tolerance/:namespace: Substitution-variant DB indexing 128-bit binary values
  • b/256/:tolerance/:namespace: Substitution-variant DB indexing 256-bit binary values
  • bN/:bits/:tolerance/:namespace: Substitution-variant DB indexing binary values of length :bits (multiples of 8)
  • v/64/:length/:tolerance/:namespace: Deletion-variant DB indexing vectors of 64-bit values of length :length
  • v/128/:length/:tolerance/:namespace: Deletion-variant DB indexing vectors of 128-bit values of length :length
  • v/256/:length/:tolerance/:namespace: Deletion-variant DB indexing vectors of 256-bit values of length :length

End points accept arrays of additions, queries and deletions and return results arrays - each element in the result array relates to the corresponding element in the request array.

# Start an HTTP server on port 3000
cargo build && target/build/hammer --bind localhost:3000

# Add some keys
curl -X POST -d '["AAAAAAAAAAA=","AAAAAAAAAAA=","AAAAAAAA","AADZvdpG3MA="]' localhost:3000/add/b/64/8/foo
# ["ok",exists","err: unable to decode 'AAAAAAAA': (...)","ok"]
curl -X POST -d '[["AAAAAAAAAAA=","AAAAAAAAAAE="],["AAAAAAAAAAI=","AADZvdpG3MA="]]' localhost:3000/add/v/64/2/8/foo
# ["ok","ok"]

# Query for some keys
curl -X POST -d '["AAAAAAAAAAA=","AADZvdpG3MA="]' localhost:3000/query/b/64/8/foo
# [["AAAAAAAAAAI=","AAAAAAAAAAE=","AAAAAAAAAAA="],["AADZvdpG3MA="]]

# Delete keys
curl -X POST -d '["AAAAAAAAAAA="]' localhost:3000/delete/b/64/8/foo
# ["ok"]

Architecture

Keys are partitioned into a set of indices. Indices consist of a mapping from a key partition to the key value, as well as mappings from each key's 1-permutation to the key value.

Keys are added by splitting into partitions, each partition is merged into its associated index.

Keys are retrieved by splitting the query key into partitions and querying for the partition in each index. Results are filtered by the target hamming distance and returned as a set.

This is mostly an implementation of HmSearch