deepneighbor

Embedding-based nearest-neighbor extraction for items


Keywords
embedding, information, retrieval, deep, learning, torch, tensor, pytorch, nearest, neighbor
License
MIT
Install
pip install deepneighbor==0.3.1

Documentation

DeepNeighbor


Embedding-based Retrieval for ANN Search and Recommendations!


Install

pip install deepneighbor

How To Use

from deepneighbor import Embed

model = Embed(data_path, model='gat')
model.train() # see optional parameters below
model.search(seed = 'Louis', k=10) # ANN search
embeddings = model.get_embeddings() # dictionary. key: node; value: n-dim node embedding

Input format

The input data for Embed() should be the path to a *.csv or *.txt file (e.g. 'data/data.csv') with two columns, in this order: 'user' and 'item'. For each user, the items should be ordered by time.
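For illustration, here is a minimal sketch (not part of the library) of how such a two-column file can be grouped into one time-ordered item sequence per user, which is the form a walk- or skip-gram-based model would consume. The sample rows and the `load_sequences` helper are illustrative assumptions:

```python
import csv
from collections import defaultdict
from io import StringIO

# Hypothetical contents of a two-column input file: user,item (rows ordered by time).
raw = """user,item
Louis,book_a
Louis,book_b
Mia,book_b
Mia,book_c
"""

def load_sequences(fileobj):
    """Group the 'item' column into one time-ordered sequence per user."""
    sequences = defaultdict(list)
    for row in csv.DictReader(fileobj):
        sequences[row["user"]].append(row["item"])
    return dict(sequences)

sequences = load_sequences(StringIO(raw))
print(sequences)  # {'Louis': ['book_a', 'book_b'], 'Mia': ['book_b', 'book_c']}
```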

Models & parameters in Embed()

  • Word2Vec w2v
  • Graph attention network gat
  • Factorization Machines fm
  • Deep Semantic Similarity Model
  • Siamese Network with triple loss
  • Deepwalk
  • Graph convolutional network
  • Neural Graph Collaborative Filtering algorithm ngcf
  • Matrix factorization mf

Model Parameters

word2vec

model = Embed(data, model = 'w2v')
model.train(window_size=5,
            workers=1,
            iter=1,
            dimensions=128)
  • window_size Skip-gram window size.
  • workers Use this many worker threads to train the model (faster training on multicore machines).
  • iter Number of iterations (epochs) over the corpus.
  • dimensions Dimensions of the node embeddings.
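To make window_size concrete, here is a minimal sketch, not the library's code, of how skip-gram (center, context) training pairs are drawn from a single item sequence:

```python
def skipgram_pairs(sequence, window_size):
    """Yield (center, context) pairs within +/- window_size of each position."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window_size)
        hi = min(len(sequence), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center item itself
                pairs.append((center, sequence[j]))
    return pairs

print(skipgram_pairs(["a", "b", "c", "d"], window_size=1))
# [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b'), ('c', 'd'), ('d', 'c')]
```

A larger window_size captures more distant co-occurrences per item, at the cost of noisier pairs and slower training.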

graph attention network

model = Embed(data, model = 'gat')
model.train(window_size=5,
            learning_rate=0.01,
            epochs = 10,
            dimensions = 128,
            num_of_walks=80,
            beta=0.5,
            gamma=0.5)
  • window_size Skip-gram window size.
  • learning_rate Learning rate for optimizing the graph attention network.
  • epochs Number of gradient descent iterations.
  • dimensions Dimensions of the embedding for each node (user/item).
  • num_of_walks Number of random walks.
  • beta and gamma Regularization parameters.
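As a rough sketch of what num_of_walks controls: each node typically seeds that many fixed-length random walks over the user-item graph, and the walks become the training corpus. The tiny graph and the walk length below are illustrative assumptions, not the library's internals:

```python
import random

# Tiny illustrative bipartite user-item adjacency, not real data.
graph = {
    "u1": ["i1", "i2"],
    "u2": ["i2"],
    "i1": ["u1"],
    "i2": ["u1", "u2"],
}

def random_walks(graph, num_of_walks, walk_length, seed=0):
    """Start num_of_walks walks from every node; each walk visits walk_length nodes."""
    rng = random.Random(seed)
    walks = []
    for node in graph:
        for _ in range(num_of_walks):
            walk = [node]
            for _ in range(walk_length - 1):
                walk.append(rng.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks

walks = random_walks(graph, num_of_walks=2, walk_length=3)
print(len(walks))  # 8 walks: 2 per node over 4 nodes
```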

How To Search

model.search(seed, k)

  • seed The query node (a user or item) whose nearest neighbors are retrieved.
  • k Number of Nearest Neighbors.
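The search step amounts to k-nearest neighbors in embedding space. Below is a brute-force cosine-similarity sketch over a get_embeddings()-style dict; the toy 2-d embeddings and the knn_search helper are assumptions for illustration (an ANN library would approximate this for speed):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn_search(embeddings, seed, k):
    """Return the k nodes whose embeddings are most similar to the seed's."""
    query = embeddings[seed]
    scored = [(node, cosine(query, vec))
              for node, vec in embeddings.items() if node != seed]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [node for node, _ in scored[:k]]

# Toy 2-d embeddings, purely illustrative.
embeddings = {"Louis": [1.0, 0.0], "book_a": [0.9, 0.1], "book_b": [0.0, 1.0]}
print(knn_search(embeddings, seed="Louis", k=2))  # ['book_a', 'book_b']
```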

Examples

Open the Colab notebook to run the example with the Facebook data.

License

This project is under the MIT License; please see here for details.