dnr

RNN-based word2vec implementation for news recommendation


Keywords
recommendation, recommendersystem, rnn, lstm, word2vec, gensim
License
Other
Install
pip install dnr==0.9.3

Documentation

This library implements an approach to text recommendation based on word embeddings taken from a word2vec model, which can be built using gensim:

https://radimrehurek.com/gensim/

After obtaining the word2vec embeddings, the library returns the best recommendations from a collection of texts, given the texts that the user has already read.

This project is part of my research on text-based recommendation.

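Let W be the collection of news texts the user has already read, where each text is a list of words:

W   =   [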
    [w1, w2, ....... ],
    [w1, w2, ..............],
    [w1,w2,  ...................],
    ....
    ....
    ]

The documents above can be converted into vector form using the word2vec model:

V   =   word2vec(W)
    =   [
    [v1, v2, ....... ],
    [v1, v2, ..............],
    [v1, v2, ...................],
    ....
    ....
    ]
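
The lookup step above can be sketched as follows. This is a minimal illustration, not the dnr API: `EMBEDDINGS` is a hypothetical toy table standing in for a trained word2vec model (with gensim, the per-word lookup would typically be `model.wv[word]`), and `documents_to_vectors` is a helper name assumed for this example. Skipping out-of-vocabulary words is also an assumption.

```python
import numpy as np

# Hypothetical toy embedding table standing in for a trained word2vec model.
EMBEDDINGS = {
    "stocks": np.array([1.0, 0.0, 0.0]),
    "market": np.array([0.8, 0.2, 0.0]),
    "football": np.array([0.0, 1.0, 0.0]),
    "match": np.array([0.0, 0.9, 0.1]),
}


def documents_to_vectors(docs, embeddings):
    """Map each tokenised document to a list of word vectors.

    Words missing from the embedding table are skipped (an assumption).
    """
    return [[embeddings[w] for w in doc if w in embeddings] for doc in docs]


# W: documents as lists of words; "rally" is not in the toy vocabulary.
W = [["stocks", "market", "rally"], ["football", "match"]]
V = documents_to_vectors(W, EMBEDDINGS)
```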

Next, a tf-idf score is calculated for each word in each document:

T   =   tfidf(V)
    =   [
    [t1, t2, ....... ],
    [t1, t2, ..............],
    [t1, t2, ...................],
    ....
    ....
    ]
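
For illustration, one common tf-idf variant computed directly on the tokenised documents is sketched below. The exact variant used by this library is not stated here, so the formula (relative term frequency times `log(N / df) + 1`) and the helper name `tfidf_weights` are assumptions.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one tf-idf weight per word position in each document.

    tf  = count of the word in the document / document length
    idf = log(number of documents / documents containing the word) + 1
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each word occur?
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append([
            (counts[w] / len(doc)) * (math.log(n_docs / df[w]) + 1.0)
            for w in doc
        ])
    return weights


W = [["stocks", "market", "stocks"], ["football", "market"]]
T = tfidf_weights(W)  # same nested shape as W, one weight per word
```

Here "stocks" receives a higher weight than "market" in the first document, because it is both more frequent there and rarer across the collection.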

The weighted vectors are then obtained from the Hadamard (element-wise) product of V and T:

VT  =   [
    [v1 o t1, v2 o t2, ....... ],
    [v1 o t1, v2 o t2, ..............],
    [v1 o t1, v2 o t2, ...................],        
    ....
    ....
    ]
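
One reading of the step above is that each v_i is a word vector and each t_i its scalar tf-idf weight, so the product scales every component of the vector by that weight. A minimal numpy sketch under that assumption follows; `weight_vectors` is a hypothetical helper, not part of the library.

```python
import numpy as np


def weight_vectors(doc_vectors, doc_weights):
    """Scale each word vector of one document by its tf-idf weight."""
    vectors = np.asarray(doc_vectors, dtype=float)  # shape: (n_words, dim)
    weights = np.asarray(doc_weights, dtype=float)  # shape: (n_words,)
    # Broadcasting multiplies row i of `vectors` by weights[i].
    return vectors * weights[:, np.newaxis]


v = [[1.0, 0.0], [0.0, 2.0]]  # two word vectors of dimension 2
t = [0.5, 2.0]                # their tf-idf weights
vt = weight_vectors(v, t)
```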

Next, the mean of each text in VT is calculated:

VT_ =   [
      mean([v1 o t1, v2 o t2, ....... ]),
      mean([v1 o t1, v2 o t2, ..............]),
      mean([v1 o t1, v2 o t2, ...................]),        
      ....
      ....
      ]

Taking the mean of VT_ gives a single vector that represents the news the user has read:

V_  =   mean(VT_)
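
The two averaging steps can be sketched with numpy as below. The function name `user_profile` and the sample values are assumptions made for this example; each entry of `VT` is assumed to be an array of weighted word vectors for one text.

```python
import numpy as np


def user_profile(weighted_docs):
    """Average word vectors per text (VT_), then average the texts (V_)."""
    doc_means = [np.mean(doc, axis=0) for doc in weighted_docs]  # VT_
    return np.mean(doc_means, axis=0)                            # V_


# Two texts with weighted word vectors of dimension 2.
VT = [
    np.array([[1.0, 0.0], [3.0, 0.0]]),  # text 1: two words
    np.array([[0.0, 2.0]]),              # text 2: one word
]
V_ = user_profile(VT)
```

Averaging per text first means that a long text does not dominate the profile simply because it contains more words.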

Now suppose S is the collection of news from which recommendations are to be made. It can be converted into a vector representation using word2vec in the same way:

S   =   [
      [v1, v2, ....... ],
      [v1, v2, ..............],
      [v1, v2, ...................],
      ....
      ....
      ]

Now the similarity between each item in S and V_ can be measured:

sim(S, V_)  =   [
            [score],
            [score],
            [score],
            ....
            ....
            ]
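
The similarity measure is not named above. The sketch below assumes cosine similarity, and also assumes each candidate in S has already been reduced to a single vector (for example by the same weighted mean used for the user's texts). `cosine_scores` is a hypothetical helper.

```python
import numpy as np


def cosine_scores(candidates, profile):
    """Cosine similarity between each candidate vector and the user profile."""
    candidates = np.asarray(candidates, dtype=float)  # shape: (n_items, dim)
    profile = np.asarray(profile, dtype=float)        # shape: (dim,)
    norms = np.linalg.norm(candidates, axis=1) * np.linalg.norm(profile)
    # Avoid division by zero: an all-zero vector gets a score of 0.
    norms = np.where(norms == 0.0, 1.0, norms)
    return candidates @ profile / norms


S = [[1.0, 1.0], [1.0, 0.0], [-1.0, -1.0]]  # one vector per candidate item
scores = cosine_scores(S, [1.0, 1.0])
```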

The news items with the top scores can then be recommended.
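
Picking the highest-scoring items could look like the following sketch. The `recommend` helper, the cutoff `k`, and the sample titles and scores are all illustrative assumptions, not output of the library.

```python
import numpy as np


def recommend(titles, scores, k=2):
    """Return the k (title, score) pairs with the highest scores, best first."""
    order = np.argsort(scores)[::-1][:k]  # indices sorted by descending score
    return [(titles[i], float(scores[i])) for i in order]


# Illustrative data only.
titles = ["Markets rally", "Cup final tonight", "Rates held steady"]
scores = np.array([0.91, 0.12, 0.78])
top = recommend(titles, scores, k=2)
```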

This is just an implementation of my research on building a content-based recommendation engine using a word2vec model obtained from an RNN.