Skip-Grammar
A framework for representing sequences as embeddings.
Models
Skip-gram Negative Sampling (SGNS)
Popular natural language processing models such as word2vec and BERT can be repurposed to learn relationships from arbitrary sequences of items. Skip-gram Negative Sampling is one such algorithm, provided in the models module. It is implemented as PyTorch components and can also be composed as a PyTorch Lightning module. Both are available under their respective namespaces, skipgrammar.models.sgns and skipgrammar.models.lighting.sgns.
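To make the objective concrete, here is a minimal, framework-free sketch of the negative-sampling loss for a single (center, context) pair. It is not the skipgrammar implementation. The function names (log_sigmoid, dot, sgns_loss) and the toy two-dimensional vectors are assumptions made for illustration only. A real model would hold the vectors in PyTorch embedding tables and compute the same quantity over batches.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sgns_loss(center, context, negatives) -> float:
    """Negative-sampling loss for one training example (hypothetical helper).

    loss = -log sigmoid(center . context)
           - sum over k of log sigmoid(-center . negative_k)
    """
    positive_term = log_sigmoid(dot(center, context))
    negative_term = sum(log_sigmoid(-dot(center, neg)) for neg in negatives)
    return -(positive_term + negative_term)


# Toy embeddings, chosen by hand for illustration.
center = [1.0, 0.0]
aligned = [2.0, 0.0]   # points the same way as the center vector
opposed = [-2.0, 0.0]  # points the opposite way

# Low loss: the true context is aligned and the negative sample is opposed.
loss_good = sgns_loss(center, aligned, [opposed])
# High loss: the roles are swapped.
loss_bad = sgns_loss(center, opposed, [aligned])
```

The loss is small when the center vector agrees with its observed context and disagrees with the sampled negatives. This is what pushes items that co-occur in sequences toward similar embeddings.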
Datasets
Last.FM
The Last.FM Dataset-1K dataset contains the listening history of approximately 1,000 users of the music service Last.FM. The dataset is available from the project's main site, and a preprocessed copy is also provided for ease of use. The variants in the dataset module use the preprocessed copy.
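A listening history of this kind can be treated as one item sequence per user. As a rough sketch of how such a sequence might be turned into skip-gram training pairs, the snippet below slides a context window over a list of items. The function name skipgram_pairs, the window parameter, and the artist identifiers are all hypothetical and are not taken from the skipgrammar dataset module.

```python
from typing import Hashable, Iterator, Sequence, Tuple


def skipgram_pairs(
    sequence: Sequence[Hashable], window: int = 2
) -> Iterator[Tuple[Hashable, Hashable]]:
    """Yield (center, context) pairs for items within `window` positions."""
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # an item is not its own context
                yield center, sequence[j]


# Hypothetical listening history for one user, in play order.
history = ["artist_a", "artist_b", "artist_c", "artist_d"]
pairs = list(skipgram_pairs(history, window=1))
```

With window=1, each item is paired only with its immediate neighbours. A larger window treats items played further apart as related as well.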
MovieLens
The popular recommendation system dataset MovieLens is available in three variants via the dataset module.