SocialVec is a general framework of social embeddings for eliciting social world knowledge from social networks. It was developed by Nir Lotan and Einat Minkov as part of their research, described in the paper available here: https://arxiv.org/abs/2111.03514
New: SocialVec is now a Python library you can install and import!
pip install socialvec
Upon initialization, you can either create a new SocialVec instance with the default configuration or select a specific version of the model. The currently available versions are:
- SocialVec2020.pkl.gz
- SocialVec2020_2022.pkl.gz

The first time you use SocialVec with one of these models, the library downloads the model binaries to your machine. Later runs skip the download, so loading is significantly faster.
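The "download once, then load locally" behavior described above is a common caching pattern. The sketch below illustrates it for a gzip-compressed pickle (the .pkl.gz extension suggests that format, but this is an assumption). The function name load_cached_model and the caller-supplied download function are hypothetical; the real download URL and cache directory used by socialvec are not documented here.

```python
import gzip
import os
import pickle
from typing import Any, Callable


def load_cached_model(
    filename: str,
    cache_dir: str,
    download: Callable[[str, str], None],
) -> Any:
    """Load a gzip-compressed pickle, downloading it only if no local copy exists.

    `download(filename, dest_path)` is a hypothetical, caller-supplied fetcher;
    the real download URL and cache location used by socialvec are not shown here.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, filename)
    if not os.path.exists(path):
        # First use: fetch the model binaries once and store them locally.
        download(filename, path)
    # Every later call reads the local copy, so no network access is needed.
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```

Since unpickling can execute arbitrary code, this pattern should only be used with files from a trusted source.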
from socialvec.socialvec import SocialVec
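sv = SocialVec()  # default configuration; downloads the model on first use
sv[12]            # look up the embedding of @jack by numeric Twitter ID
sv["12"]          # the same account, with the ID given as a string
sv["jack"]        # the same account, looked up by screen name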
sv.get_similar('jack')
| | twitter_id | similarity | screen_name | name | description |
|---|---|---|---|---|---|
| 0 | 6385432 | 0.841613 | dickc | dick costolo | nan |
| 1 | 989 | 0.831723 | om | OM | Partner emeritus @Trueventures I was a reporte... |
| 2 | 5746452 | 0.827466 | waltmossberg | Walt Mossberg | Board, News Literacy Project. Former columnist... |
| 3 | 20536157 | 0.826462 | #HeyGoogle | | |
| 4 | 6708952 | 0.819312 | SteveCase | Steve Case | Chairman of @Revolution. Chairman of @CaseFoun... |
| 5 | 9534522 | 0.816885 | Pogue | David Pogue | Host of “Unsung Science” podcast; "CBS Sunday ... |
| 6 | 5763262 | 0.813040 | karaswisher | Kara Swisher | Mother of (4) Dragons. Future resident of Hawa... |
| 7 | 14749070 | 0.808801 | Chad_Hurley | Chad Hurley | Co-Founder, @YouTube; Investor, @Warriors, @LA... |
| 8 | 22255654 | 0.805819 | johndoerr | John Doerr | Passionate about moving leaders to act—with sp... |
| 9 | 37570179 | 0.804565 | arrington | Michael Arrington 🏴☠️ | Founder of TechCrunch, CrunchBase and Arringto... |
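Conceptually, a query like get_similar('jack') is a nearest-neighbor search in embedding space. The sketch below ranks accounts by cosine similarity to a query account and skips the query itself (in the table above, @jack does not appear in his own results). That the "similarity" column is cosine similarity is an assumption, and most_similar and the toy 2-D vectors are hypothetical illustrations, not the library's implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity: dot product divided by the product of the L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def most_similar(vectors: Dict[int, Vector], query_id: int, k: int = 10) -> List[Tuple[int, float]]:
    """Rank all other accounts by cosine similarity to `query_id` and keep the top k."""
    query = vectors[query_id]
    scored = [
        (account_id, cosine(query, vec))
        for account_id, vec in vectors.items()
        if account_id != query_id  # the query account is not listed as its own neighbor
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors keyed by Twitter IDs from the table; the values are made up.
toy = {12: [1.0, 0.0], 6385432: [0.9, 0.1], 989: [0.0, 1.0]}
print(most_similar(toy, 12, k=2))
```

A real model has far more dimensions and accounts, where a vectorized library (or an approximate nearest-neighbor index) would replace this linear scan.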
To get an embedding for a user who is not a popular entity, collect the list of accounts that this user follows and pass it to the get_average_embeddings function, which returns an embedding vector for this user.
**This function currently accepts only a list of user IDs.**
v = sv.get_average_embeddings([1, sv.get_userid('jack')], 989)
sv.get_similar(v[0])
| | twitter_id | similarity | screen_name | name | description |
|---|---|---|---|---|---|
| 0 | 12 | 1.000000 | jack | jack | #bitcoin |
| 1 | 6385432 | 0.841613 | dickc | dick costolo | nan |
| 2 | 989 | 0.831723 | om | OM | Partner emeritus @Trueventures I was a reporte... |
| 3 | 5746452 | 0.827466 | waltmossberg | Walt Mossberg | Board, News Literacy Project. Former columnist... |
| 4 | 20536157 | 0.826462 | #HeyGoogle | | |
| 5 | 6708952 | 0.819312 | SteveCase | Steve Case | Chairman of @Revolution. Chairman of @CaseFoun... |
| 6 | 9534522 | 0.816885 | Pogue | David Pogue | Host of “Unsung Science” podcast; "CBS Sunday ... |
| 7 | 5763262 | 0.813040 | karaswisher | Kara Swisher | Mother of (4) Dragons. Future resident of Hawa... |
| 8 | 14749070 | 0.808801 | Chad_Hurley | Chad Hurley | Co-Founder, @YouTube; Investor, @Warriors, @LA... |
| 9 | 22255654 | 0.805819 | johndoerr | John Doerr | Passionate about moving leaders to act—with sp... |
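The idea of representing a user by the accounts they follow can be illustrated as an element-wise mean of the followed accounts' vectors. This sketch assumes that IDs with no vector in the model are simply skipped; that is an assumption about get_average_embeddings, though it would be consistent with the output above, where averaging [1, 12] yields a vector identical to @jack's (similarity 1.000000). The function average_embedding is hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def average_embedding(
    vectors: Dict[int, Sequence[float]], followed_ids: Sequence[int]
) -> Optional[List[float]]:
    """Element-wise mean of the vectors of the followed accounts.

    IDs with no vector in `vectors` are skipped (an assumption of this sketch);
    None is returned if none of the followed accounts is known.
    """
    known = [vectors[i] for i in followed_ids if i in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


# Toy vectors: ID 1 is deliberately absent, so only @jack (ID 12) contributes.
toy = {12: [1.0, 0.0], 989: [0.0, 1.0]}
print(average_embedding(toy, [1, 12]))
```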
The get_similar function also accepts a list of Twitter IDs; in that case it returns the accounts most similar to the average of these users:
edu = ['Harvard','MIT','UCLA']
edu_ids = [sv.get_userid(name) for name in edu]  # screen names -> Twitter IDs
sports = ['FCBarcelona','ManUtd','realmadrid']
sports_ids = [sv.get_userid(name) for name in sports]
sv.get_similar(edu_ids)
| | twitter_id | similarity | screen_name | name | description |
|---|---|---|---|---|---|
| 0 | 5695032 | 0.867065 | Yale | Yale University | News, events and updates from Yale University. |
| 1 | 5694822 | 0.861724 | Princeton | Princeton University | The official Twitter account of Princeton Univ... |
| 2 | 248795646 | 0.850461 | Columbia | Columbia University | The official Twitter feed of Columbia Universi... |
| 3 | 14884486 | 0.845595 | BrownUniversity | Brown University | Official Twitter feed for Brown University. 🐻 |
| 4 | 33474655 | 0.840983 | Cambridge_Uni | Cambridge University | Research, news and events from the University ... |
| 5 | 18036441 | 0.838993 | Stanford | Stanford University | Stanford is one of the world's leading researc... |
| 6 | 17369110 | 0.833544 | Cornell | Cornell University | Learning. Discovery. Engagement. Join the #Cor... |
| 7 | 19606528 | 0.804404 | HarvardHBS | Harvard Business School | Educating leaders who make a difference in the... |
| 8 | 48289662 | 0.795457 | UniofOxford | University of Oxford | Welcome to our official account 👋 Online 9am-5... |
| 9 | 21226678 | 0.793840 | dartmouth | Dartmouth | The official Twitter account of Dartmouth Coll... |
sv.get_similar(sports_ids)
| | twitter_id | similarity | screen_name | name | description |
|---|---|---|---|---|---|
| 0 | 740336334 | 0.931517 | GarethBale11 | Gareth Bale | Footballer. @LAFC and @FAWales. Instagram - ht... |
| 1 | 344801362 | 0.917337 | DavidLuiz_4 | David Luiz | Enjoy the life!\n🔴⚫️💥\nhttps://t.co/6cHcpZY4nc… |
| 2 | 140750163 | 0.915364 | juanmata8 | Juan Mata García | Professional football player. Member of @Commo... |
| 3 | 112764971 | 0.913976 | FCBarcelona_es | FC Barcelona | #ForçaBarça! ¡Síguenos!: @fcbarcelona_cat @fcb... |
| 4 | 533085085 | 0.912526 | M10 | Mesut Özil | Football player @ibfk2014 ⚽️ \| Co-Founder @Uni... |
| 5 | 265982289 | 0.911782 | D_DeGea | David de Gea | ⚽ Goalkeeper @ManUtd 🇪🇸 International with @Se... |
| 6 | 1964571728 | 0.899911 | Benzema | Karim Benzema | Football player - @equipedefrance @realmadrid ... |
| 7 | 366592246 | 0.899444 | hazardeden10 | Eden Hazard | Belgium 🇧🇪 |
| 8 | 185827887 | 0.898743 | cesc4official | Cesc Fàbregas Soler | Proud dad of 5 beautiful children. 35 years ol... |
| 9 | 213745334 | 0.895597 | LuisSuarez9 | Luis Suárez | Club Nacional de Football player. Born in Salt... |
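A group query like the two above can be pictured as: average the query accounts' vectors into a centroid, then rank every other account by similarity to that centroid. Judging from the tables, the query accounts themselves (Harvard, MIT, UCLA; FCBarcelona, ManUtd, realmadrid) are not listed, so this sketch excludes them. Cosine similarity is assumed, and similar_to_group and the toy IDs and vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def similar_to_group(
    vectors: Dict[int, Sequence[float]], query_ids: Sequence[int], k: int = 10
) -> List[Tuple[int, float]]:
    """Rank accounts by cosine similarity to the centroid of `query_ids`."""
    members = [vectors[i] for i in query_ids]
    dim = len(members[0])
    # Centroid = element-wise mean of the query accounts' vectors.
    centroid = [sum(vec[d] for vec in members) / len(members) for d in range(dim)]
    excluded = set(query_ids)  # the query accounts themselves are not returned
    scored = [(i, _cosine(centroid, vec)) for i, vec in vectors.items() if i not in excluded]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: accounts 1-3 form the query group; 4 is close to them, 5 is not.
toy = {1: [1.0, 0.0], 2: [0.8, 0.2], 3: [0.9, 0.1], 4: [0.95, 0.05], 5: [0.0, 1.0]}
print(similar_to_group(toy, [1, 2, 3], k=5))
```

Averaging first and ranking once keeps the cost of a group query the same as a single-account query.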
SocialVecClassifier is part of the socialvec package, so no additional installation is needed. However, you need to initialize it separately after creating the SocialVec object:
# create a SocialVec object as described above
from socialvec.socialvec import SocialVec
sv = SocialVec()
# initialize the classifier
sv.init_classifier()
To get a political classification for a user, pass the user's SocialVec vector to the classifier:
# The classifier takes a SocialVec embedding vector as input, e.g.:
sv.classifier.predict_political(sv['JoeBiden'])
# or:
sv.classifier.predict_political(sv['realDonaldTrump'])
predict_political returns a Republican/Democrat classification together with a confidence score between 0 and 1, where 1 means high confidence and 0 means no confidence (which may be expected for entities with no political affiliation).
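One way a binary classifier can produce a label plus a 0-to-1 confidence, where 0 means "undecided", is to map a class probability p to |2p - 1|. The sketch below shows this with a hypothetical linear (logistic) model; the function political_label, its weights and bias, and the |2p - 1| mapping are all illustrative assumptions, not how SocialVecClassifier is documented to work.

```python
import math
from typing import Sequence, Tuple


def political_label(
    embedding: Sequence[float], weights: Sequence[float], bias: float = 0.0
) -> Tuple[str, float]:
    """Hypothetical linear classifier returning (label, confidence in [0, 1])."""
    score = sum(w * x for w, x in zip(weights, embedding)) + bias
    p_dem = 1.0 / (1.0 + math.exp(-score))  # logistic probability of "Democrat"
    # Ties (p = 0.5) are arbitrarily labeled "Democrat"; confidence is 0 there anyway.
    label = "Democrat" if p_dem >= 0.5 else "Republican"
    # 0 when the model is undecided (p = 0.5), approaching 1 as p nears 0 or 1.
    confidence = abs(2.0 * p_dem - 1.0)
    return label, confidence


# Made-up 2-D embedding and weights, for illustration only.
print(political_label([0.0, 0.0], [1.0, 1.0]))
```

With this mapping, an account whose embedding carries no political signal lands near p = 0.5 and therefore near zero confidence, matching the interpretation given above.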