| Afrikaans | af | OpenSubtitles | top 1M vectors, all vectors, model binary | 324K | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 17M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 17M | |
| Arabic | ar | OpenSubtitles | top 1M vectors, all vectors, model binary | 188M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 120M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 308M | |
| Bulgarian | bg | OpenSubtitles | top 1M vectors, all vectors, model binary | 247M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 53M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 300M | |
| Bengali | bn | OpenSubtitles | top 1M vectors, all vectors, model binary | 2M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 19M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 21M | |
| Breton | br | OpenSubtitles | top 1M vectors, all vectors, model binary | 111K | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 8M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 8M | |
| Bosnian | bs | OpenSubtitles | top 1M vectors, all vectors, model binary | 92M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 13M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 105M | |
| Catalan | ca | OpenSubtitles | top 1M vectors, all vectors, model binary | 3M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 176M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 179M | |
| Czech | cs | OpenSubtitles | top 1M vectors, all vectors, model binary | 249M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 100M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 349M | |
| Danish | da | OpenSubtitles | top 1M vectors, all vectors, model binary | 87M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 56M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 143M | |
| German | de | OpenSubtitles | top 1M vectors, all vectors, model binary | 139M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 976M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 1B | |
| Greek | el | OpenSubtitles | top 1M vectors, all vectors, model binary | 271M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 58M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 329M | |
| English | en | OpenSubtitles | top 1M vectors, all vectors, model binary | 751M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 2B | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 3B | |
| Esperanto | eo | OpenSubtitles | top 1M vectors, all vectors, model binary | 382K | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 38M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 38M | |
| Spanish | es | OpenSubtitles | top 1M vectors, all vectors, model binary | 514M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 586M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 1B | |
| Estonian | et | OpenSubtitles | top 1M vectors, all vectors, model binary | 60M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 29M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 90M | |
| Basque | eu | OpenSubtitles | top 1M vectors, all vectors, model binary | 3M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 20M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 24M | |
| Farsi | fa | OpenSubtitles | top 1M vectors, all vectors, model binary | 45M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 87M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 132M | |
| Finnish | fi | OpenSubtitles | top 1M vectors, all vectors, model binary | 117M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 74M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 191M | |
| French | fr | OpenSubtitles | top 1M vectors, all vectors, model binary | 336M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 724M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 1B | |
| Galician | gl | OpenSubtitles | top 1M vectors, all vectors, model binary | 2M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 40M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 42M | |
| Hebrew | he | OpenSubtitles | top 1M vectors, all vectors, model binary | 170M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 133M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 303M | |
| Hindi | hi | OpenSubtitles | top 1M vectors, all vectors, model binary | 660K | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 31M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 32M | |
| Croatian | hr | OpenSubtitles | top 1M vectors, all vectors, model binary | 242M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 43M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 285M | |
| Hungarian | hu | OpenSubtitles | top 1M vectors, all vectors, model binary | 228M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 121M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 349M | |
| Armenian | hy | OpenSubtitles | top 1M vectors, all vectors, model binary | 24K | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 38M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 39M | |
| Indonesian | id | OpenSubtitles | top 1M vectors, all vectors, model binary | 65M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 69M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 134M | |
| Icelandic | is | OpenSubtitles | top 1M vectors, all vectors, model binary | 7M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 7M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 15M | |
| Italian | it | OpenSubtitles | top 1M vectors, all vectors, model binary | 278M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 476M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 754M | |
| Georgian | ka | OpenSubtitles | top 1M vectors, all vectors, model binary | 1M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 15M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 16M | |
| Kazakh | kk | OpenSubtitles | top 1M vectors, all vectors, model binary | 13K | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 18M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 18M | |
| Korean | ko | OpenSubtitles | top 1M vectors, all vectors, model binary | 7M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 63M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 70M | |
| Lithuanian | lt | OpenSubtitles | top 1M vectors, all vectors, model binary | 6M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 23M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 29M | |
| Latvian | lv | OpenSubtitles | top 1M vectors, all vectors, model binary | 2M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 14M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 16M | |
| Macedonian | mk | OpenSubtitles | top 1M vectors, all vectors, model binary | 20M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 27M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 47M | |
| Malayalam | ml | OpenSubtitles | top 1M vectors, all vectors, model binary | 2M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 10M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 12M | |
| Malay | ms | OpenSubtitles | top 1M vectors, all vectors, model binary | 12M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 29M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 41M | |
| Dutch | nl | OpenSubtitles | top 1M vectors, all vectors, model binary | 265M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 249M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 514M | |
| Norwegian | no | OpenSubtitles | top 1M vectors, all vectors, model binary | 46M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 91M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 136M | |
| Polish | pl | OpenSubtitles | top 1M vectors, all vectors, model binary | 250M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 232M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 483M | |
| Portuguese | pt | OpenSubtitles | top 1M vectors, all vectors, model binary | 258M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 238M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 496M | |
| Romanian | ro | OpenSubtitles | top 1M vectors, all vectors, model binary | 435M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 65M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 500M | |
| Russian | ru | OpenSubtitles | top 1M vectors, all vectors, model binary | 152M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 391M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 543M | |
| Sinhala | si | OpenSubtitles | top 1M vectors, all vectors, model binary | 3M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 6M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 9M | |
| Slovak | sk | OpenSubtitles | top 1M vectors, all vectors, model binary | 47M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 29M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 76M | |
| Slovenian | sl | OpenSubtitles | top 1M vectors, all vectors, model binary | 107M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 32M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 138M | |
| Albanian | sq | OpenSubtitles | top 1M vectors, all vectors, model binary | 12M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 18M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 30M | |
| Serbian | sr | OpenSubtitles | top 1M vectors, all vectors, model binary | 344M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 70M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 413M | |
| Swedish | sv | OpenSubtitles | top 1M vectors, all vectors, model binary | 101M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 143M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 245M | |
| Tamil | ta | OpenSubtitles | top 1M vectors, all vectors, model binary | 123K | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 17M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 17M | |
| Telugu | te | OpenSubtitles | top 1M vectors, all vectors, model binary | 103K | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 15M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 15M | |
| Tagalog | tl | OpenSubtitles | top 1M vectors, all vectors, model binary | 88K | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 7M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 7M | |
| Turkish | tr | OpenSubtitles | top 1M vectors, all vectors, model binary | 240M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 55M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 295M | |
| Ukrainian | uk | OpenSubtitles | top 1M vectors, all vectors, model binary | 5M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 163M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 168M | |
| Urdu | ur | OpenSubtitles | top 1M vectors, all vectors, model binary | 196K | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 16M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 16M | |
| Vietnamese | vi | OpenSubtitles | top 1M vectors, all vectors, model binary | 27M | word counts, bigram counts, trigram counts |
| | | Wikipedia | top 1M vectors, all vectors, model binary | 115M | word counts, bigram counts, trigram counts |
| | | Wikipedia + OpenSubtitles | top 1M vectors, all vectors, model binary | 143M | |