Naeval compares the quality and performance of NLP systems for the Russian language. It is used to evaluate the components of the Natasha project: Razdel, Navec, and Slovnet.
Tokenization
See the Razdel evaluation section for more info.

| | corpora errors | corpora time | syntag errors | syntag time | gicrya errors | gicrya time | rnc errors | rnc time |
|---|---|---|---|---|---|---|---|---|
| `re.findall(\w+\|\d+\|\p+)` | 4161 | 0.5 | 2660 | 0.5 | 2277 | 0.4 | 7606 | 0.4 |
| spacy | 4388 | 6.2 | 2103 | 5.8 | 1740 | 4.1 | 4057 | 3.9 |
| nltk.word_tokenize | 14245 | 3.4 | 60893 | 3.3 | 13496 | 2.7 | 41485 | 2.9 |
| mystem | 4514 | 5.0 | 3153 | 4.7 | 2497 | 3.7 | 2028 | 3.9 |
| mosestokenizer | 1886 | 2.1 | 1330 | 1.9 | 1796 | 1.6 | 2123 | 1.7 |
| segtok.word_tokenize | 2772 | 2.3 | 1288 | 2.3 | 1759 | 1.8 | 1229 | 1.8 |
| aatimofeev/spacy_russian_tokenizer | 2930 | 48.7 | 719 | 51.1 | 678 | 39.5 | 2681 | 52.2 |
| koziev/rutokenizer | 2627 | 1.1 | 1386 | 1.0 | 2893 | 0.8 | 9411 | 0.9 |
| razdel.tokenize | 1510 | 2.9 | 1483 | 2.8 | 322 | 2.0 | 2124 | 2.2 |

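
To make the regex baseline in the tokenization results concrete, here is a minimal Python sketch of what `re.findall(\w+|\d+|\p+)` might look like in practice. Python's `re` module has no `\p` class, so this sketch assumes `\p+` stands for "a run of punctuation" and substitutes `[^\w\s]+`; the function name `regex_tokenize` is hypothetical and not part of Naeval.

```python
import re

# Assumption: `\p+` in the benchmark label means "a run of punctuation".
# Python's `re` has no `\p` class, so `[^\w\s]+` is used in its place.
TOKEN_RE = re.compile(r"\w+|\d+|[^\w\s]+")


def regex_tokenize(text):
    """Return runs of word characters and runs of punctuation, in order."""
    return TOKEN_RE.findall(text)


print(regex_tokenize("Привет, мир!"))
# ['Привет', ',', 'мир', '!']
```

A pattern like this is fast because it does no language-specific work, and it also splits abbreviations such as "т.е." into four separate tokens.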
Sentence segmentation

| | corpora errors | corpora time | syntag errors | syntag time | gicrya errors | gicrya time | rnc errors | rnc time |
|---|---|---|---|---|---|---|---|---|
| `re.split([.?!…])` | 20456 | 0.9 | 6576 | 0.6 | 10084 | 0.7 | 23356 | 1.0 |
| segtok.split_single | 19008 | 17.8 | 4422 | 13.4 | 159738 | 1.1 | 164218 | 2.8 |
| mosestokenizer | 41666 | 8.9 | 22082 | 5.7 | 12663 | 6.4 | 50560 | 7.4 |
| nltk.sent_tokenize | 16420 | 10.1 | 4350 | 5.3 | 7074 | 5.6 | 32534 | 8.9 |
| deeppavlov/rusenttokenize | 10192 | 10.9 | 1210 | 7.9 | 8910 | 6.8 | 21410 | 7.0 |
| razdel.sentenize | 9274 | 6.1 | 824 | 3.9 | 11414 | 4.5 | 10594 | 7.5 |

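
The `re.split([.?!…])` baseline in the sentence segmentation results can be sketched in a few lines of Python. This is an illustration under assumptions, not the benchmark's actual code: the function name `regex_sentenize` is hypothetical, and dropping empty fragments and surrounding whitespace is a choice made here for readability.

```python
import re

# Split on every sentence-final punctuation mark, with no context checks.
SENT_END_RE = re.compile(r"[.?!…]")


def regex_sentenize(text):
    """Split text at '.', '?', '!' and '…', discarding empty fragments."""
    parts = SENT_END_RE.split(text)
    return [part.strip() for part in parts if part.strip()]


print(regex_sentenize("Привет. Как дела? Хорошо!"))
# ['Привет', 'Как дела', 'Хорошо']

# Abbreviations are cut as if they ended a sentence.
print(regex_sentenize("Это т. е. пример."))
# ['Это т', 'е', 'пример']
```

The second call shows one kind of mistake a context-free split is prone to: every period inside an abbreviation produces a spurious sentence boundary.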
Pretrained embeddings
See the Navec evaluation section for more info.

| | type | init, s | get, µs | disk, MB | ram, MB | vocab |
|---|---|---|---|---|---|---|
| ruscorpora_upos_cbow_300_20_2019 | w2v | 12.1 | 1.6 | 220.6 | 236.1 | 189K |
| ruwikiruscorpora_upos_skipgram_300_2_2019 | w2v | 15.7 | 1.7 | 290.0 | 309.4 | 248K |
| tayga_upos_skipgram_300_2_2019 | w2v | 15.7 | 1.2 | 290.7 | 310.9 | 249K |
| tayga_none_fasttextcbow_300_10_2019 | fasttext | 11.3 | 14.3 | 2741.9 | 2746.9 | 192K |
| araneum_none_fasttextcbow_300_5_2018 | fasttext | 7.8 | 15.4 | 2752.1 | 2754.7 | 195K |
| hudlit_12B_500K_300d_100q | navec | 1.0 | 19.9 | 50.6 | 95.3 | 500K |
| news_1B_250K_300d_100q | navec | 0.5 | 20.3 | 25.4 | 47.7 | 250K |

| | type | simlex | hj | rt | ae | ae2 | lrwc |
|---|---|---|---|---|---|---|---|
| ruscorpora_upos_cbow_300_20_2019 | w2v | 0.359 | 0.685 | 0.852 | 0.758 | 0.896 | 0.602 |
| ruwikiruscorpora_upos_skipgram_300_2_2019 | w2v | 0.321 | 0.723 | 0.817 | 0.801 | 0.860 | 0.629 |
| tayga_upos_skipgram_300_2_2019 | w2v | 0.429 | 0.749 | 0.871 | 0.771 | 0.899 | 0.639 |
| tayga_none_fasttextcbow_300_10_2019 | fasttext | 0.369 | 0.639 | 0.793 | 0.682 | 0.813 | 0.536 |
| araneum_none_fasttextcbow_300_5_2018 | fasttext | 0.349 | 0.671 | 0.801 | 0.706 | 0.793 | 0.579 |
| hudlit_12B_500K_300d_100q | navec | 0.310 | 0.707 | 0.842 | 0.931 | 0.923 | 0.604 |
| news_1B_250K_300d_100q | navec | 0.230 | 0.590 | 0.784 | 0.866 | 0.861 | 0.589 |

Morphology taggers

| | news | wiki | fiction | social | poetry |
|---|---|---|---|---|---|
| rupostagger | 0.673 | 0.645 | 0.661 | 0.641 | 0.636 |
| rnnmorph | 0.896 | 0.812 | 0.890 | 0.860 | 0.838 |
| maru | 0.894 | 0.808 | 0.887 | 0.861 | 0.840 |
| udpipe | 0.918 | 0.811 | 0.957 | 0.870 | 0.776 |
| spacy | 0.919 | 0.812 | 0.938 | 0.836 | 0.729 |
| deeppavlov | 0.940 | 0.841 | 0.944 | 0.870 | 0.857 |
| deeppavlov_bert | 0.951 | 0.868 | 0.964 | 0.892 | 0.865 |

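| | init, s | disk, MB | ram, MB | speed, it/s |
|---|---|---|---|---|
| rupostagger | 4.8 | 3 | 118 | 48.0 |
| rnnmorph | 8.7 | 10 | 289 | 16.6 |
| maru | 15.8 | 44 | 370 | 36.4 |
| udpipe | 6.9 | 45 | 242 | 56.2 |
| spacy | 10.9 | 89 | 579 | 30.6 |
| deeppavlov | 4.0 | 32 | 10240 | 90.0 (gpu) |
| deeppavlov_bert | 20.0 | 1393 | 8704 | 85.0 (gpu) |
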
Syntax parser
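
| | news uas | news las | wiki uas | wiki las | fiction uas | fiction las | social uas | social las | poetry uas | poetry las |
|---|---|---|---|---|---|---|---|---|---|---|
| udpipe | 0.873 | 0.823 | 0.622 | 0.531 | 0.910 | 0.876 | 0.700 | 0.624 | 0.625 | 0.534 |
| spacy | 0.876 | 0.818 | 0.770 | 0.665 | 0.880 | 0.833 | 0.757 | 0.666 | 0.657 | 0.544 |
| deeppavlov_bert | 0.962 | 0.910 | 0.882 | 0.786 | 0.963 | 0.929 | 0.844 | 0.761 | 0.784 | 0.691 |
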
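
For readers unfamiliar with the `uas` and `las` columns: under the standard definitions, the unlabeled attachment score (UAS) is the share of tokens whose predicted head is correct, and the labeled attachment score (LAS) additionally requires the dependency relation to match. The sketch below computes both; the function name `attachment_scores` and the `(head, relation)` tuple layout are illustrative assumptions, and the benchmark's own scoring code may differ in details such as punctuation handling.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) for two parses of the same token sequence.

    Each parse is a list with one (head_index, relation) pair per token.
    This layout is an assumption made for the example.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same tokens")
    if not gold:
        return 0.0, 0.0
    # UAS: only the head index has to match.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: head index and relation label both have to match.
    full_hits = sum(g == p for g, p in zip(gold, pred))
    return head_hits / len(gold), full_hits / len(gold)


gold = [(2, "nsubj"), (0, "root"), (4, "amod"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "amod"), (2, "obl")]
print(attachment_scores(gold, pred))
# (0.75, 0.5)
```

In the example, the third token has the wrong head and the fourth has the right head but the wrong label, so UAS counts three correct tokens out of four and LAS counts two.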
| | init, s | disk, MB | ram, MB | speed, it/s |
|---|---|---|---|---|
| udpipe | 6.9 | 45 | 242 | 56.2 |
| spacy | 10.9 | 89 | 579 | 31.6 |
| deeppavlov_bert | 34.0 | 1427 | 8704 | 75.0 (gpu) |

NER
See the Slovnet evaluation section for more info.
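
f1 by dataset and entity type:

| | factru PER | factru LOC | factru ORG | gareev PER | gareev ORG | ne5 PER | ne5 LOC | ne5 ORG | bsnlp PER | bsnlp LOC | bsnlp ORG |
|---|---|---|---|---|---|---|---|---|---|---|---|
| deeppavlov | 0.910 | 0.886 | 0.742 | 0.944 | 0.798 | 0.942 | 0.919 | 0.881 | 0.866 | 0.767 | 0.624 |
| deeppavlov_bert | 0.971 | 0.928 | 0.825 | 0.980 | 0.916 | 0.997 | 0.990 | 0.976 | 0.954 | 0.840 | 0.741 |
| pullenti | 0.905 | 0.814 | 0.686 | 0.939 | 0.639 | 0.952 | 0.862 | 0.683 | 0.900 | 0.769 | 0.566 |
| texterra | 0.900 | 0.800 | 0.597 | 0.888 | 0.561 | 0.901 | 0.777 | 0.594 | 0.858 | 0.783 | 0.548 |
| tomita | 0.929 | | | 0.921 | | 0.945 | | | 0.881 | | |
| natasha | 0.867 | 0.753 | 0.297 | 0.873 | 0.347 | 0.852 | 0.709 | 0.394 | 0.836 | 0.755 | 0.350 |
| mitie | 0.888 | 0.861 | 0.532 | 0.849 | 0.452 | 0.753 | 0.642 | 0.432 | 0.736 | 0.801 | 0.524 |
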
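
As a reminder of what a per-type `f1` value measures, the sketch below computes F1 for one entity type from sets of gold and predicted spans. It assumes exact-match scoring on `(start, stop, type)` tuples, which may not be how the benchmark itself matches spans; the function name `span_f1` and the sample spans are hypothetical.

```python
def span_f1(gold, pred, entity_type):
    """F1 for one entity type, counting only exact span matches.

    gold, pred: iterables of (start, stop, type) tuples.
    Exact matching is an assumption made for this example.
    """
    gold_spans = {span for span in gold if span[2] == entity_type}
    pred_spans = {span for span in pred if span[2] == entity_type}
    true_positives = len(gold_spans & pred_spans)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_spans)
    recall = true_positives / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


gold = [(0, 5, "PER"), (10, 16, "LOC"), (20, 28, "PER")]
pred = [(0, 5, "PER"), (10, 16, "LOC"), (20, 27, "PER")]
print(span_f1(gold, pred, "PER"))
# 0.5
print(span_f1(gold, pred, "LOC"))
# 1.0
```

The second `PER` prediction is off by one character, so under exact matching it counts as both a missed gold span and a spurious prediction, which halves precision and recall for that type.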
| | init, s | disk, MB | ram, MB | speed, articles/s |
|---|---|---|---|---|
| deeppavlov | 5.9 | 1024 | 3072 | 24.3 (gpu) |
| deeppavlov_bert | 34.5 | 2048 | 6144 | 13.1 (gpu) |
| pullenti | 2.9 | 16 | 253 | 6.0 |
| texterra | 47.6 | 193 | 3379 | 4.0 |
| tomita | 2.0 | 64 | 63 | 29.8 |
| natasha | 2.0 | 1 | 160 | 8.8 |
| mitie | 28.3 | 327 | 261 | 32.8 |