Vietnamese morphological analyzer using SVMs.

This morphological analyzer uses SVMs for word segmentation and part-of-speech tagging.

Requirements

Python 2
YamCha
Model file, downloaded to ./models/vnPOS.model

Usage

% git clone https://github.com/kanjirz50/viet-morphological-analysis-svm.git

Please download the model file from here and save it as ./models/vnPOS.model

# Run the analyzer (input text is read from standard input)
% python viet_morph_analyze.py < input_text.txt
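
The output format of viet_morph_analyze.py is not described here. As a rough sketch only, assuming the tagger labels each syllable with a B-/I- tag in the same IOB2 style as the training data described in the next section, the syllables could be merged back into words like this. The function name merge_iob2 is hypothetical and not part of this repository, and the sketch is written for Python 3 while the repository itself targets Python 2.

```python
def merge_iob2(pairs):
    """Group (syllable, IOB2 tag) pairs into (word, POS) pairs.

    Assumption: a tag starting with "I" continues the previous word,
    anything else starts a new word.
    """
    words = []
    for syllable, tag in pairs:
        prefix, _, pos = tag.partition("-")
        if prefix == "I" and words:
            # Continue the previous word; keep the POS given by its B tag.
            last_word, last_pos = words[-1]
            words[-1] = (last_word + "_" + syllable, last_pos)
        else:
            words.append((syllable, pos))
    return words


if __name__ == "__main__":
    tagged = [("Tấp", "B-JJ"), ("nập", "I-JJ"), ("sắm", "B-VB")]
    for word, pos in merge_iob2(tagged):
        print(word + "//" + pos)
```

Joining syllables with an underscore mirrors the corpus notation; a stray I tag at the start of a sentence is simply treated as the start of a word.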

How to make the model file

Get a tagged corpus

vnPOS

Convert format from vnPOS to IOB2 tag format

The corpus is given in the following format.

Tấp_nập//JJ sắm//VB đtdđ//NN đầu//NN năm//NC
...

Convert it to the IOB2 tag format (only the I and B tags are used).

% cat vnPOS.txt | python ./utils/vnPOS_to_iob2.py > vnPOS.iob2
# The output looks like the following.
Tấp     B-JJ
nập     I-JJ
sắm     B-VB
đtdđ    B-NN
đầu     B-NN
năm     B-NC

...
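
To make the conversion concrete, here is a minimal sketch of the idea, not the actual ./utils/vnPOS_to_iob2.py script. It assumes each token has the form syllable_syllable//TAG, as in the sample line above, and that the first syllable of a word gets a B- tag and the remaining ones an I- tag. The function names are hypothetical, and the sketch is written for Python 3.

```python
def word_to_iob2(token):
    """Turn one 'syllable_syllable//TAG' token into (syllable, IOB2 tag) pairs."""
    word, sep, tag = token.rpartition("//")
    if not sep or not word or not tag:
        raise ValueError("expected 'word//TAG', got %r" % token)
    pairs = []
    for i, syllable in enumerate(word.split("_")):
        prefix = "B" if i == 0 else "I"  # B opens a word, I continues it
        pairs.append((syllable, prefix + "-" + tag))
    return pairs


def sentence_to_iob2(line):
    """Convert one whitespace-separated corpus line into a flat list of pairs."""
    pairs = []
    for token in line.split():
        pairs.extend(word_to_iob2(token))
    return pairs


if __name__ == "__main__":
    sample = "Tấp_nập//JJ sắm//VB đtdđ//NN đầu//NN năm//NC"
    for syllable, tag in sentence_to_iob2(sample):
        print(syllable + "\t" + tag)
```

Because every syllable carries both a boundary marker and a POS label, word segmentation and tagging become a single sequence-labeling task.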

Training with YamCha

# Show YamCha libexec directory
% yamcha-config --libexecdir
/usr/local/Cellar/yamcha/0.33/libexec/yamcha

# Copy Makefile
% cp /usr/local/Cellar/yamcha/0.33/libexec/yamcha/Makefile .

# Training
% make CORPUS=vnPOS.txt.rnd.train.iob2 MODEL=./models/vnPOS FEATURE="F:-2..2:0..0 T:-2..-1" train
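
In YamCha's feature notation, F:-2..2:0..0 selects column 0 (the syllable) of the rows from two before to two after the current position, and T:-2..-1 selects the tags already assigned to the two preceding rows. The following sketch only illustrates which values that window covers. The function name and the feature keys are hypothetical and do not reflect YamCha's internal representation, and positions outside the sentence are simply skipped here.

```python
def window_features(tokens, previous_tags, i):
    """Collect the values covered by FEATURE="F:-2..2:0..0 T:-2..-1" at row i.

    tokens        -- all syllables of the sentence (column 0)
    previous_tags -- tags already decided for rows 0 .. i-1
    """
    feats = {}
    # F:-2..2:0..0 -> syllables at offsets -2, -1, 0, +1, +2
    for offset in range(-2, 3):
        j = i + offset
        if 0 <= j < len(tokens):
            feats["F:%+d:0" % offset] = tokens[j]
    # T:-2..-1 -> tags predicted for the two preceding rows
    for offset in (-2, -1):
        j = i + offset
        if 0 <= j < len(previous_tags):
            feats["T:%+d" % offset] = previous_tags[j]
    return feats


if __name__ == "__main__":
    tokens = ["Tấp", "nập", "sắm", "đtdđ", "đầu", "năm"]
    tags_so_far = ["B-JJ", "I-JJ"]
    for key, value in sorted(window_features(tokens, tags_so_far, 2).items()):
        print(key, value)
```

Including the previous two tags lets the classifier learn, for example, that an I- tag normally follows a B- or I- tag with the same POS.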

visvmtagger
Release 1.0a0
