wp2corpus

Wikipedia dump parser


License: Other
Install: pip install wp2corpus==0.0.2

wp2corpus (Wikipedia to corpus) provides functions for building a text corpus from Wikipedia.

Installation

pip install wp2corpus

Wikimedia

Wikimedia parses text written in Wikimedia-style markup and returns the parsed plain text.

Example

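from wp2corpus import Wikimedia

# Create a parser instance
wm = Wikimedia()

# A piped link in the form [target|label]; the outputs show only the label is kept
input_string = 'I am [himkt|himkt].'
print(wm.parse(input_string))

# Non-ASCII text around the link is left untouched
input_string = '私は[まつのき|himkt]です'
print(wm.parse(input_string))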

Output

I am himkt.
私はhimktです
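
For readers curious how this kind of link stripping can work, here is a minimal standalone sketch that reproduces the two outputs above with a regular expression. It is not wp2corpus's actual implementation: the function name strip_piped_links and the pattern are hypothetical, and it assumes links always have the single-bracket [target|label] form used in the example.

```python
import re

# Hypothetical pattern (an assumption, not taken from wp2corpus):
# matches [target|label] where neither part contains brackets or a pipe.
_PIPED_LINK = re.compile(r"\[([^\[\]|]*)\|([^\[\]|]*)\]")


def strip_piped_links(text: str) -> str:
    """Replace each [target|label] with its label, leaving other text as-is."""
    return _PIPED_LINK.sub(lambda m: m.group(2), text)


print(strip_piped_links('I am [himkt|himkt].'))      # I am himkt.
print(strip_piped_links('私は[まつのき|himkt]です'))  # 私はhimktです
```

This sketch leaves text without a pipe, such as [himkt], unchanged; how the real Wikimedia class treats that case is not stated here.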