💡 Functionalities
hindiwsd currently supports the following tasks:
- Hinglish to Hindi transliteration
- Spell correction of Hindi text
- POS tagging of Hindi text
- Word Sense Disambiguation of Hindi text with the help of IndoWordNet
- Enhanced Lesk's algorithm using a custom dataset
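For readers unfamiliar with Lesk's algorithm, here is a minimal sketch of the classic simplified variant: it picks the sense whose dictionary gloss shares the most words with the surrounding sentence. This is not hindiwsd's enhanced implementation, and it does not use IndoWordNet or the custom dataset. The function name `simplified_lesk` and the English toy sense inventory are assumptions made purely for illustration.

```python
# Simplified Lesk: choose the sense whose gloss overlaps most with the context.
# TOY_SENSES is a hand-written illustrative inventory, not IndoWordNet data.

def simplified_lesk(word, context, sense_glosses):
    """Return the sense key whose gloss shares the most words with `context`."""
    # Context words, excluding the target word itself.
    context_words = {w.lower() for w in context.split()} - {word.lower()}
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        gloss_words = {w.lower() for w in gloss.split()}
        overlap = len(context_words & gloss_words)
        # Strict comparison: on a tie, the earlier sense is kept.
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


TOY_SENSES = {
    "bank_river": "sloping land beside a river or stream",
    "bank_finance": "an institution that accepts deposits and lends money",
}

print(simplified_lesk("bank", "he sat on the bank of the river", TOY_SENSES))
```

A real system would also remove stop words and draw glosses from a lexical resource; the sketch only shows the overlap-counting idea.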
💾 Installation
Install hindiwsd with pip:
pip install hindiwsd
🗒️ NOTE
- A small change must be made to iwn.py from the pyiwn library before using our package.
- iwn.py is missing a try/except block, which might cause our package to crash.
- Here's a quick fix: use our patched iwn.py instead. Copy its contents and use them to replace the contents of the original iwn.py.
- The original iwn.py is located at path-to-your-env-or-python-folder/lib/site-packages/pyiwn/iwn.py
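If you are unsure where your environment keeps iwn.py, the following sketch uses only the Python standard library to look it up. The helper name `locate_package_file` is hypothetical and is not part of hindiwsd or pyiwn; it assumes pyiwn is installed as a regular package in the active environment.

```python
import importlib.util
from pathlib import Path


def locate_package_file(package, filename):
    """Return the path of `filename` inside an installed package, or None."""
    spec = importlib.util.find_spec(package)
    # find_spec returns None when the package is not installed.
    if spec is None or spec.origin is None:
        return None
    candidate = Path(spec.origin).parent / filename
    return candidate if candidate.exists() else None


# Prints the full path to iwn.py, or None if pyiwn is not installed.
print(locate_package_file("pyiwn", "iwn.py"))
```

Back up the original file before overwriting it so you can restore it if needed.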
📄 CUSTOM DATASET FOR ENHANCED LESK'S ALGORITHM
The custom dataset is available here.
⚡ Getting Started
🔤 Word Sense Disambiguation
- The wordsense() function from the hindi_wsd.py script prints the spell-corrected Hindi Devanagari sentence, the POS tags, and the disambiguated meaning of each word in the sentence that is available in IndoWordNet.
>>> from hindiwsd import hindi_wsd
>>> print(hindi_wsd.wordsense("aaj achha din hai"))
- You can also pass Hindi sentences directly to the wordsense() function.
>>> from hindiwsd import hindi_wsd
>>> print(hindi_wsd.wordsense("आज अच्छा दिन है"))
🏷️ POS tagging for Hindi Devanagari
- The POS_tagger() function from the wsd.py script returns the POS tags for a Hindi sentence as a list of tuples, each containing a word and its tag (NOUN, ADJECTIVE, ADVERB, VERB).
>>> from hindiwsd import wsd
>>> print(wsd.POS_tagger('आज अच्छा दिन है'))
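Since the result is a list of (word, tag) tuples, it can be post-processed with plain Python. The sketch below groups words by tag; `group_by_tag` is a hypothetical helper, and the sample list is hand-written in the documented shape rather than actual POS_tagger() output.

```python
from collections import defaultdict


def group_by_tag(tagged):
    """Group words from (word, tag) tuples into a dict keyed by tag."""
    groups = defaultdict(list)
    for word, tag in tagged:
        groups[tag].append(word)
    return dict(groups)


# Hand-written sample in the (word, tag) shape; not real POS_tagger() output.
sample = [("दिन", "NOUN"), ("अच्छा", "ADJECTIVE")]
print(group_by_tag(sample))
```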
📚 Hinglish to Hindi transliteration with spell correction
- The preprocess_transliterate() function from the wsd.py script transliterates a Hinglish code-mixed sentence to Hindi Devanagari. It returns two strings: the spell-corrected Hinglish sentence, followed by the spell-corrected Hindi sentence.
>>> from hindiwsd import wsd
>>> print(wsd.preprocess_transliterate('aaj achha din hai'))