Natural Language Processing (NLP) library for the Urdu language.

pip install urduhack==1.1.1


Urduhack: NLP library for ( 🇵🇰 ) Urdu language


Urduhack is an NLP library for the Urdu language. It comes with many batteries-included features to help you process Urdu data as easily as possible.

Supported Features

  • Normalization
    • Arabic and Urdu Unicode Redundancy Problem
    • Character Normalization
    • Combined Characters Normalization
    • Diacritics Removal
    • Spaces Before & After Digits
    • Spaces After Punctuations
    • Joined Words Fix
  • Tokenization
    • Sentence Tokenization
    • Word Tokenization
  • Data Pre-processing
    • Handles all kinds of numbers, emails, currencies, URLs, etc.
  • Tasks
    • Sentiment Analysis
    • Sentence Classification
    • Document Classification
    • Named Entity Recognition
    • Image to Text
    • Speech to Text
  • Datasets
    • IMDB Urdu Movie Reviews dataset
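
To give a feel for what the normalization features listed above involve, here is a minimal standard-library sketch of three of them: Arabic-to-Urdu character mapping, diacritics removal, and spacing around digits. This is not Urduhack's implementation or API. The function names, the small mapping table, and the character ranges are all assumptions chosen for illustration, and a real normalizer covers many more cases.

```python
import re

# Illustrative subset of Arabic code points that are usually replaced by
# their Urdu counterparts (assumed for this sketch, not Urduhack's table).
ARABIC_TO_URDU = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u0647": "\u06C1",  # ARABIC LETTER HEH -> ARABIC LETTER HEH GOAL
})

# Short-vowel and related marks: FATHATAN..SUKUN plus SUPERSCRIPT ALEF.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")

# Rough character classes for this sketch only.
LETTER = "[\u0621-\u064A\u0671-\u06D3]"
DIGIT = "[0-9\u06F0-\u06F9]"


def normalize_characters(text: str) -> str:
    """Map Arabic look-alike letters to their Urdu code points."""
    return text.translate(ARABIC_TO_URDU)


def remove_diacritics(text: str) -> str:
    """Strip the diacritic marks matched by DIACRITICS."""
    return DIACRITICS.sub("", text)


def space_digits(text: str) -> str:
    """Insert a space wherever a letter touches a digit, on either side."""
    text = re.sub(f"({LETTER})({DIGIT})", r"\1 \2", text)
    return re.sub(f"({DIGIT})({LETTER})", r"\1 \2", text)


def sketch_normalize(text: str) -> str:
    """Apply the three steps in sequence (hypothetical helper)."""
    return space_digits(remove_diacritics(normalize_characters(text)))
```

Calling `sketch_normalize` on a string that mixes Arabic letter forms, diacritics, and digits attached to words returns the same text with Urdu code points, no diacritics, and digits separated by spaces. Mapping characters first keeps the later steps working on a single, consistent set of code points.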


Urduhack officially supports Python 3.6–3.7, and runs great on PyPy.

$ pip install urduhack


Fantastic documentation is available at

How to Contribute

  1. Check for open issues or open a fresh issue to start a discussion around a feature idea or a bug. There is a Contributor Friendly tag for issues that should be ideal for people who are not very familiar with the codebase yet.
  2. Write a test which shows that the bug was fixed or that the feature works as expected.
  3. Send a pull request and bug the maintainer until it gets merged and published. :)


Special thanks to everyone who contributed to getting Urduhack to its current state.

Backers

Thank you to all our backers! 🙏 [Become a backer]

Sponsors

Support this project by becoming a sponsor. Your logo will show up here with a link to your website. [Become a sponsor]

Copyright and license

Code released under the MIT License.